EXAMINING THE RELATIONS OF MICRO- AND MACRO-LANGUAGE SKILLS TO PERSUASIVE WRITING USING EXPLORATORY AND CONFIRMATORY FACTOR ANALYSIS WITH APPLICATION OF GENERATIVE ARTIFICIAL INTELLIGENCE-DERIVED FEEDBACK

By

Heqiao Wang

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Special Education—Doctor of Philosophy

2025

ABSTRACT

This dissertation explores the multifaceted development of persuasive writing skills among secondary school students, with an emphasis on fostering the reasoning and argumentation skills essential for targeted writing tasks. Student-constructed essays serve as valuable instruments for assessing scientific literacy and higher-order thinking; however, their evaluation involves many complexities and is susceptible to potential biases and various sources of measurement error. This study addressed two primary objectives. First, it identified key features influencing persuasive essay quality by utilizing product-oriented measures to analyze both microstructural and macrostructural dimensions of writing. A latent structure of writing assessment was established, and measurement invariance was tested across students with differing special education statuses. Second, building on insights from this foundational investigation, automated feedback prompts were developed and implemented using GPT, facilitating AI-based scoring and feedback mechanisms. This research underscores the relevance of factor-analytic findings in addressing gaps related to predictive models and AI-driven content-generation tools, ultimately supporting personalized learning and adaptive written feedback. The implications of this study align with AI's expanding role in education, offering strategic insights into maximizing AI's utility for enhancing educational equity and instructional effectiveness. A promising approach involves leveraging factorial models to inform generative AI in delivering tailored feedback, thereby enhancing students' writing proficiency and overall performance.

Copyright by HEQIAO WANG 2025

To my beloved parents and grandparents, whose unwavering love and support have been my anchor throughout this journey. This accomplishment is as much yours as it is mine.

ACKNOWLEDGEMENTS

For years, I've rehearsed this moment in my dreams – the chance to express my deepest gratitude to those who made this dissertation possible. Now that the time has come, I find myself at a loss for words, knowing that no acknowledgment could ever fully capture the support, wisdom, and kindness I've received along the way. First and foremost, I extend my deepest gratitude to my advisor and committee chair, Dr. Gary Troia, whose guidance and support have shaped every stage of my doctoral journey. Your rigorous approach to science and writing not only fostered my academic growth but also ignited my passion for scholarly inquiry. During moments when I was still finding my voice as a researcher, your belief in my potential gave me the courage to push forward. What I learned from you during the past years provided exactly what I needed to develop as a scholar in education. This dissertation would not exist without your belief in me. My heartfelt thanks also go to Dr. Kevin Haudek, my RA supervisor and committee member, for your invaluable support and guidance in helping me navigate the intersections of my research with broader fields.
Working with you has been an intellectually rewarding journey – your insightful perspectives have strengthened my work in AI-enhanced educational assessment and profoundly shaped my research trajectory. I am deeply grateful for the opportunity to learn and grow under your guidance. I am also deeply thankful to my other committee members, Dr. Kylie Gorney and Dr. Troy Mariage, for your keen interest in my work and for your insightful feedback, statistical expertise, and thoughtful contributions to this dissertation. A special thanks to Dr. Mariage – your courses and practical wisdom taught me how to bridge the gap between theory and practice, and I will always treasure collaborating with you on our undergraduate courses at MSU.

To my MSU community – I am profoundly grateful to the faculty I had the honor of TAing for, collaborating with on projects, and learning from (Dr. Adrea Truckenmiller, Dr. Eunsoo Cho, Dr. Spyros Konstantopoulos); to my incredible cohorts from special education and other programs (Dr. Cherish Sarmiento, Dr. Tingting Li, Eunha Kim) for their camaraderie; to my NGCI colleagues; to my statistical friends (Dr. Bixi Zhang, Shimeng Dai) for their expertise; and to all my friends, both near and far. Living in Lansing while pursuing my doctorate has been a joy, thanks to this wonderful community that surrounded me.

Above all, to my family – my eternal thanks to my father, Jingyang Wang, and my mother, Xinchun Wang, for being the wind beneath my wings. You are the unwavering support that has enabled me to soar. You generously invested in my education from my childhood through my master's degree across the ocean. You've given everything you had, even the things you never had the chance to experience yourself, to ensure I could become a better version of myself. To my grandparents, Wei Wang and Huixia Wang, thank you for surrounding my childhood with a warmth and joy whose glow still guides me today. The happiness I experienced in my hometown during my early years helped me become the optimistic person I am now, giving me the strength to face and overcome the challenges I encountered along the way. To my boyfriend, Xiaohu Lu, thank you for standing by my side throughout this doctoral journey, both physically and emotionally. Your intelligence, perseverance, and dedication to research have been a constant source of inspiration. I am especially grateful for your patience and for offering comfort during moments of rejection, frustration, and uncertainty. Your support has meant the world to me, and I am confident that a brighter future awaits us.

TABLE OF CONTENTS

LIST OF TABLES ....................................................................................................................... viii
LIST OF FIGURES ....................................................................................................................... ix
Chapter 1: Introduction ................................................................................................................... 1
Chapter 2: Review Of Literature .................................................................................................... 6
Chapter 3: Methodology ............................................................................................................... 65
Chapter 4: Findings and Discussion ............................................................................................. 97
Chapter 5: Conclusion ................................................................................................................ 147
BIBLIOGRAPHY ....................................................................................................................... 155

LIST OF TABLES

Table 2-1 Grades 3-8 ELA Michigan Testing Performance Level in 2022-2023 ......................... 19
Table 2-2 Average Scale Scores (Percentages) at Each Achievement Level for NAEP Writing Report in 2011 ............................................................................................................................... 20
Table 3-1 Demographics and Descriptive Statistics for the Full Sample ..................................... 70
Table 3-2 Labels and Descriptions of All Corpus Variables ......................................................... 75
Table 3-3 Labels and Descriptions of the Study Variables ........................................................... 84
Table 4-1 Means and (Standard Deviations) for Writing-Related Variables ................................ 98
Table 4-2 Correlations Among All Study Writing Related Variables in the Corpus .................. 103
Table 4-3 Factor Structure Coefficients for Micro- and Macro-structural Writing Features ...... 118
Table 4-4 Comparison of CFA model fit indices ........................................................................ 121
Table 4-5 Loading Estimate, Standard Error, Z-Value, and P-Value for the Higher-Order CFA Model .......................................................................................................................................... 123
Table 4-6 Fit indices for the models testing measurement invariance ....................................... 128
Table 4-7 Invariant and Non-Invariant Factor Loadings, Item Intercepts, and Error Variances in Two SPED Groups ...................................................................................................................... 131
Table 4-8 Results of Two BERT Models on Scoring Prediction Task ........................................ 133
Table 4-9 Prompt For GPT ......................................................................................................... 137

LIST OF FIGURES

Figure 2-1 Rhetorical Triangle (Ramage et al., 2016, p. 55) ........................................................ 33
Figure 2-2 Toulmin's Model of Argumentation with Examples ................................................... 36
Figure 3-1 Implementation Workflow and Evaluation Methodology .......................................... 90
Figure 3-2 Diagrams for the One-factor, Two-factor, and Higher-order Models Evaluated ........ 93
Figure 4-1 Distribution of Study Sample Across Holistic Score Levels of the PERSUADE 2.0 Corpus ........................................................................................................................................... 97
Figure 4-2 Correlation Heatmap of Writing-Related Variables, Demographic Information, and Essay Scores ................................................................................................................................ 101
Figure 4-3 Scree Plot of Parallel Analysis for Microstructural Writing Features ...................... 109
Figure 4-4 Exploratory Factor Analysis Plot .............................................................................. 110
Figure 4-5 Diagram of the Higher-Order CFA Model ................................................................ 125
Figure 4-6 Predicted and True Values in BERT-Generic Model ................................................ 134
Figure 4-7 Predicted and True Values in BERT Model Enhanced with Writing Features .......... 135
Figure 4-8 Settings for GPT API Feedback Generation ............................................................. 139
Figure 4-9 Example 1: A 6th-grade student who is not identified as having a disability received a score of 3 for the essay (revised essay score = 4.55) ................................................................... 140
Figure 4-10 Example 2: An 11th-grade student who is not identified as having a disability received a score of 6 for the essay (revised essay score = 5.79) ................................................. 142
Figure 4-11 Example 3: A 10th-grade student who is identified as having a disability received a score of 2 for the essay (revised essay score = 4.52) .................................................................. 144
Figure 4-12 Boxplot of GPT Revised Essay Scores by Original Essay Score Levels ................ 146

Chapter 1: Introduction

Effective writing is a foundational and important skill that individuals regularly employ across diverse educational and professional contexts (Attard, 2012; Coker Jr. et al., 2018; Fitzgerald & Shanahan, 2000; Graham & Alves, 2021; Kent & Wanzek, 2016; Troia, 2014). According to established writing models such as the Cognitive Process Theory of Writing (Flower & Hayes, 1981), the Simple View of Writing (Berninger et al., 2002), the Not-So-Simple View of Writing (Berninger & Winn, 2006), and the Direct and Indirect Effects Model of Writing (Y.-S. G. Kim & Graham, 2022), writing proficiency involves a coordinated integration of basic component skills (e.g., grammar, spelling, sentence structure), cognitive thinking processes (e.g., transcription, ideation, interpretation), executive functions (e.g., attention, goal setting, self-regulating), various knowledge domains (e.g., text structure, content, genre), and motivational attributes (e.g., self-efficacy, goal orientation, task interest and value) that writers engage during writing. A competent writer typically begins by acquiring essential skills such as solid transcription and ideation (often around grades 3 or 4), and then advances to more strategic writing by developing metacognitive abilities and gaining increased knowledge, often through instructional practices used at the secondary level.

Empirically measuring writing skills presents significant challenges due to the complex nature of the cognitive processes (e.g., information processing, problem solving) and psychological factors (e.g., affective stance, emotional regulation) involved. These challenges are particularly pronounced when analyzing open-ended responses and constructed essays, where biases may emerge from multiple sources of measurement error, including raters' judgments, the backgrounds of writers and raters, individual characteristics of writers, the writing prompts employed, and the writing register or genre (Wang & Troia, 2023b). Additionally, districts and schools are increasingly expected to utilize assessment data to monitor students' progress and interpret their responses to core and tiered instruction (Bondie et al., 2019). Researchers also call for providing personalized instructional practices based on the identification of students' strengths and weaknesses through assessment (Butterfuss et al., 2022; Philippakos & FitzPatrick, 2018; Troia et al., 2022).
Analyzing students' writing performance typically involves two major approaches to capture the multifaceted features of writing. The first approach involves examining students' writing-related processes and abilities, such as handwriting or typing fluency, spelling, decoding and word reading, working memory, background and genre knowledge, and motivation. These measures are usually collected through standardized and researcher-designed tasks, tests, and surveys, which help observe and understand students' performance on these features and their application in varied writing contexts. These foundational processes and abilities are critical as they may influence both the quality and quantity of writing across various tasks. The second approach focuses on analyzing students' prompted writing products by assigning different writing tasks, genres, purposes, or scenarios. Researchers then use holistic or analytic scoring schemes to quantitatively assess the resultant drafts. Raters and researchers typically examine either microstructural or macrostructural elements, or both, of an essay draft, including features such as linguistic and rhetorical components, content, structure, tone, and style. This approach provides insight into students' written performance, emphasizing a snapshot of the final product rather than the processes activated during the writing phase.

Today, researchers are exploring the automation of diagnostic processes within formative writing assessments using learning analytics tools and techniques. This involves progress monitoring of writing abilities (e.g., typing fluency: Truckenmiller et al., 2019; keystroke logging: Leijten & Van Waes, 2013; knowledge to marshal text evidence: Correnti et al., 2020), quantifying text-based features (e.g., Crossley et al., 2014; McCaffrey et al., 2022), generating macrostructural features (e.g., Edwards, 2003), identifying topical patterns (e.g., Kuzi et al., 2019), and observing emotional and behavioral engagement in writing activities (e.g., Liu et al., 2018; Roscoe et al., 2017), with the aim of enhancing evaluative efficiency in writing assessments. The predictive aspect of writing scores is not the endpoint of the automated process; rather, the understanding gained from the prediction holds empirical significance (Wang & Troia, 2023b). However, there is a lack of research on how the prediction process can be further developed for other educational purposes, such as personalizing students' learning and providing feedback based on their current writing performance. Generative AI has the potential to facilitate this process.

1.1 PRESENT STUDY

Many students struggle with persuasive writing. It is essential to conduct a meaningful and informative assessment of persuasive writing performance to understand students' argumentative capacity. While research has explored various factors influencing the quantity and quality of persuasive writing – such as linguistic/rhetorical measures (Jo, 2022), argumentative structure and substance (P. Stapleton & Wu, 2015), and cohesion and coherence (Andreev & Uccelli, 2024) – there remains a significant gap in understanding the internal structural relationships among these factors, which is crucial for a comprehensive evaluation of argumentative capacity. This study aims to address this gap by introducing a proof of concept for assessing persuasive writing through product-oriented measures, focusing on microstructural and macrostructural levels of performance.
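To make the product-oriented approach concrete, the short sketch below computes a few illustrative microstructural indices (length, sentence complexity, lexical diversity) from a plain-text essay. It is a minimal illustration only: the feature names and formulas here are generic stand-ins chosen for clarity, not the actual measures derived from the PERSUADE 2.0 corpus in this study.

```python
import re
from statistics import mean

def microstructural_features(essay: str) -> dict:
    """Compute a few illustrative microstructural indices for one essay."""
    # Naive sentence split on terminal punctuation; real pipelines use a parser.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", essay.strip()) if s]
    words = re.findall(r"[A-Za-z']+", essay.lower())
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        # Mean words per sentence, a rough proxy for syntactic complexity.
        "avg_sentence_length": mean(
            len(re.findall(r"[A-Za-z']+", s)) for s in sentences
        ) if sentences else 0.0,
        # Type-token ratio, a rough proxy for lexical diversity.
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
        "avg_word_length": mean(len(w) for w in words) if words else 0.0,
    }

print(microstructural_features(
    "School uniforms should be optional. Students express identity through "
    "clothing, and choice fosters responsibility."
))
```

Indices of this kind can serve as observed indicators in factor-analytic models such as those examined later; macrostructural features (e.g., claims, evidence, organization) require human annotation or more sophisticated parsing than simple string processing.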
Moreover, AI's ongoing prominence in our zeitgeist emphasizes the importance of maximizing its utility in education. Derived factorial models have the potential to guide Generative AI in providing written feedback and enhancing students' performance, as discussed by Steiss et al. (2024) and Meyer et al. (2024). Specifically, this study seeks to answer three research questions:

1) What textual attributes serve as optimal indicators of persuasive essay quality in secondary school students?

2) To what extent do secondary students with different special needs status (i.e., students with versus without an Individualized Education Plan [IEP]) exhibit significant differences in their holistic writing scores across latent writing attributes?

3) Do essays revised by GPT, a Generative AI application, utilizing prompts derived from factor analysis, demonstrate enhanced performance compared to the original essays written by students?

1.2 ORGANIZATION OF THE DISSERTATION

Following this introductory chapter, Chapter Two presents a comprehensive overview of the literature pertaining to writing performance, with a specific focus on persuasive writing at the secondary education level. The chapter establishes the theoretical frameworks and perspectives that underpin the study's measurement and analysis. Through synthesizing empirical literature, the chapter identifies specific linguistic and structural features that characterize persuasive writing. Furthermore, it justifies the study's significance by addressing gaps and problems identified in the extant literature. Lastly, the chapter provides a rationale for the study by highlighting its educational significance and potential contributions to the field of writing education.

Chapter Three provides a detailed examination of the study's methodology. This study primarily employs a secondary data analysis approach, applied to an extant dataset (the PERSUADE 2.0 corpus) and grounded in quantitative research methods. The chapter begins by outlining the study purposes and restating the three research questions noted above. It then describes the dataset and its variables, including derived variables at both microstructural and macrostructural levels, which are employed to analyze persuasive writing samples from secondary students in the corpus. Furthermore, the chapter elaborates on the research design employed to address the three research questions.

Chapter Four organizes and reports the study's major findings. This includes descriptive results from the exploratory data analysis and the complete results addressing each of the research questions proposed. The chapter also synthesizes and discusses the results in the context of each research question and its theoretical background.

Chapter Five presents practical implications, offers concluding statements, and outlines future directions. The implications are intended to guide scholars and educators in applying the study's findings to real-world settings. They offer strategies for integrating human intelligence with automated methods to design more effective writing prompts and essay evaluation tools that facilitate feedback and support accurate quality assessment. The study's limitations are also noted, and recommendations for future research based on the overall results of the dissertation are included.
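To preview how the third research question is operationalized, the sketch below shows one way a feedback prompt informed by the microstructural/macrostructural distinction could be sent to a GPT model through the OpenAI chat API. This is a hedged illustration only: the model name, temperature, and prompt wording are placeholders, not the settings actually used in this study (the study's prompt and API settings are reported in Chapter 4, Table 4-9 and Figure 4-8).

```python
from openai import OpenAI  # pip install openai; assumes OPENAI_API_KEY is set

client = OpenAI()

# Illustrative system prompt: the feedback dimensions mirror the two latent
# levels examined in this study (microstructure and macrostructure).
SYSTEM_PROMPT = (
    "You are a writing tutor. Give a secondary student feedback on a "
    "persuasive essay at two levels: (1) microstructure - grammar, "
    "vocabulary, sentence variety; (2) macrostructure - claim, evidence, "
    "counterclaims, organization. Then suggest concrete revisions."
)

def gpt_feedback(essay: str) -> str:
    """Request two-level feedback for one student essay (sketch only)."""
    response = client.chat.completions.create(
        model="gpt-4o",      # placeholder; any chat-capable GPT model
        temperature=0.3,     # low temperature for more consistent feedback
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": essay},
        ],
    )
    return response.choices[0].message.content

print(gpt_feedback("Schools should start later because teens need sleep..."))
```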
Chapter 2: Review Of Literature

As stated in the preceding chapter, the primary objective of this study is to undertake exploratory and confirmatory factor analyses on the Microstructural and Macrostructural Features that underpin Persuasive written composition (MMFP) and their relations to the overall quality of persuasive writing. To achieve this aim, the study utilized a large-scale corpus comprising persuasive essays written by middle and high school students with diverse sociodemographic traits. In addition, as a secondary objective, the study sought to validate the practical application of insights derived from the MMFP by using them to provide constructive written feedback to students using Generative AI (GenAI) systems, thereby potentially enhancing their persuasive writing skills and informing future instructional decisions related to revision and editing.

The initial sections of this chapter offer background information that emphasizes the importance of writing in academic and professional contexts and illustrates students' current writing performance broadly across all grade levels, but specifically at the secondary level, with a focus on persuasive writing. Subsequently, the chapter discusses various evaluation concerns related to persuasive writing, informed by prior research that includes supporting theories, empirical findings, and measurement considerations. Lastly, the chapter concludes by discussing research gaps, the study's approach to addressing these identified gaps, and the educational significance of the methodology employed in this investigation.

2.1 THE ROLE OF WRITING

2.1.1 General Significance of Writing

In contemporary society, the written word functions as a versatile tool for attaining varied social and educational objectives (Graham, 2006; Klimova, 2013). For example, correspondence mediums such as letters, postcards, and emails can facilitate communication and foster interpersonal bonds when physical distance separates individuals. The practice of maintaining a personal diary and engaging in self-reflection is recognized for its capacity to promote contemplation on self-identity, develop short- and long-term goals, and alleviate feelings of loneliness, thereby yielding psychological and physiological benefits (Smyth, 1998). In professional spheres, composing well-structured reports and summaries may contribute to effective information dissemination and documentation of project outcomes within teams and to organizational leadership. Additionally, writing is acknowledged for its capability to influence others' perceptions, emotions, and beliefs (Graham et al., 2012) and serves broader social goals such as shaping public opinion (Aldisert, 2009), fostering empathy (Dhurandhar, 2009), and advocating for social justice (Singh, 2011).

Within the scope of this study, focusing on the educational domain necessitates an exploration of the pivotal role that writing assumes in academic success. In the context of K-12 English Language Arts (ELA) education, students undergo a developmental trajectory marked by the acquisition of increasingly proficient and sophisticated writing skills, coupled with cognitive and metacognitive strategies, as they progress through the grades (Graham & Harris, 2010; G. A. Troia et al., 2013; Wang & Troia, 2023a). At the early stages of learning to write, narrative writing holds a significant place in the early elementary school curriculum, largely due to its resonance with oral language expression (Spencer & Petersen, 2018).
Students engage in crafting imaginative worlds, constructing plots, and developing characters. These practices have been empirically demonstrated to enhance students' capacities for creative thinking (Eser & Ayaz, 2021) and storytelling (Rambe, 2017). Narrative writing also serves as a nexus for reinforcing basic component skills, including transcription, vocabulary and grammar usage, and sentence construction (Olinghouse, 2008; Salas & Silvente, 2020; Wiliana & Djajanegara, 2019). Proficiency in these foundational areas holds considerable implications for overall writing quality and establishes a robust foundation for the subsequent acquisition of skills in more complex writing genres that require advanced writing skills (Deane et al., 2008; Puranik & Lonigan, 2014).

From grades 3 and 4 onwards, students are expected to read and write informational text across diverse content areas, and this expectation intensifies as they advance through school (Jeong et al., 2010). The significance of the informational genre is underscored by its prevalence in many state and national writing assessments. Research indicates that the proportion of informational text featured in standardized tests can be as substantial as 70% to 80% (see Palumbo & Sanacore, 2009). Students with limited exposure to informational text are prone to achieving lower scores on these standardized assessments (Heider, 2009). Prior studies reveal a positive correlation between students' proficiency in writing informational text (including those who have received targeted informational writing interventions) and enhanced performance in multiple learning dimensions (Graham & Perin, 2007; Graham et al., 2012). This positive correlation may be attributed to the fact that informational writing can lead to a better understanding of content area concepts (Parson, 2013), increased knowledge about key topics (Taboada & Guthrie, 2006), and improved information processing and abstraction (Fox, 2009).

Persuasive writing is another essential writing genre systematically incorporated into K-12 education. The National Assessment of Educational Progress (NAEP) writing report card (National Center for Education Statistics [NCES], 2012) underscores that students often encounter heightened challenges with persuasive writing, more so than with narrative, descriptive, or expository registers. Within this genre of writing, students are tasked with explaining complex and interdisciplinary concepts by incorporating specific, relevant details, substantiating their claims, and effectively convincing their intended audience to align with their stance on a topic. By the end of twelfth grade, students are anticipated not only to adhere to the foundational aspects of the content standards in the Common Core State Standards (CCSS; National Governors Association Center for Best Practices [NGACBP] & Council of Chief State School Officers [CCSSO], 2010) but also to extend their persuasive composition capabilities, including the ability to differentiate claims from alternate or opposing viewpoints and to clarify the relationships between claims and reasons, reasons and evidence, and claims and counterclaims. This increased complexity renders persuasive essays more demanding, as outlined in the CCSS in ELA for persuasive writing. Mastery in persuasive writing guides students towards nuanced argumentation (Brockman, 2020), evidence-based reasoning (Hemberger et al., 2017), and effective communication of ideas (F. I. A. Aziz & Ahmad, 2017), thereby fostering a comprehensive, higher-order skill set in written expression.
The act of writing is often considered synonymous with the act of thinking. It is a cognitive process that necessitates the application of analytical thinking skills to creatively and critically organize ideas (Flower & Hayes, 1981; Grimberg & Hand, 2009; Menary, 2007). Previous research studies have illuminated various educational and psychological benefits associated with the act of writing. In the educational realm, as noted earlier, practicing writing across different genres has been identified as a means to enhance students' learning in multifaceted ways. The efficacy of writing to learn is increased when students are guided to employ specific cognitive and metacognitive strategies of self-regulated learning (Fry & Villagomez, 2012; Hübner et al., 2010). Furthermore, aligning writing tasks with students' preferred writing approaches can enhance the effectiveness of this process (Kieft et al., 2008). The practice of writing to learn supports students' active learning, retention, and writing development (Fry & Villagomez, 2012). For instance, research indicates that college students, when engaged in process-oriented writing activities, exhibit a higher level of comprehension of scientific concepts and greater knowledge acquisition compared to controls using multiple-choice assessments to evaluate their understanding of the source materials (Royse et al., 2024).

Moreover, the act of writing proves particularly beneficial in bolstering students' learning outcomes within the science, technology, engineering, and math (STEM) disciplines, as it serves as a vehicle for thought, reasoning, and knowledge-in-use (Boscolo & Mason, 2001; McNeill & Krajcik, 2009). Klein (2006) observed that non-STEM majors at the postsecondary level showed greater posttest transfer of scientific concepts when they processed and conveyed new information through writing rather than through verbal expression. However, research comparing students' oral and written expression of scientific concepts remains limited, and the reasons for the differences between these modalities remain underexplored. Writing often involves the use of more precise and sophisticated vocabulary than oral expression, which may explain why writing-to-learn activities are frequently considered more effective for assessing students' conceptual understanding in disciplinary education (Chen et al., 2023; Royse et al., 2024; Visser et al., 2018). This potential makes the textual analysis of vocabulary in written form particularly pertinent to this study, as vocabulary plays a critical role in carrying meaning at the secondary school level. This highlights the importance of written expression as a valuable tool for assessing student knowledge.

From a psychological perspective, writing can promote self-concept and self-efficacy. For example, in descriptive writing, where students structure their experiences and derive meaning from significant events, the writing process enables the integration of their personal experiences into their self-schema, contributing to the development of more positive self-perceptions (Graybeal et al., 2002). In turn, students with enhanced academic self-concept following writing tend to achieve greater academic success in subsequent academic years (Muijs, 2011; Pajares et al., 1999).
This correlation is attributed in part to a clearer self-awareness regarding writing strengths and weaknesses, which empowers writers to identify areas for improvement. Writing can also enhance self-regulation and bolster individuals' control over challenging thoughts and emotions. It allows individuals to actively observe, monitor, and assess their emotional expression and regulation (Schmitz & Perels, 2011). The resulting sense of control over emotions directly contributes to improved well-being and a reduction in negative emotions (C. M. Stapleton et al., 2021).

2.1.2 Academic Significance of Secondary Writing

Despite the significant role of writing in the educational, psychological, and social domains, a substantial number of students in the United States graduate from secondary schools without attaining proficiency in writing (Graham et al., 2014). Only 27% of grade 12 students demonstrated performance at or above the "proficient" level in writing according to NAEP (NCES, 2012). This indicates a widespread deficiency across the nation in constructing written responses that effectively achieve the communicative objectives of writing, with "proficient" writing characterized by well-organized and coherent text coupled with appropriate transitions and diverse sentence constructions (S. A. Crossley & McNamara, 2016). In addition, half of twelfth grade learners grapple with rudimentary aspects of writing, such as employing detailed and factual descriptions, making appropriate lexical choices, and utilizing varied sentence structures (e.g., Wang & Troia, 2023a). To understand these concerns, it is imperative to elucidate the educational significance of writing at the secondary level.

2.1.2.1 Academic Writing as a Discipline-Specific Skill

Guided by the belief that the responsibility for instructing secondary academic writing should be a collaborative effort involving both ELA teachers and educators from other academic fields (Russell, 1997), the Writing Across the Curriculum Clearinghouse serves as a publishing collaborative that has spurred a reevaluation of secondary student academic writing within educational communities. This reconceptualization includes acknowledging students' literacy and language experiences beyond the classroom, incorporating the principles of Writing in the Disciplines (WID; Blumner & Childers, 2016). WID is dedicated to developing socially mediated communication skills and genre knowledge specific to individual academic disciplines (Broadhead, 1999). In fact, there is a noticeable absence of writing or writing instruction in the typical ELA middle school classroom (Applebee & Langer, 2015; Graham & Perin, 2007), but there has been an intensified focus on writing instruction in other content areas such as science-related subjects (Miller et al., 2016). Aligned with CCSS objectives, writing can be considered an instrument to facilitate learning across varied content areas, which signifies a significant shift in policy and practice within secondary schools in the U.S. The diversity in disciplinary practices mandates distinct modes of written communication at the secondary level, such as technical writing in the sciences, persuasive writing in business, and argumentative writing in literary analysis (Ezza et al., 2020). To conclude, the secondary school environment is characteristically "discipline-driven" and "discipline-delineated" (Miller et al., 2016).
High school students are now required to use writing as a tool for analyzing and reflecting on information in language arts, social studies, science, and various technical subjects. A substantial body of literature has revealed the instructional practices of writing across various disciplinary areas in middle and high school (e.g., Anders & Guzzetti, 2020; Graham et al., 2020; Shemwell, 2020). Drew and colleagues' (2017) study provides valuable insights into the instructional strategies employed in disciplinary classes, specifically focusing on grades 6 to 12 science classes. Through qualitative analysis of teacher surveys, the study reveals that science teachers recognize the alignment of writing with the broader objectives of science education. They intentionally choose to integrate writing into their science classes, teaching students how to create scientific texts and to acquire and utilize scientific vocabulary, which ultimately supports knowledge building and application. The study further emphasizes that teachers who assign writing tasks that allow students to analyze and synthesize information while employing discipline-specific genres are more likely to promote deep learning in their students. This approach not only enhances students' understanding but also empowers them to contribute to effective scientific communication, which is also consistent with the overarching goal of science education as outlined by the National Research Council (NRC, 2012). Lastly, the study identifies evidence-based practices for teaching writing during secondary science classes by demonstrating the efficacy of incorporating writing strategy instruction within inquiry-based pedagogy to support learners of all proficiency levels.

Graham and colleagues (2013) offered a comprehensive national overview of middle-school writing instruction, drawing from participating teachers' self-reports regarding their preparedness to teach writing, beliefs about teaching responsibilities, utilization of evidence-based writing instruction practices, assessment methods, incorporation of technology, and adaptations for struggling writers. A significant finding from this national survey indicates that middle school teachers perceive the teaching of writing as both a personal and shared responsibility, despite many educators across disciplines lacking sufficient preparation to teach writing and facing substantial constraints on instructional time dedicated to writing. Within the limited instructional time available, teachers focus on employing writing activities to enhance students' learning and instruct them on techniques for summarizing material they read. On a monthly basis, teachers encourage students to emulate models of proficient writing, provide instruction on fundamental writing skills, evaluate students' writing using rubrics and other assessment tools, and teach strategies for planning, revising, and crafting paragraphs. This study offers insights into the deficiencies of writing instruction in middle school, highlighting issues such as inadequate time allocated for teaching writing and limited opportunities for students to engage in writing activities. The study suggests that addressing these gaps requires collaborative efforts with educators from various disciplines.

Academic writing at the secondary level serves as a valuable foundation for content learning and is intrinsically linked to subsequent postsecondary pursuits.
The overarching objective of disciplinary literacy is to equip students with the necessary skills for the sophisticated literacy demands of college and careers (G. A. Troia & Maddox, 2004). This preparation is accomplished through focused instruction on discipline-specific literacy strategies within core content-rich areas, such as mathematics, history/social studies, and science, at both the secondary and postsecondary levels (Fang & Schleppegrell, 2010; Shanahan & Shanahan, 2008). Teachers in various disciplines who adopt WID methodologies, whether intentionally or implicitly, impart to their secondary students the expectations and nuances of writing in a collegiate context, which can better facilitate students' smooth transition to postsecondary writing expectations and tasks. The strategies, knowledge, and skills derived from secondary academic writing and writing instruction can be effectively transferred to ensure success in postsecondary writing endeavors (WWC, 2016).

2.1.2.2 Writing Using Cognitive and Metacognitive Abilities

In alignment with the CCSS for ELA standards, secondary school writing necessitates that students "write like specialists" (Dressen-Hammouda, 2008). In addition to being discipline-driven, secondary school writing also requires students to master more sophisticated and higher-order writing skills, strategies, and conventions in comparison to elementary school writing, which often concentrates on basic elements such as spelling, handwriting, grammar, essential text elements, and basic genres. The complexity of writing increases for secondary school-aged students, demanding not only cognitive resources for organizing, storing, and activating knowledge and skills in the composition process (Shen & Troia, 2018; Weinstein & Hume, 1998) but also adequate metacognitive abilities and strategies (Graham & Harris, 2010; Yamson & Borong, 2022). These metacognitive skills enable students to monitor their writing-related thoughts, emotions, and behaviors, maintain a positive attitude toward writing, and optimally utilize cognitive processes to achieve learning objectives (G. Troia, 2014).

In secondary classrooms, writing tasks often involve limited analysis, interpretation, and composition (Applebee & Langer, 2015). For example, students frequently encounter writing assignments such as worksheets or brief composing tasks that require reflection on source texts, followed by creating mental representations of the texts and producing responses or summaries based on these cognitive processes (Cer, 2019). More advanced-level writing in secondary education demands that students engage in complex cognitive activities such as estimating the needs of one's audience, setting long- and short-term rhetorical and personal goals, self-monitoring writing processes and performance over time, and self-evaluating outcomes in comparison with established goals (Graham & Harris, 2010). These cognitive and metacognitive activities not only assist secondary education students in crafting high-quality texts but also help writers supervise and correct written errors, enhance the overall learning process, and contribute to the development and regulation of learners' awareness at rhetorical and cognitive levels for writing (Ramadhanti & Yanda, 2021).

In the meta-analysis conducted by Dignath and Büttner (2008), it was found that metacognitive knowledge and strategies, such as self-regulated learning, can be effectively nurtured at both the primary and secondary school levels.
The overall effect size (ES) of interventions/programs on students' writing performance at the secondary school level (ES = 0.71) was observed to be slightly higher than that at the primary school level (ES = 0.68). This may be attributed to the fact that children entering primary school typically exhibit limited reflection on and control over their learning compared to their counterparts entering secondary school (Paris & Newman, 1990). Research on metacognitive development indicates that younger or inexperienced student writers often encounter challenges in utilizing metacognitive strategies. This is because they may not have sufficient cognitive capacity available to employ additional strategies alongside the demanding tasks of reading or writing (Alexander et al., 1998). In contrast, older or mature students who have automated the processes of reading and writing have more cognitive capacity available for metacognitive activity. Consequently, they can derive greater benefits from strategy training in this context.

Explicit engagement in cognitive and metacognitive activity through writing instruction at the secondary school level yields valuable benefits. It enables students to contemplate their individual learning characteristics and aids them in mastering content (Conley, 2014). Through the entire process, students can assess their knowledge and skills in specific areas, develop a repertoire of strategies to acquire knowledge, and discern appropriate actions for various academic tasks (Bürgler et al., 2021; Hartman, 2001). These components, encompassing self-knowledge, reflective thinking, planning and organizing, employing effective strategies, and evaluating written products, all of which involve metacognition, ultimately contribute to students' success in their postsecondary pursuits (Bauer, 2014; Mytkowicz et al., 2014).

2.2 AN OVERVIEW OF STUDENT WRITING

2.2.1 Current State of Writing Performance

With the widespread adoption of the CCSS across the nation, most states have developed or embraced new writing assessments for elementary through high school students (Kelly-Riley, 2017). The rapid expansion and evolution of educational testing in the United States over the past several decades have been significantly influenced by university initiatives aimed at shaping secondary-level curriculum to better prepare students for postsecondary-level work (Ramirez et al., 2018). Students demonstrate significant variability in their written composition performance, with a substantial majority (on average, 73%) performing below proficiency standards across the elementary, middle, and high school levels (Truckenmiller et al., 2021). Building on the earlier discussion regarding the role of writing as an indispensable component of the K-12 curriculum, it becomes crucial to assess the current state of students' writing performance through state and national standardized assessments.

2.2.1.1 State Testing

Supported by the federal Race to the Top initiative (U.S. Department of Education, 2009), two state-led consortia, namely the Smarter Balanced Assessment Consortium (SBAC) and the Partnership for Assessment of Readiness for College and Careers (PARCC), launched their novel assessments in the spring of 2015, impacting roughly half of the U.S. Even in states not aligned with these consortia, new writing assessments were formulated to adhere to the CCSS and vie for funding from the Race to the Top program. More than forty states include a writing component in their ELA assessments (Jeffery, 2009).
In twenty of these states, students must pass the state test to graduate from high school (Kober et al., 2011). Nevertheless, numerous high school students encounter challenges in writing, particularly those with disabilities.

In Michigan, the current statewide assessments primarily include the M-STEP (Michigan Student Test of Educational Progress), a summative assessment gauging students' knowledge and capabilities based on Michigan's CCSS-aligned academic standards. Additionally, the PSAT (Preliminary Scholastic Assessment Test) assesses eighth grade reading, writing, and math. MI-Access is another test, aligned with Michigan's alternate content expectations, designed for students with significant cognitive disabilities for whom the M-STEP, even with accommodations, is deemed inappropriate. Michigan's state assessment program aims to furnish districts and schools with information about students' proficiency based on the academic standards, aiding in the formulation of continuous improvement goals. These state assessments also offer teachers, parents, and other stakeholders insights into individual students' knowledge and performance in key content areas, ensuring compliance with the Every Student Succeeds Act and the Individuals with Disabilities Education Act.

Table 2-1 presents the percentage of students across grades 3 to 8 falling into the performance level categories of advanced, proficient, partially proficient, and not proficient for the M-STEP and PSAT during the 2022-2023 school year. The M-STEP results reveal that only 20.3% to 27.1% of students in Michigan across grades 3 to 7 achieved the proficient level in ELA. Notably, 63.1% of seventh graders are partially or not proficient according to their ELA scores. For grades 3 to 5, over 40% of students achieved the advanced or proficient levels, but this figure decreases starting from grade 6, where less than 40% of students achieved the advanced or proficient levels. The PSAT results also indicate that 40.3% of eighth-grade students did not achieve the advanced or proficient levels according to their ELA scores.

Table 2-1 Grades 3-8 ELA Michigan Testing Performance Level in 2022-2023

Grade       Assessment  Advanced  Proficient  Partially Proficient  Not Proficient  Advanced/Proficient  Partially/Not Proficient  Number Assessed
3rd Grade   M-STEP      20.6%     20.3%       24.5%                 34.6%           40.9%                59.1%                     98,715
4th Grade   M-STEP      22.7%     21.6%       20.0%                 35.6%           44.3%                55.7%                     97,894
5th Grade   M-STEP      17.4%     26.5%       21.3%                 34.8%           43.9%                56.1%                     98,403
6th Grade   M-STEP      11.6%     25.9%       26.7%                 35.8%           37.5%                62.5%                     99,114
7th Grade   M-STEP      9.8%      27.1%       28.1%                 35.0%           36.9%                63.1%                     98,211
8th Grade   PSAT        37.7%     22.0%       15.1%                 25.1%           59.7%                40.3%                     98,932

2.2.1.2 National Testing

The NAEP stands as the largest nationally representative assessment of various academic subject areas in the United States (Mo & Troia, 2017a). Enhancing the generalizability of NAEP results and providing actionable policy implications would significantly augment the value of NAEP findings (Williamson, 2006). Specifically, NAEP gauges the writing proficiency of U.S. students by administering assessments to sample groups representative of the nation's student population. The most recent available data on students' writing performance are from the 2011 school year, sourced from the U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, Writing Assessment.
The 2011 writing assessment was the first NAEP computer-based writing assessment, developed under a new NAEP writing framework (National Assessment Governing Board, 2011) in which the substantial role of computers in the writing process was acknowledged. Students were randomly assigned two writing tasks, each requiring 30 minutes for completion. For eighth-grade students, 35% of prompts focused on writing persuasive essays, 35% on explanatory essays, and 30% on narrative essays. Grade 12 students were assigned 40% of prompts for persuasive essays, 40% for explanatory essays, and 20% for narrative essays. Furthermore, students were expected to engage in various tasks for diverse audiences and participate in processes of generating, revising, and editing.

The results of the 2011 NAEP writing assessment for eighth and twelfth graders are detailed in Table 2-2. For eighth graders, 20% scored below basic, 54% were at or above basic and below proficient, 24% were at or above proficient and below advanced, and 3% achieved advanced. Among twelfth graders, 21% scored below basic, 52% were at or above basic and below proficient, 24% were at or above proficient and below advanced, and 3% achieved advanced. These findings indicate a significant lack of proficiency in writing among U.S. secondary school students, reflecting a decline when compared to the 2007 NAEP assessment results (Salahu-Din et al., 2008). This suggests that most secondary school-aged students are not adequately equipped with the necessary writing skills and knowledge for postsecondary education before high school graduation (Williamson, 2006).

Table 2-2 Average Scale Scores (Percentages) at Each Achievement Level for NAEP Writing Report in 2011

Grade Level  Average Scale Score  Below Basic  At Basic  At Proficient  At Advanced
8th Grade    150                  20%          54%       24%            3%
12th Grade   150                  21%          52%       24%            3%

2.2.2 Secondary Writing Difficulties

The state and national assessment outcomes presented in the previous section make it evident that a substantial portion of U.S. students do not attain proficiency in writing before graduating from high school. Many of these students, struggling to become more adept users of the discourses required for college-level classes, harbor a belief that they lack the skills or talents needed to write well (National Commission on Writing, 2004). This negative perception often leads to a fear of failure in academic writing tasks, resulting in resistance to assignments – manifested through late submissions, incomplete work, resorting to plagiarism, or even failure or withdrawal from the class (Fernsten & Reda, 2011). It is crucial to comprehend the prevalent writing challenges faced by students, particularly those at the secondary school level, and, importantly, to understand the root causes of these writing difficulties.

Firstly, student writers, particularly those transitioning to junior high school, may still lack maturity in basic writing skills (Graham et al., 2023). The rapid changes in writing demands at the secondary level may reveal gaps in these fundamental skills, impeding students' ability to effectively express their ideas. These gaps may include weak essential skills, including transcription skills such as handwriting, typing, spelling, and punctuation, along with challenges in sentence construction arising from limited grammar knowledge, vocabulary, and exposure to diverse sentence structures (G. A. Troia et al., 2011).
These factors significantly impact the readability of written text and overall writing performance on composition tasks (Graham et al., 1991). The transition from elementary to secondary school marks a pivotal phase in which students are expected to refine their writing capabilities, and the adaptation to these elevated expectations can lead to initial struggles.

Secondly, aside from the basic component skills, students entering secondary school may frequently encounter challenges related to the higher-order cognitive abilities that are essential for tackling the progressively complicated writing tasks demanded at this educational stage (Demetriou et al., 2020). According to Hayes' (1996) model of writing, the individual writer's cognitive processes are intricately connected with elements of the task environment, encompassing both social factors (e.g., the audience, collaborators, cultural norms) and physical elements (the composing medium, text already produced, source materials). However, novice writers frequently overlook the importance of engaging in these cognitive writing processes (Salas & Silvente, 2020). Instead, they may rely on a retrieve-and-write process, wherein they compose solely by "generating or drawing from memory a relevant idea, write it down, and use each preceding phrase or sentence to stimulate the next idea" (cited by Graham & Harris, 1997, p. 235). This approach simplifies writing tasks by eliminating the sophisticated advancement of rhetorical goals and, simultaneously, minimizes the use of planning, monitoring, editing, revising, and other strategic behaviors (Graham & Harris, 1997). The neglect of these components may limit students' opportunities to evaluate their ideas and writing goals. Additionally, they are less likely to detect written errors and assess the overall organization of their text. In contrast, writers who actively engage in these writing processes typically employ more diverse and technical vocabulary in their compositions (Koutsoftas & Petersen, 2017) and demonstrate greater accuracy in their grammar usage (Mackie & Dockrell, 2004), among other positive attributes.

Thirdly, students' knowledge base regarding writing and its genres, devices, and conventions can greatly impact their writing and present additional writing challenges (Graham & Harris, 2010). As students progress through middle and high school, writing tasks become more demanding, with increased requirements related to writing conventions, prompts integrating reading sources, objectives for intended audiences, and so on (Deane et al., 2008). The process of crafting a coherent text involves the utilization of different types of knowledge (Hayes, 1996; Olinghouse & Graham, 2009; Saddler & Graham, 2007; Troia et al., 2022), including topic knowledge largely derived from source materials, prior knowledge that enables writers to extract and evaluate the information mentioned in the source materials, genre knowledge associated with writing intentions across genres/registers (e.g., narrative, explanatory, declarative, procedural writing knowledge), linguistic knowledge related to mechanics and discourse, and so forth. These knowledge sources collectively impact students' language use in terms of conventions, grammar, handwriting, and spelling, as well as guiding their adaptation of writing style, structure, and tone to meet the specific requirements of different writing tasks. There is a growing concern among educational scholars that students at the secondary level may struggle to activate and apply their knowledge effectively in their writing endeavors (De La Paz & Graham, 2002; Graham et al., 2014; Trapman et al., 2018).
There is a growing concern among 22 educational scholars that students at the secondary level may struggle to activate and apply their knowledge effectively in their writing endeavors (De La Paz & Graham, 2002; Graham et al., 2014; Trapman et al., 2018). Fourthly, a lack of motivation to write represents another important factor that may undermine students’ writing abilities. An array of writing motivation (including self-efficacy, goal orientation, task interest and value, attributions for outcomes) among secondary school students predict their writing competence (Klassen, 2002; Pajares & Johnson, 1994; Troia et al., 2012). However, students in middle school tend to have diminished self-efficacy beliefs and possess lower self-concepts than students in elementary school (Pajares & Valiante, 1999; 2001). This decline is linked to their perception of high school classes as emphasizing a performance goal orientation and having reduced confidence in their writing abilities, particularly concerning grammar, usage, and other mechanical skills appropriate to the complexity of assigned tasks (Pajares & Cheong, 2003). The decreasing self-efficacy and persistence toward writing at the secondary level renders students pessimistic about their capabilities to generate and organize ideas for writing, impeding their ability to transcribe ideas into sentences (Camfield, 2016; Pajares, 2003). Additionally, secondary students may lack the stamina and working memory capacity needed to both sustain their writing efforts and effectively correct errors in their papers. Factors contributing to writing difficulties among secondary school-aged students can be attributed to both internal and external factors. Internally, psychological elements, such as stereotypes, can affect students from diverse cultural, racial, and educational backgrounds. These students may be conscious of stereotypes about themselves, such as being poor communicators, which can influence their writing behaviors and self-perceptions. For instance, students of color may internalize beliefs that they write less proficiently than their peers from dominant groups, 23 leading to poorer self-perceptions of their writing abilities (Abrahams, 1972). Children identified as at-risk in writing or with disabilities may experience diminished self-perceptions, exacerbated by comparisons with their peers within the classroom setting (Hamilton, 2011; Wright et al., 2021). This may deter their engagement with school writing tasks. Additionally, gender perceptions can impact students’ writing experiences. Girls may perceive themselves as better writers than boys in their class or school, attributing this belief to stronger domain-specific self- concepts and self-efficacy for self-regulation (Pajares et al., 1999). Female writers also may exhibit lower apprehension toward writing tasks (Pajares et al., 1999). These internal factors contribute to the complex landscape of writing difficulties among secondary school students. External factors also wield considerable influence on what challenges students encounter in writing. These factors involve aspects such as teaching styles employed by educators (e.g., Ramli et al., 2020), the quality of instructional materials (Wang & Troia, 2023a), the occurrence of corrected feedback (M. M. Nelson & Schunn, 2009), and the presence of peer support and scaffolding (Taheri & Nazmi, 2021). 
The absence of writing strategy instruction (Rodríguez-Málaga et al., 2021; Shen & Troia, 2018), writing process instruction (Graham & Sandmel, 2011; Troia & Olinghouse, 2013), and instructional scaffolds for motivation to write (Troia, 2002; Troia et al., 2012) can also hinder students' academic development. Additionally, the lack of guidance on utilizing digital writing tools can impede students' performance across various writing tasks (Ekholm et al., 2018). Incorporating these instructional components not only benefits students on diverse writing tasks (Troia et al., 2022) but also enables them to self-monitor and regulate their writing behaviors and outcomes (Graham & Harris, 2010). This, in turn, nurtures an awareness of their writing strengths and weaknesses, fostering metacognition as students engage in reflective thinking about their writing processes and products.

2.3 PERSUASIVE WRITING

2.3.1 Definition

The definitions of persuasive writing within rhetorical and educational discourse exhibit a degree of uniformity. For instance, Eemeren et al. (2001) characterize persuasion as "a verbal and social activity of reason, with the objective of enhancing (or diminishing) the acceptability of a contentious standpoint for the audience by presenting a set of propositions designed to justify (or refute) the standpoint before a rational judge" (see p. 5). Güneş (2016) posits that persuasive writing involves creating texts by specifying, elaborating, predicting, and justifying reasons in a manner acceptable to others. The Purdue Online Writing Lab defines the persuasive essay as a written genre that requires students to explore a topic; gather, generate, and evaluate evidence; and succinctly establish a position on the subject. In the most prevalent U.S. English language arts standards, persuasive writing involves the writer's endeavor to convince or persuade the audience to adopt a specific point of view or take a particular action through the construction of logical arguments and a cohesive summary (CCSS for English Language Arts and Literacy in History/Social Studies, Science, and Technical Subjects, 2010).

The provided definitions highlight three key aspects of persuasive writing. First, composing a persuasive essay is a social activity by nature, involving dialogue among individuals (e.g., between audience and authors) who may hold divergent perspectives on an issue, aiming to convince each other by providing compelling evidence. Second, the presentation of a constellation of propositions suggests that arguments have a discernible structure and organization, as evidenced in much of the educational literature (Crammond, 1998; De La Paz et al., 2012; Durst et al., 1990; Uccelli et al., 2013). Third, constructing arguments within persuasive essays is an act of reasoning, and individuals of reason employ critical standards to assess the acceptability of a standpoint. These critical standards may include criteria such as the incorporation of argumentative discourse elements, the writer's attentiveness to audience considerations, and the appropriateness of their argumentative strategies in relation to their objectives (Tindale, 2002). It is widely accepted that effectively defending arguments involves addressing critical questions about the pertinence of chosen argumentative strategies (Ferretti et al., 2009; Johnson, 2014; Walton, 2013).

2.3.2 Importance

Persuasive writing is a crucial skill for students to develop, as it is highly regarded in higher education contexts (F. I. A. Aziz & Ahmad, 2017).
Additionally, persuasive writing is a challenging task that necessitates the use of sophisticated language to analyze, discuss, and address controversies in a manner that is clear, convincing, and respectful of different perspectives. In the educational context, the Michigan K-12 Standards for English Language Arts (Michigan Department of Education, 2010), which are based closely on the CCSS, articulate the expected learning progressions for persuasive writing in Michigan. These standards serve as a deliberate framework guiding local curriculum development and emphasize the increasing proficiency expected of students in writing persuasively using various language dimensions as they advance through higher grade levels. These dimensions include adept mastery of vocabulary and syntax, development and organization of ideas, and engagement with progressively demanding content and sources.

For instance, when composing persuasive essays, grade 6 students in Michigan are required to build arguments supporting claims with clear reasons and relevant evidence by: (a) introducing claim(s) and organizing reasons and evidence clearly, (b) supporting claim(s) with clear reasons and relevant evidence, using credible sources and demonstrating an understanding of the topic or text, (c) using words, phrases, and clauses to clarify the relationships among claim(s) and reasons, (d) establishing and maintaining a formal style, and (e) providing a concluding statement or section that follows from the presented argument (Common Core State Standards, NGACBP & CCSSO, 2010). While grade 11-12 students are expected to adhere to the aforementioned content standards, they are additionally tasked with extending and refining their ability to introduce precise and knowledgeable claim(s), differentiate the claim(s) from alternate or opposing claims, and devise an organization that logically sequences claim(s), counterclaims, reasons, and evidence. Moreover, they are required to clarify the relationships between claim(s) and reasons, between reasons and evidence, and between claim(s) and counterclaims. A brief examination of these standards reveals that Michigan's standards (and by proxy, the more broadly adopted CCSS for ELA) aim to develop heightened proficiency in persuasive writing, guiding students toward nuanced argumentation, evidence-based reasoning, and effective communication of ideas.

Learning to write persuasive essays offers numerous personal and psychological advantages, and these skills are deemed essential for future academic and career endeavors. Firstly, persuasive writing fosters critical thinking, as students are prompted to think deeply about and provide rationales for various issues and advocate for their beliefs (Giri & Paily, 2020; Susilawati et al., 2019). Secondly, it hones students' ability to effectively and logically articulate and organize ideas by leveraging their skills in researching, analyzing, and applying prior knowledge to support their main arguments (Kim et al., 2021; Rubiaee et al., 2019). Thirdly, it encourages students to consider diverse perspectives, fostering empathy and an understanding of the concerns and values of others through perspective-taking approaches (Cho et al., 2021; Hung & Wyer, 2014). Additionally, engaging in persuasive writing can enhance students' confidence in their communication abilities (McElligott, 2014).
Moreover, the persuasive, reasoning, and analytical skills acquired through persuasive writing during K-12 schooling can be applied across various contexts beyond secondary education, contributing to success in different aspects of life (Streibel, 2014).

2.3.3 Performance

Composing persuasive papers is universally acknowledged as a challenging mode of communication for K-12 students. Consistent findings from national and international assessments indicate that students face greater challenges in persuasive writing tasks compared to informative or narrative writing tasks (Applebee et al., 1994; Mo & Troia, 2017b). Ferretti and Graham (2019) note the "gradual development" of written persuasion. Nippold and Ward-Lonergan (2010) also characterize persuasive writing as a "demanding communication task that requires sophisticated cognitive and linguistic abilities" (p. 238). Typically, students encounter various challenges when tasked with writing persuasive essays, primarily owing to the complex nature of this form of discourse.

One primary factor contributing to these challenges is the limited exposure students have to persuasion until the secondary grades. Instructional focus has traditionally been placed on narrative writing up to that point (Applebee, 1986). The introduction of persuasive writing typically begins in middle school, where students are tasked with constructing persuasive pieces supported by evidence to influence a target audience. The experience with persuasive essays deviates from everyday language or oral arguments (R. Andrews et al., 2009), making students less familiar with the nuances of persuasive text structure and topics. When composing essays, students must enhance their awareness of the persuasive intent, their standpoint on the given issue, the context of the argument, and their understanding of the dilemma at hand (Jonassen & Kim, 2010). While children exhibit audience awareness in oral argumentation from a young age, the shift from oral debate to understanding and articulating written arguments presents challenges that are gradually overcome through formal and explicit instruction (Coirier et al., 1999; Stein & Bernas, 1999).

Secondly, writing persuasive essays in secondary school may introduce novel rhetorical and linguistic challenges for students. Effective academic persuasive writing often demands organizing discourse not merely around a sequence of events but "by employing a stepwise argumentative structure to a series of ideas, frequently incorporating later-acquired discourse markers" (e.g., nevertheless, on the one hand; Uccelli et al., 2013, p. 40). Academic persuasive essays also surpass the mere expression of emotions or reactions toward events and necessitate that writers articulate their stances toward specific ideas, such as expressing the degree of certainty about their assertions (e.g., it might be that, certainly; Berman & Nir-Sagiv, 2004). Additionally, a persuasive text, akin to other expository subgenres (i.e., compare/contrast, cause/effect, problem/solution), poses inherent challenges related to the use of diverse and complex sentence structures (Karasinski, 2023; Deng et al., 2022).

Thirdly, composing a persuasive written argument demands a spectrum of higher-order abilities, including perspective-taking, critical thinking, and problem-solving, contingent upon a robust understanding of the subject matter (Toulmin, 2003).
The acquisition of these abilities is expected to lay the foundation for the discipline-specific formal argumentation anticipated in subsequent secondary school years (Hillocks, 2002). Interestingly, proficient writers appear to be most cognitively engaged when writing persuasive essays and least engaged when writing other genres such as descriptive essays, suggesting the recruitment of more higher-order intellectual abilities (Bouwer et al., 2015). Some persuasive writing tasks also require students to engage in close reading of a source text, followed by comprehending the essence of key ideas, internally debating opposing ideas, and integrating ideas from the source material into their own writing. However, students in middle school are not frequently exposed to reading materials incorporating extensive argumentation (Chambliss, 1995), implying that many students will struggle with this register.

Fourthly, young writers in early middle school may experience difficulties in effectively translating ideas and knowledge to communicate with a specific group or audience (Scardamalia & Bereiter, 1987). In the case of persuasive essays, students are usually required to articulate arguments in preparation for a classroom discussion, where other students are identified as the intended audience. However, given that the teacher often assumes the role of the discussion leader, students might craft argumentative essays with the teacher as the intended audience. This potential shift in audience could contribute to variability observed in students' persuasive writing performance. For example, grade 6 students were found to face more challenges when composing persuasive papers for their teacher compared to when writing for a peer, resulting in increased score variance (Crowhurst & Piche, 1979). Existing scholarly research on writing assessment provides conclusive evidence regarding a relationship between various components of the writing task, such as communicative purpose and intended audience, and the overall quality of writing (e.g., Bouwer et al., 2018).

Fifthly, students might bring inherent biases to the discussion of a given issue. They might hold pre-existing notions about the topic or subject, which poses a challenge in objectively assessing viewpoints and opposing views with regard to the evidence presented rather than relying on preconceived notions (Perkins et al., 1991). Effectively analyzing, evaluating, and generating persuasive texts requires awareness of one's biases, the capacity to dismantle those biases, questioning the authority of the source material, and discerning biases from within the text, a constellation of activities many students find challenging, especially at the middle and high school levels (Aziz & Said, 2020; Boyle & Hindman, 2015).

2.3.4 Theoretical Frameworks

In the domain of persuasive writing, some theoretical frameworks or perspectives serve as pillars that influence scholarly discourse and provide a structured foundation for framing research inquiries. These frameworks/perspectives offer a systematic approach to mapping out existing research and streamline the synthesis and assessment of literature pertinent to the research questions and design outlined in Chapter Three. Moreover, they facilitate identifying gaps and limitations in prior work, providing insights into how these contributions align with the broader landscape of writing scholarship.
In this section, three prominent theories underpinning the inquiry in this area are introduced and elucidated, with relevance to diverse aspects of persuasive writing and writing instruction.

2.3.4.1 Ethos, Logos, Pathos

Ethos, logos, and pathos represent the three foundational components of persuasive rhetoric introduced by the ancient Greek philosopher Aristotle. To achieve rhetorical efficacy and persuasive success, an author must engage the audience through a nuanced crafting of their argument, strategically considering the means by which audience agreement can be attained (Wachsmuth et al., 2018). Aristotle designated these modes of engagement with the Greek terms we continue to employ today.

Logos denotes the realm of logic, reason, and rationality. An author relying on logos employs logic, thoughtful structure, and objective evidence to appeal to the audience's intellect. This involves furnishing information amenable to fact-checking and presenting thorough explanations to substantiate key points. Logical appeals rest upon rational modes of thinking, including elements such as cause/effect, deductive reasoning, inductive reasoning, comparison, exemplification, elaboration, and coherent thinking.

Pathos entails authors tapping into the audience's emotions to garner agreement with the author's claim. Authors employing pathetic appeals seek to evoke a spectrum of emotions, including anger, pride, joy, rage, or happiness. Pathos-based rhetorical strategies aim to induce an audience to open up emotionally to the topic, the argument, or the author. By leveraging emotions, an author can exploit the audience's vulnerability, leading them to perceive the argument as compelling. This may involve the use of emotionally laden language, expressive descriptions, and vivid imagery to immerse the reader in a specific emotional mindset.

Ethos appeals involve audience values and authorial credibility or character. When authors make ethical appeals, they seek to resonate with the values or ideologies held by the audience. This connection to values can evoke a sense of what is morally right, fostering an alignment with the author's argument. The ethical appeal is intertwined with ethos in the sense of authorial credibility: the trustworthiness of the author is determined by their knowledge, expertise in the subject matter, and personal history and traits.

Figure 2-1, the rhetorical triangle (Ramage et al., 2016, p. 55), illustrates the interconnection among the three persuasive appeals within the context of composing persuasive essays. Logos asks how the argument can be made internally consistent and logical, and how the best reasons can be found and supported with the best evidence. Pathos asks how the reader can be made open to the message, how best to appeal to the reader's values and interests, and how to engage the reader emotionally and imaginatively. Ethos asks how writers can present themselves effectively and how they can enhance their credibility and trustworthiness.

The introduction of Aristotelian rhetorical theory in this context arises from the recognition that incorporating its three appeals in writing is widely perceived as enhancing the effectiveness and persuasiveness of discourse. This observation holds true not only in everyday informal writing, e.g., on social media (Marcotte & Stokowski, 2021; Nelzén, 2018), but also in formal writing genres like legal discourse (McCormack, 2014; Smith, 2014).
It extends into educational settings where learners are instructed to manifest writing behaviors and use techniques associated with the three appeals (FitzPatrick & McKeown, 2021; Khairuddin et al., 2021; Mohamad et al., 2022), all aligned with the overarching objective of persuading their intended readers. Within educational settings, writers seeking to appeal to logos often demonstrate proficiency in presenting factual information, statistical data, and various forms of logical evidence to substantiate their arguments. The efficacy of these strategies can be assessed through a quantitative coding framework (e.g., Toulmin's model, discussed later in this section) that delineates the specific elements essential for contributing to the provision of evidence and qualifications. To appeal to pathos, writers strategically deploy language choices related to intonation, stylistic components, and emotionally charged vocabulary. This strategic use of linguistic tools, aimed at eliciting an emotional response from the audience, can be methodically assessed through sentiment analysis of persuasive essays. Establishing ethos, on the other hand, involves the writer positioning themselves as an authority on the subject matter. This is often achieved by expressing sufficient content knowledge and a commitment to good character and intentions. Cultivating ethos through discourse relies on the use of rhetorical language to mediate between the speaker and the audience, thereby projecting character traits that evince credibility (Rideout, 2016).

Durst et al. (1990) conducted an analysis of persuasive essays composed by high school juniors and seniors with the aim of identifying the specific rhetorical and linguistic features that contributed to raters' holistic judgments about the overall quality of the essays. The study incorporated the three primary persuasive appeals (ethos, logos, and pathos) grounded in Aristotle's rhetorical theory of persuasion. Their analytic system, a variation of the one developed by Connor and Lauer (1985), employed 23 persuasive appeals to identify features contributing to text persuasiveness. Using a comparable four-point scale to measure students' utilization of the persuasive appeals (refer to Appendix 5: Persuasive Appeals Rubric in their study for detailed descriptions), Durst and colleagues found notable relationships in a correlation analysis with holistic scores. Logical appeals demonstrated a substantial correlation of r = 0.73, pathetic appeals a correlation of r = 0.38, and ethical appeals a correlation of r = 0.27, all statistically significant. In a stepwise regression analysis, the use of logical appeals accounted for approximately 53% of the variance in holistic scores. Students demonstrated more frequent and effective use of logical appeals compared to ethical or pathetic appeals. This discrepancy in appeal usage may be attributed, in part, to teaching practices: the high school writing curriculum placed greater emphasis on the use of logical appeals, with significant instruction on the development of robust reasons and convincing arguments. In contrast, secondary language arts instruction allocated relatively less attention to engaging audience attitudes, values, and emotions, or to presenting a caring and knowledgeable image. Overall, this study offers a detailed exploration of rhetorical elements to understand the nature of persuasive writing among high school students.
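To make the analytic logic of this kind of study concrete, the sketch below mirrors the general form of Durst et al.'s analysis: correlating appeal-usage ratings with holistic quality scores and estimating the variance in holistic scores explained by logical appeals alone. The data are simulated and the printed numbers are illustrative only; they are not the study's results.

```python
# A minimal sketch of a Durst-style appeal/quality analysis on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 120  # hypothetical number of essays

# Appeal-usage ratings on a four-point scale, plus a toy holistic score
# constructed so that logical appeals dominate (as in the reported pattern).
logos = rng.integers(1, 5, n).astype(float)
pathos = rng.integers(1, 5, n).astype(float)
ethos = rng.integers(1, 5, n).astype(float)
holistic = 2.0 * logos + 0.5 * pathos + 0.25 * ethos + rng.normal(0, 1.5, n)

# Pearson correlations between each appeal and holistic quality.
for name, ratings in [("logos", logos), ("pathos", pathos), ("ethos", ethos)]:
    r, p = stats.pearsonr(ratings, holistic)
    print(f"{name}: r = {r:.2f}, p = {p:.4f}")

# Variance in holistic scores explained by logical appeals alone (R^2),
# analogous to the ~53% figure from the stepwise regression.
result = stats.linregress(logos, holistic)
print(f"R^2 for logos alone: {result.rvalue ** 2:.2f}")
```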
2.3.4.2 Toulmin's Model of Argumentation

The Toulmin model of argumentation, originally devised to explicate the macrostructure of argumentative essays and nowadays widely adopted as a foundational framework for scientific argumentation within the field of postsecondary education, holds a prominent place in this section. Its inclusion as one of the significant models is justified by its relevance in examining students' persuasive writing within prior scholarly works (Aziz & Said, 2020; Januin, 2021; Liu & Wan, 2020) and its significance in shaping research study design. Importantly, this theoretical model played a pivotal role in the annotation of persuasive elements within the PERSUADE 2.0 Corpus (see Crossley et al., 2022, for details), which is the database used in this secondary data analysis study.

Toulmin (1958) formulated his method of argumentation based on a model comprising three essential components integral to persuasive essays. According to this model, an individual (1) puts forth a claim (or main argument), then (2) provides grounds to substantiate that claim, and (3) backs the grounds with a warrant (Karbach, 1987, p. 81). Toulmin's model further incorporates three additional elements: (4) providing supplementary backing to support the warrant, (5) acknowledging an alternative viewpoint through rebuttal, and (6) introducing qualifiers to indicate that a claim may not universally hold true in all circumstances. These additional elements may be incorporated as necessary. Figure 2-2 illustrates how these elements can be depicted in a persuasive essay: a claim (mobile phones should not be allowed in schools) is supported by grounds (phones in class lead to reduced attention and lower academic performance), which rest on a warrant (restricting phones in schools creates a more conducive learning environment) and its backing (research links phone use in schools to declining academic performance); a qualifier acknowledges technology's value while calling for clear policies to ensure responsible use in schools, and a rebuttal concedes that while some argue for phones as educational tools, challenges in effective integration remain.

In the persuasive genre, developing writers often lack the text structure knowledge to differentiate a persuasive schema from more prevalent text structures, which may result in integrating non-argumentative components when responding to persuasive prompts (Crowhurst, 1990). Even when children demonstrate awareness of the structural elements inherent in the argumentative/persuasive genre, they may still struggle to effectively utilize all the essential elements to construct a coherent essay (Wingate, 2012). Moreover, the organization of these structural components to establish a logical flow within the paper proves challenging for school-age writers (Calfee & Chambliss, 1987).

Wolfe et al. (2009) discovered that both persuasiveness and perceived quality of persuasion could be enhanced by presenting and rebutting counterarguments. Although Toulmin's framework is significant for emphasizing the importance of addressing alternative viewpoints when making claims, researchers face challenges in consistently applying the model. This difficulty arises partially because students' arguments often span multiple elements. For example, claims may be implicit in persuasive discourse and require deduction. Ambiguity in identifying data, warrants, and backing can lead to coding difficulties and reduced reliability (Simon, 2008).
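To illustrate what coding essay segments against these six elements might look like in practice, the following is a minimal sketch of one possible annotation data structure. The labels, fields, and example spans are illustrative assumptions; they do not reproduce the PERSUADE 2.0 annotation scheme.

```python
# A toy representation of Toulmin-style span annotations for one essay.
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class ToulminElement(Enum):
    CLAIM = "claim"
    GROUNDS = "grounds"
    WARRANT = "warrant"
    BACKING = "backing"
    QUALIFIER = "qualifier"
    REBUTTAL = "rebuttal"


@dataclass
class AnnotatedSpan:
    text: str                # the essay segment assigned to an element
    element: ToulminElement  # the coded Toulmin element
    start: int               # character offset where the span begins
    end: int                 # character offset where the span ends


annotations = [
    AnnotatedSpan("Mobile phones should not be allowed in schools.",
                  ToulminElement.CLAIM, 0, 48),
    AnnotatedSpan("Phones in class lead to reduced attention and lower "
                  "academic performance.", ToulminElement.GROUNDS, 49, 123),
]

# Element frequencies per essay support descriptive analyses and checks of
# coding reliability across raters.
print(Counter(span.element.value for span in annotations))
```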
Nevertheless, Toulmin's model of argumentation remains a prominent framework for classifying text elements in written persuasion (see Aziz & Ahmad, 2017; Aziz & Said, 2020; Magalhães, 2020) and for understanding students' justifications as the basis of their claims and conclusions by analyzing the interrelations between surface structure and substance (Stapleton & Wu, 2015).

Zimmerbaum (2014) recognized the necessity for a paradigm shift in ELA classroom instruction at the secondary school level. In response, an instructional framework was developed, employing a critical questions strategy and the Toulmin model of argument. This framework aimed to assist nine eighth-grade participants in developing the essential skills required for constructing logical, reasoned arguments within persuasive writing. The analysis of pretest and posttest results indicated a notable increase of 1.22 points (on a 5-point scale) in the average persuasive writing scores of the nine secondary students. This enhancement was attributed to guided discussions and practice sessions focused on gathering evidence, scrutinizing evidence for claim development, employing critical questions for claim testing and revision (e.g., Where does the topic occur in the text? Whose actions relate to the topic? When do these actions occur?; refer to Appendix O in Zimmerbaum, 2014, for details), and utilizing the Toulmin model as a guiding structure to organize their thoughts about a claim. These outcomes suggest that the incorporation of the Toulmin model of argumentation can play a pivotal role in aiding secondary educators as they nurture the development of argumentative and persuasive skills among their students.

Persuasive essays rely on the depth of students' prior knowledge and the research they can access and comprehend, as claims and evidence must come from these sources. While the Toulmin model helps assess students' ability to analyze and apply information, particularly in evaluating claims, evidence, and rebuttals, it does not fully address how students activate prior knowledge or conduct research. The effectiveness of this approach depends heavily on how teachers use it to engage students with data and guide them in developing stronger, evidence-based arguments.

At the secondary level, persuasive and argumentative essays often find prominence within disciplinary content area classrooms, particularly in subjects like social science and natural science, where students are instructed to articulate, elaborate, reflect upon, and synthesize the laws, theories, principles, and concepts introduced to them (Sampson et al., 2013). Recognizing this, some researchers have extended the application of Toulmin's model to other disciplinary classes in secondary schools. For instance, Giri and Paily (2020) conducted a study focused on exploring the efficacy of Toulmin's argument structure within the Think-Read-Group-Share-Reflect (TRGSR) scientific argumentation strategy for enhancing critical thinking among secondary students. This quasi-experimental study employed a pretest-posttest control group design, involving 50 twelfth-grade students in total. The experimental group received instruction on Toulmin's argumentation, while the control group followed the traditional teaching approach.
Diverging from previous studies, the primary outcome in this research was the difference in students' critical thinking abilities between the two groups, measured via multiple-choice and true/false items tapping inferencing, deduction, assumptions, interpretation, and evaluation on the Watson-Glaser Critical Thinking Appraisal Form S. The results revealed a demonstrable improvement in critical thinking ability among students in the experimental group. The teaching strategy incorporating the Toulmin components proved to be relatively more effective (F = 83.12, p < 0.001) compared to the control group's traditional teaching approach.

2.3.4.3 Writer(s)-Within-Community Model

The Writer(s)-Within-Community Model (WWC; Graham, 2018) posits that writing and writing instruction are influenced by the communities or contexts in which they occur, as well as by the cognitive abilities and resources of the writers and educators involved in writing tasks within these settings. Developed writers typically undergo five major production processes to accomplish writing tasks (Steiss et al., 2024): conceptualization (creating a mental representation of the task), ideation (generating, developing, and conveying new ideas), translation (converting thoughts into language), transcription (spelling, handwriting, and keyboarding), and reconceptualization (revising and editing).

In the current study, the primary focus is on the reconceptualization process, which aligns with the secondary purpose of this research to use MMFP to structure formative feedback provided by generative artificial intelligence (GenAI, specifically GPT). We posit that GPT-generated feedback based on derived MMFP models can aid students in refining their drafts by assessing how their writing aligns with ideal genre standards and providing specific steps for improvement. This assumption is grounded in empirical findings from prior studies, which have shown that writers benefit from multiple sources of information regarding revision, including peer review (Wu & Schunn, 2021), self-assessment (C. S. Johnson & Gelfand, 2013), and teachers' corrective feedback (Link et al., 2022). These approaches have been proven effective in enhancing the quantity and quality of revised drafts and overall writing proficiency.

Furthermore, the importance of reconceptualization is particularly highlighted for secondary students involved in persuasive writing tasks. This cohort of learners is at a critical stage in developing persuasive writing skills, which necessitates sophisticated language use and advanced cognitive abilities (Masrul & Yuliani, 2023). In addition, variation in writing contexts, such as writing purposes, the perceived value of writing, types of writing tasks, and common writing practices, can significantly influence students' learning experiences and the quality of their written products. Effective writing instruction ideally entails teachers providing frequent and personalized feedback across multiple drafts; however, this process is usually time-consuming and resource-intensive. Therefore, well-structured and informative written feedback promptly provided by GenAI is presumed to be more advantageous for secondary students in enhancing their persuasive writing compared to other approaches.
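As a concrete illustration of such a feedback pipeline, the sketch below shows how rubric-structured formative feedback might be requested from GPT using the OpenAI Python SDK. The model identifier and the rubric wording are placeholders standing in for the MMFP-derived feedback dimensions developed in this dissertation, not the actual prompts used.

```python
# A minimal sketch of requesting structured revision feedback from GPT.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder rubric; the real system would encode MMFP-derived dimensions.
RUBRIC = (
    "You are a writing tutor. Evaluate the persuasive essay on: "
    "(1) clarity of the claim, (2) relevance and sufficiency of evidence, "
    "(3) acknowledgment of counterarguments, (4) organization, and "
    "(5) academic tone. For each dimension, give one concrete revision step."
)


def generate_feedback(essay_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model identifier
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": essay_text},
        ],
    )
    return response.choices[0].message.content


# Example (requires a valid API key):
# print(generate_feedback("Mobile phones should not be allowed in schools..."))
```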
2.3.4.4 Genre Pedagogy

In addition to the two theoretical frameworks for how to persuade or argue (the Aristotelian and Toulmin frameworks), it is imperative to delineate an instructional framework in this subsection, genre-based pedagogy, which is informed by the sociocognitive pragmatics-based framework and other theoretical perspectives such as Halliday's (1994) systemic functional linguistics. The sociocognitive pragmatics-based framework regards oral and written language use as socioculturally situated cognitive practices (Bazerman & Paradis, 1991; Snow & Uccelli, 2009). This perspective implies that, as writers mature and navigate a broadening array of language-mediated social contexts, they continually acquire new modes of speaking and writing, often shaped by linguistic and cognitive demands (Uccelli et al., 2013). Certain text genres are considered more linguistically and cognitively demanding than others, and individuals vary significantly in their exposure to different communicative contexts (W. Qin & Uccelli, 2016). As a result, it is anticipated that learners may excel in writing within one genre but encounter challenges in another. Secondary academic writing, representing a crucial developmental stage in education, can thus be conceptualized as integral to the growth of young writers. Secondary writing experiences expose students to various disciplinary content areas and help them acquire a flexible repertoire of language forms and functions across a broadening range of social contexts, thereby preparing them for advancing to post-secondary education (Lavelle et al., 2002).

Educational linguists have elaborated on the effectiveness of genre pedagogy in teaching learners how to use genre-specific linguistic resources to convey content knowledge, engage in interpersonal relationships, and organize texts within the academic genres that adolescents encounter (Christie, 1998; Rose & Martin, 2012). For instance, in the study conducted by Ramos (2015), researchers implemented a genre-based pedagogy instructional approach that integrated components such as building vocabulary knowledge, facilitating understanding and discussion of ideas and concepts, and offering explicit instruction on using linguistic resources, projecting authoritative stances, and constructing well-organized texts. The observed improvement in participants' writing from pretest to posttest in this study was attributed to the students' enhanced control over the linguistic resources required to compose academic persuasive essays.

In alignment with the Common Core State Standards (NGACBP & CCSSO, 2010), which mandate that educators equip all K-12 learners with the skills to cultivate academic literacy practices, the adoption of genre pedagogy emerges as a potential avenue to achieve this educational objective. In a study conducted by Ramos (2019), twenty students spanning grades 9 to 12, selected from a U.S. northeastern urban public high school, participated in an instructional intervention known as the "reading to learn" approach rooted in genre pedagogy. This pedagogical approach involved guiding students through the process of deciphering unfamiliar vocabulary, concepts, and idiomatic expressions within persuasive texts.
Additionally, the intervention sought to foster the development of background knowledge concerning common arguments related to the subject topic, along with enhancing students' linguistic awareness by directing their focus toward metalanguage elements (such as nominalization and causal verbs) embedded in the reading materials. The results of the study demonstrated a substantial increase in participants' effective utilization of key academic linguistic resources aimed at constructing persuasive discourse, as evidenced by the observed shift in the effective use of the 14 linguistic resources measured (e.g., nominalization, projection, evaluative language, modality) from pretest to posttest. These findings imply that the genre pedagogy-based approach holds promise in facilitating the development of academic literacy practices among adolescents. This implication also guides this study's design, as a broad range of genre-specific language features that characterize persuasive writing will be explored. These features will be thoroughly addressed in the next section.

2.4 WRITING MEASUREMENT

2.4.1 Features

When examining the features of academic writing, educational researchers consistently direct their attention to both the cognitive processes involved in writing and the resultant written product (S. A. Crossley, 2020; M. D. Johnson, 2014; Torrance et al., 2021). Hayes' (1996) cognitive model of writing outlines three principal stages in the writing process: planning (pre-writing), drafting (composing or translating), and reviewing (editing or revising). Proficient writers demonstrate a capacity to monitor their processes and progress throughout the entire writing endeavor, ultimately producing an acceptable final written product (Graham & Harris, 2013; Graham & Perin, 2007; Laist, 2021). This study's secondary data analysis specifically emphasizes the attributes of the writing product over writing processes due to: (1) the absence of features related to writing processes in the PERSUADE 2.0 corpus and (2) the belief that features of the writing product can serve as reflections of students' writing processes, aligning with prior work on the effects of writing process instruction on writing outcomes (e.g., Cutler & Graham, 2008; Graham & Sandmel, 2011; G. A. Troia & Olinghouse, 2013).

To delve further, during the pre-writing phase, writers who skillfully organize ideas and establish writing goals exhibit a propensity for generating a writing product characterized by clarity, well-structured language, technical terminology, and creative linguistic choices to explicate the subject matter (De La Paz & Graham, 1997; Olinghouse, 2008). Similarly, engagement in the reviewing phase, wherein writers modify their drafts, tends to yield a high-quality writing product with fewer language and mechanical errors and enhanced coherence (De La Paz & Sherman, 2013; Shen & Troia, 2018b).

When evaluating written products across various discourse genres, particularly in persuasive writing that requires multifaceted writing skills, researchers routinely delve into the analysis of microstructural and macrostructural elements (S. Hall-Mills & Apel, 2015; Karasinski, 2023; Richards, 2013). Examining discourse elements at both levels within a text holds considerable promise for capturing the variations in language features across diverse groups in their written expression.
The subsequent discussion elaborates on macrostructural and microstructural elements due to their significance for feature extraction and selection in the subsequent data analysis section.

2.4.1.1 Microstructural Elements

The examination of microstructural elements in written products involves various levels of language, including the word, sentence, and discourse levels. Puranik and colleagues (2008) identified productivity, accuracy, and complexity as significant dimensions warranting investigation in students' writing. In a study by Troia, Shen, and Brandon (2019), multiple measures of written expression were explored as predictors of narrative writing performance for 362 students in grades 4 through 6. At the word level, the token-type ratio for content lexemes indicated word productivity, while mean textual lexical diversity, mean content word frequency, and mean syllables per word acted as proxies for word complexity. Word accuracy was reflected by the proportion of words absent spelling and/or capitalization errors. Moving to the sentence level, metrics such as the percentage of complex sentences, mean words per sentence, and mean words before the main verb offered insights into syntactic complexity. Indicators of sentence accuracy included the percentage of grammatical sentences and mean punctuation errors in a sentence. Finally, at the discourse level, the total number of words written reflected text productivity (which was redundant with number of sentences), while the incidence of connectives and narrativity score served as indicators of discourse complexity.

These writing metrics have contributed to a burgeoning body of literature advocating for the adoption of a levels-of-language framework in the assessment of writing quality. The utilization of a combination of indices encompassing the word, sentence, and discourse levels has demonstrated effectiveness in gauging distinctions among written products and the individuals accountable for their creation (see Carvalhais et al., 2021; M. Kim & Crossley, 2018; Sarmiento et al., n.d.; G. A. Troia et al., 2019; Wang & Troia, 2023b). These findings are consistent with existing writing research highlighting robust correlations of factors such as spelling accuracy, sentence grammaticality, and text length with writing quality. These correlations effectively differentiate between mature and immature writers while also showcasing variations across different grade levels. The subsequent subsections expound upon the empirical findings pertaining to students' performance at the word, sentence, and discourse levels in their written expression, while also examining variations in writing performance among students with diverse cultural and language backgrounds.

Word level

Hayes' (1996) cognitive model of writing emphasizes the cognitive and linguistic resources essential for producing high-quality text. Long-term memory is a pivotal cognitive resource through which writers translate and interpret ideas, experiences, and sensory images into linguistic form (Olinghouse & Wilson, 2013). Adept use of vocabulary can facilitate this complex process. Another theoretical framework validating the important role of the lexicon in text is Scardamalia and Bereiter's (1987) knowledge-telling model, which further elucidates the links between vocabulary and long-term memory. For instance, vocabulary can convey content knowledge, given that many topics necessitate specialized terminology (Harmon et al., 2005).
Vocabulary is also implicated in discourse knowledge, with the assumption that it can discriminate the characteristics of different genres of text (Biber, 1989; Hasan, 2014). The acquisition of vocabulary demands substantial cognitive effort and knowledge resources, particularly during middle and high school (Elleman et al., 2019; Groves, 2016). However, vocabulary acquisition is further complicated by the diverse nature of words, some of which may have different linguistic and grammatical forms, such as polysemous words (Bowers, 2011). Students may also face challenges when confronted with idiomatic expressions, making it difficult for them to choose the appropriate meaning of words (Rohmatillah, 2014). These challenges in vocabulary learning, when coupled with insufficient and ineffective vocabulary teaching strategies and interactions, can give rise to more serious problems, including difficulties in achieving writing proficiency, language-based problems, elevated dropout rates, and a substantial word gap compared to peers (Armstrong et al., 2018; Duff & Brydon, 2020; Graham et al., 2016).

Apart from recognizing the cognitive challenges of vocabulary acquisition often experienced by students, Olinghouse and Wilson (2013) conducted a quantitative study to describe the surface-level characteristics of successful writing concerning vocabulary types. This research serves as an empirical guide for understanding what vocabulary instruction should entail, guiding educators on where to place emphasis when teaching students to enhance their vocabulary and evaluate their writing performance. The study investigated the relationship between the targeted vocabulary constructs and text quality across three genres (narrative, informational, opinion). The findings disclosed variations in the type of vocabulary used by fifth-grade students across these genres. High-quality narrative writing showcased greater vocabulary diversity and maturity but less elaboration and fewer academic words. Conversely, strong persuasive compositions featured higher vocabulary diversity and a more formal register. High-quality informative texts exhibited a greater number of content words, increased elaboration, and higher maturity levels. These findings suggest that word selection in writing appears to be influenced by genre, with somewhat different vocabulary constructs contributing to writing quality depending on the genre being scrutinized.

In Sarmiento et al.'s (2024) study, researchers extended Olinghouse and Wilson's (2013) investigation by shifting the focus to vocabulary constructs that represent the academic register in grades 5 and 8. The online automated text analysis tool Coh-Metrix was employed to extract vocabulary features, including mean lexical diversity, mean number of words per sentence, left embeddedness, total number of words, and the incidence of causal and adversative connectives. These indicators served to assess students' use of academic language via lexical diversity and density. The study also considered long words (words with 7 or more letters) and the frequency of academic words within students' essays to represent their use of complex and academic language. Results indicated that diverse vocabulary predicted writing quality in grade 5 but not in grade 8, whereas the number of long words was related to both writing quantity and quality at both grades.
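The following toy sketch illustrates how a few of the surface lexical indices discussed above can be computed directly from essay text. It is not Coh-Metrix: the simple type-token ratio below is only a rough stand-in for the more robust lexical diversity metrics such tools report, and the tokenizer is deliberately naive.

```python
# A toy extractor for surface lexical indices from a single essay.
import re


def lexical_indices(essay: str) -> dict:
    # Naive sentence and word tokenization; real pipelines use NLP tools.
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    tokens = re.findall(r"[A-Za-z']+", essay.lower())
    long_words = [t for t in tokens if len(t) >= 7]  # 7+ letters criterion
    return {
        "total_words": len(tokens),
        "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
        "mean_words_per_sentence": len(tokens) / len(sentences) if sentences else 0.0,
        "long_word_proportion": len(long_words) / len(tokens) if tokens else 0.0,
    }


sample = ("Mobile phones should not be allowed in schools. Restricting them "
          "creates a more conducive learning environment for all students.")
print(lexical_indices(sample))
```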
Sarmiento and colleagues suggested that educators can identify characteristics of these long words for targeted instruction to enhance students' academic success. For instance, proper nouns, nominalizations, gerunds, hyphenated words, and words with affixations are believed to significantly improve both the quality and quantity of students' academic written language.

To sum up, the lexicon assumes a pivotal role in both constructing and interpreting meaningful text (Engber, 1995, p. 141). As academic tasks become increasingly complex throughout middle and high school, writers are expected to deploy academic vocabulary effectively to achieve success in their scholarly endeavors (W. Nagy & Townsend, 2012). Hence, students' vocabulary knowledge and lexical competence hold a fundamental position in quality text production (Maamuujav, 2021). Numerous studies have indicated significant and positive associations with text quality for various vocabulary-relevant constructs, such as vocabulary size (Laufer & Nation, 1995; Tömen, 2016), lexical diversity (Gómez Vera et al., 2016; Sadeghi & Dilmaghani, 2013), use of sophisticated words (Harb, 2018), academic words (Y.-S. Kim et al., 2013; Sarmiento et al., 2024), and word spelling accuracy (S. Andrews et al., 2020; G. A. Troia et al., 2019).

Prior scholarly research has yielded valuable insights into the types of words deemed suitable for instruction in general writing classrooms. However, students from diverse populations, especially those facing cognitive and psychological challenges, require a nuanced approach to vocabulary learning. For instance, children at risk for learning disabilities often manifest neurologically based developmental delays in perception, short-term and working memory, self-regulation, phonological awareness, and orthographic knowledge (Breznitz, 1997; Wolf, 1997). These deficiencies can pose challenges in learning vocabulary (e.g., technical terminology), making it difficult to comprehend the meanings, relationships, and conceptual interpretations of important terms (Bryant et al., 2003; Kame'enui & Baumann, 2012). Prior studies have indicated that children with language-based learning disabilities exhibit lower productivity (Mackie & Dockrell, 2004; Scott & Windsor, 2000), reduced lexical diversity (Fey et al., 2004), and less grammatical accuracy when using vocabulary with inflectional endings (Dockrell et al., 2007; Gillam & Johnston, 1992) compared to their typically achieving peers when tasked with composing essays. Similarly, non-native English speakers may confront challenges in vocabulary acquisition, influenced by reasons often rooted in cultural factors. Understanding a word involves familiarity with its literal meaning, varied connotations, the syntactic constructions it entails, the morphological options it presents, and a rich array of semantic associates such as synonyms and antonyms (W. E. Nagy & Scott, 2000). Analyzing students' language arts standardized testing results (i.e., the Massachusetts Comprehensive Assessment System), Dobbs and Kearns (2016) found that English language learners with limited English proficiency were less likely to use and attempt words than their native-speaker counterparts.

Sentence level

Syntax refers to the set of rules governing the arrangement of words into larger meaningful units, such as phrases, clauses, and sentences (Kamhi & Catts, 1999). Understanding, activating, and applying syntactic knowledge has been linked to enhanced writing quality in students (Cain, 2007).
This correlation is apparent throughout the school years, starting as early as grade 4, when students incorporate more intricate, genre-specific syntactic structures in their written compositions (Furey et al., 2017). This developmental progression tends to plateau around grade 7 or 8 (A. J. Truckenmiller & Petscher, 2020). The coherence of writing relies significantly on the adept use of precise and varied grammatical sentence constructions (Witte & Faigley, 1981). Hence, students' proficiency in comprehending and producing diverse grammatical structures within sentence contexts plays a pivotal role in achieving this coherence (Catts et al., 2006; Cutting & Scarborough, 2006).

Constructing texts with diverse sentence structures introduces complexity to the writing task and provides writers with a means to articulate meanings and intentions precisely (Graham & Harris, 2013). Proficient writers skillfully diversify their sentence structures in accordance with the writing purpose and intended audience. For example, short and simple sentences with minimal variation can facilitate the readability and comprehension of the text for less knowledgeable or sophisticated readers, whereas lengthy and complex sentences can prevent monotony, infuse rhythm, and present ideas in a coherent manner. Depending on the writing purpose and genre, incorporating syntactically sophisticated sentences enables the expression of complex ideas within a singular structure and elucidates explicit relationships between sentence constituents (Jagaiah, 2017; H. Kim & Ro, 2023; Maamuujav, 2021). For example, in persuasive writing, establishing close causal links between facts and claims requires manipulating sentence structures to condense intricate information. While complex syntax may present comprehension challenges due to heightened text difficulty, it signifies sophistication in written texts (McNamara et al., 2010). However, the imperative to employ syntactically complex sentences may not be universally applicable across all writing types or registers, given variations in topics, writing purposes, and audiences that impose differing syntactic demands on the writer (Jagaiah et al., 2020). Research suggests that syntactically complex sentences are often associated with the argumentative or persuasive genre, where writers must intricately connect arguments and evidence (Beers & Nagy, 2009).

Considerable research in writing has consistently substantiated that higher quality writing generally incorporates more sophisticated and complex syntactic features (Beers & Nagy, 2011; S. A. Crossley, 2020; Deng et al., 2022; Maamuujav, 2022; McNamara et al., 2014), such as longer sentences, a greater variety of sentential structures, increased clausal subordination, a higher rate of production of dependent clauses, and greater phrasal complexity. The array of writing measures used to evaluate syntactic structures includes clauses per T-unit, words per clause, mean clause length, mean number of words per sentence, and mean number of words prior to the main verb of the main independent clause. However, it is worth noting that some studies argue that simple sentences can also be powerful and that their effectiveness may be contingent on the genre and task type (Beers & Nagy, 2011). For example, Crowhurst (1980) found that persuasive essays with high syntactic complexity scores received significantly higher quality ratings than essays with low syntactic complexity in grades 10 and 12, although no difference was observed in grade 6.
Conversely, for narratives, the only significant finding was that, in grade 12, low syntactic complexity was associated with higher quality. This may be attributed to the fact that students, by the end of secondary education, can strategically vary sentence structures to express ideas and captivate their audience.

Students with low proficiency in writing and limited writing knowledge, particularly those with learning disabilities and language impairments, often encounter challenges when attempting to craft well-constructed and sophisticated sentences (Saddler & Preschern, 2007). Typically, these struggling writers generate sentences that are less syntactically complex and exhibit consistent grammatical errors (Myklebust, 1980; Puranik et al., 2007). This assertion is supported by previous research; for example, Koutsoftas and Gray (2012) demonstrated that students with language learning disabilities exhibited lower sentence complexity in their written narrative samples compared to their typically achieving peers. However, the two groups did not show a significant difference in sentence complexity measures in expository writing, suggesting that expository writing may necessitate that writers produce more complex sentences than narrative writing does.

Discourse level

Analyzing language samples at the discourse level is another vital aspect of assessing students' writing quality. It is important to note that the discourse-level features in this secondary data analysis are related to surface-level language features that were validated in a three-level language framework using confirmatory factor analysis (Wilson et al., 2017), rather than the discourse elements that focus on macrostructural components used to construct the entire text (which will be discussed in the next section), despite the fact that some studies mix these two aspects together. Troia, Shen, and Brandon (2019) conducted an analysis of narrative writing samples from 362 students across grades 4 through 6. The researchers identified discourse-level variables, including text productivity (total words written in the entire essay), text complexity (incidence of connectives), narrativity, and writing process use (evidence of planning, drafting, revising, or a combination thereof), which were significantly associated with writing quality, explaining 15.5% of the variance in quality. It was anticipated that discourse-level features in essay writing contribute to macro-level structures such as organization, style, coherence in content, and other global language conventions that play a substantial role in shaping the overall appearance of the essay.

2.4.1.2 Macrostructural Elements

Macrostructure analysis primarily occurs at the discourse level (Scott, 2009). Macrostructure is defined as the "gist," the "topic of the text," the "summary," or the "overall notion of what the text is about" (Witte & Faigley, 1981). Unlike microstructural dimensions of compositions, which usually involve small-unit and surface-level changes that do not greatly impact the overall meaning of a text, macrostructural elements demand changes at the rhetorical level of writing (S. S. Hall-Mills, 2009; Karasinski, 2023). These changes contribute to shaping global structure and cohesion and can serve as reflections of writers' techniques and abilities. For novice writers, understanding the strategic approach to composing with macrostructural elements can be a challenging endeavor.
Existing literature underscores the significance of discourse knowledge, which can largely impact the macrostructural levels of text and which varies based on the assigned topic or type of writing task, not only in early writing development but also in higher grades (Olinghouse et al., 2015; Olinghouse & Graham, 2009; Scardamalia & Paris, 1985). Bereiter and Scardamalia (1987) proposed the knowledge-telling model, highlighting the significance of discourse knowledge in the writing development of young students. According to this model, a child develops a mental representation of the writing task by defining the topic and utilizing their discourse knowledge to determine the type of text to be produced. As the child writes, this mental framework directs the retrieval of relevant information from long-term memory. The retrieved information is then evaluated for its alignment with the topic and text type, with discourse knowledge playing a role in this assessment. If the content is deemed suitable, it is transcribed into the written text (Olinghouse & Graham, 2009, pp. 39-41). The resulting text then prompts a further search of long-term memory. Essentially, when employing the knowledge-telling approach to writing, young authors likely draw on their understanding of what makes a compelling story and effective writing, alongside relevant content knowledge, to shape the writing task. They use this mental representation to seek out appropriate content from long-term memory. For instance, in Olinghouse and Graham's (2009) study, an examination was conducted to ascertain whether five types of discourse knowledge (substantive, procedural, motivational, basic story elements, and irrelevant) made a significant and unique contribution to predicting second- and fourth-grade students' story writing performance. The hierarchical regression analyses revealed that the collective contribution of the five types of discourse knowledge significantly predicted overall story writing quality, quantity, and lexical diversity. In the subsequent subsections, some prominent macrostructural elements are discussed through empirical findings.

Effectiveness of text structure

Toulmin's framework of argumentation has predominantly focused on the field-invariant features of an argument by highlighting its six structural elements. However, there exists limited understanding of the effectiveness of arguments constructed by students even when they have adhered to the basic argument structure in their writing. For example, Clark and Sampson (2007) found that students' arguments frequently included incorrect scientific concepts, despite exhibiting a relatively sophisticated argument structure. This discovery aligns with much other scholarly research that emphasizes the structure of arguments without delving into the content of the argumentation (D. Liu & Wan, 2020; P. Stapleton & Wu, 2015; Zhang, 2018). In response to this gap, scholars in the field of formal reasoning and argumentation have proposed three criteria (acceptability, relevance, and sufficiency/adequacy) to assess the soundness of arguments (Bickenbach & Davies, 1996). Acceptability pertains to premises that are reasonable to accept as true; relevance is a prerequisite for something serving as evidence for the conclusion; and sufficiency/adequacy implies that all premises, considered collectively, should provide enough support to justify belief in the conclusion (see Crossley et al., 2023, for more details).
Various schemes illustrating different degrees of detail for assessing the quality of reasoning have also been proposed (Erduran et al., 2004; Means & Voss, 1996). These criteria provide insightful implications for distinguishing between effective and ineffective persuasive writing. Firstly, it is essential to present ideas in a manner that effectively supports viewpoints, acknowledges alternate perspectives, and illustrates the limited persuasiveness of alternate arguments through comparison to the main arguments. Secondly, the quality of reasoning in arguments should be relevant, accurate, and structurally logical. In other words, both surface structure and substantive content need consideration when evaluating the overall quality of a persuasive or argumentative essay. Sandoval and Millwood (2005) provided insight into the coordination and quality of claims and evidence. Proficient use of structural elements in written argumentation reflects students' tacit understanding of the meaning of these elements and how they support specific claims. Therefore, the orchestration of claims and evidence serves as a cognitive skill signifying an understanding of the data or concepts that underlie claims. More importantly, it is part of a broader social practice employed to persuade an audience. Students' argumentation and explanations thus mirror their ideas about what makes an argument persuasive to their intended audience.

Tone

Tone, in the realm of writing, refers to the use of language tailored to the audience by incorporating appropriate seriousness and markers of politeness and degree of certainty (Kilgannon, 2022). The CCSS for ELA encompass language standards that emphasize "effective choices for meaning or style" (CCSS, 2010; Aull, 2015). Writers are tasked with maintaining an appropriate tone that considers the potential audience to mitigate resistance. In persuasive writing, the tone serves to convince the audience to align with the author's perspective, taking on an assertive, passionate, or even aggressive quality (Hinkel, 2003; Ho & Li, 2018). Audience considerations influence various aspects of writing tone, prompting the use of respectful, formal language, markers of politeness, and qualifications. For instance, Midgette, Haria, and MacArthur (2008) presented criteria for evaluating the tone of persuasive essays written by fifth- and eighth-grade students, illustrating how the use of respectful, formal language and markers of politeness significantly influences essay coherence and audience engagement. Notably, their study revealed that girls scored higher than boys in persuasive tone, demonstrating greater evidence of adapting language to be acceptable to the reader, resulting in a tone that is more respectful and less prone to rudeness or expressions of anger.

Discourse markers

In persuasive writing, students are expected to adhere to specific language conventions and employ organizational markers (Dobbs, 2014; Uccelli et al., 2013) that are precise and suitable for the communicative context of discussing issues and arguments, albeit not so specialized as to be heavily discipline-specific (Bailey, 2007). Even very young students have demonstrated attempts at utilizing complex language with discourse markers when engaging in persuasive writing (Nippold et al., 2005).
While limited studies have specifically analyzed proficiency in linguistic conventions marking textual organization within persuasive writing, an array of factors has been identified as contributors to higher quality writing, such as the use of cohesive devices (Crowhurst, 1980), rhetorical devices (Connor, 2004), and hedges (Alward et al., 2012; Durik et al., 2008). In a study focusing on the academic persuasive writing of high school students, Uccelli, Dobbs, and Scott (2013) found that organizational markers indicating sequences of ideas and certain markers of stance (i.e., linguistic indicators conveying attitudes toward propositions) contributed to overall writing quality beyond the impact of lexical and syntactic variables. The current study aims to extend these findings and address a research gap by closely examining the academic language resources that middle school students employ in their persuasive writing.

2.4.2 Issues and Concerns

The assessment of writing has persistently presented challenges for educators and researchers (Fernsten & Reda, 2011; Huot, 1990; Slomp, 2012; Warschauer & Grimes, 2008), with significant concerns revolving around the complex nature of writing, the constrained size of educational datasets, issues related to scoring validity, and the absence of direct implications for instructional practices. Specifically, the growing prevalence of automated essay scoring techniques over the past decade, which compels a shift toward reliance on large-scale datasets and machine-driven algorithms and a departure from the traditional reliance on human raters, has sparked skepticism regarding the robustness and effectiveness of contemporary approaches to writing assessment (Deane, 2013; Hannah et al., 2023; Wang & Troia, 2023b).

The primary cluster of concerns derives from the fact that writing, by its nature, is a complex, recursive, and sociocognitive activity. This complexity makes it challenging to capture the intended purpose of writing and may also mean that the writing constructs included in measurement are insufficient to comprehensively assess writing. Deane (2012) proposed a way forward toward a sociocognitive approach to writing analysis. One avenue involves refining the restricted constructs currently involved in automated assessment systems to replicate human scoring in high-stakes contexts, while another pathway suggests that a richer conception of writing should be drawn from various sources that consider the social and cognitive dimensions of writing, as suggested by Flower (1994) and Hayes (2012). The constrained viewpoints in writing assessment may also be attributed to the inclination of researchers to predominantly concentrate on scrutinizing microstructural elements, which are frequently identified as responsive to developmental changes across all K-12 grade levels (R. Berman & Verhoeven, 2002) and are deemed more straightforward to identify and extract, in contrast to macrostructural features, which represent the abstract portrayal of the global meaning structure (Sanders & Schilperoord, 2006) and prove more challenging to capture. To capture the complexity of writing, a variety of scoring rubrics such as primary trait, analytic, and holistic scoring rubrics are considered for assessing writing quality in distinct ways. However, ongoing debates persist regarding the preferred rubric choice, with challenges emerging from each method.
For instance, analytic scoring, associated with specific rhetorical situations and distinct dimensions, encounters difficulties as these dimensions often interrelate rather than being clearly distinguishable (Huot, 1990; Lloyd-Jones, 1977). This interdependence poses validity concerns for raters and may necessitate a time-consuming and costly scoring process (G. A. Troia et al., 2019). Trait scoring rubrics typically employ dichotomous criteria to determine the presence or absence of construct-relevant ideas in student responses (Kaldaras & Haudek, 2022). Nevertheless, this scoring schema may give rise to challenges associated with imbalanced data (Wang et al., 2024). Conversely, holistic scoring utilizes multi-leveled coding schemes with the objective of delivering a singular, overall judgment of students' writing quality (Jescovitch et al., 2021). Yet, this approach may confront issues related to data granularity, especially concerning polar scoring levels, where students' scores often cluster around the middle levels. To conclude, the selection of rubrics often leads to controversy, with insufficient evidence supporting the universal superiority of one scheme over another in human coding (Tomas et al., 2019). The choice of a scoring method is highly contingent upon the specific writing constructs being assessed and the intended purpose of the overall judgment, yet challenges persist in terms of scoring validity and reliability regardless of the chosen scheme.

Another cluster of concerns centers around the perceived absence of a clearly articulated framework for recognizing the interconnections between assessment results and their implications for instruction and broader assessment systems. Factors such as types/genres (e.g., integrated vs. independent, narrative vs. non-narrative) and purposes of writing assessment tasks (e.g., formative vs. summative), along with the location/situation in which the writing activities occur (in schools vs. outside of school, naturalistic vs. guided), contribute to the diversity and complexity of writing assessment. Educators proficient in writing assessment understand the purpose of the assessment, the appropriateness of instruments and conditions, and the interpretation of assessment data (Inbar-Lourie, 2008). Educators can also play a crucial role in bridging the research-practice gaps in writing assessment and shaping assessments aligned with realistic classroom contexts. GenAI offers educators a powerful tool for developing "mentor texts" that align closely with specific rubrics. This functionality enables educators to illustrate various measurable aspects of writing at both micro- and macrostructural levels in a concrete and accessible manner. However, translating assessment data into actionable insights for teachers can be challenging. Research has shown that assessment data are intertwined with the thoughts and actions of educators, particularly as they model their reasoning processes for students (Ball, 2000; Schunk & Zimmerman, 2007). Hence, assessment data can be conceptualized as a mediator of teaching performance. In the discussion section, this topic will be further explored, focusing on how different types of assessment data can prompt educators to customize their instructional practices effectively to address specific writing features.
Contemporary writing assessments often prioritize reporting performance as scores rather than providing constructive feedback, which is a significant reason why practitioners may perceive writing assessment as less informative: the feedback it yields is often inadequate to guide students' improvement (Wilson et al., 2021). Additionally, some assessments encounter ethical concerns and measurement bias as they may, intentionally or unintentionally, overlook variations related to students' language background, presentation of presumed racial/ethnic identity, and beliefs or conventions specific to a given cultural context (Fan et al., 2019), particularly among underrepresented student groups.

In addition to the aforementioned general measurement issues in writing, the evaluation of persuasive essays presents some specific challenges. A critical concern lies in determining the validity of persuasive discourse structures, including the assessment of evidence strength, logical reasoning, and emotional appeal. Raters often apply distinct criteria in evaluating what qualifies as a compelling persuasive element (Stab & Gurevych, 2014). Some may utilize a lexicon of surface cue words and phrases to delineate arguments at the sentence level, aiding in the evaluation of individual argument content and enriching the feature set for precise score determination (Burstein et al., 1998). However, this approach has been heavily criticized by scholars who assert the necessity of a more fine-grained analysis of arguments that includes argument components and their interrelations. Another interesting facet of persuasive writing assessment is its context-dependence, where the effectiveness of persuasive strategies varies based on the intended audience. Research has identified that secondary students tend to employ logical appeals with more persuasive power than pathetic (emotion-based) or ethical appeals (Durst et al., 1990; Shermis et al., 2013).

2.5 RATIONALE FOR THE STUDY

In this section, the rationale for the study is presented alongside identified research gaps and insights gained from the literature review. Additionally, the educational significance that the study aims to achieve is discussed.

2.5.1 Research Gaps

Given the documented significance of persuasive writing within secondary education for a diverse student population (Beyreli & Konuk, 2018; Boyle & Hindman, 2015; Thomas, 2014) and its implications for preparing students for postsecondary education and beyond (L. Aull, 2015; Garwood & Van Loan, 2019), this study aims to contribute to the expanding body of literature on persuasive writing assessment and inform pedagogical practices in this domain. The study seeks to address several critical gaps in the extant literature. Firstly, there is a dearth of scholarship examining students' writing performance at the middle or secondary school levels, particularly in the realm of persuasive writing. Much of the literacy research on existing diagnostic measurement historically prioritizes reading over writing (see Paul & Clarke, 2016, for a review), and narrative over non-narrative essays (see Graham, Kim et al., 2023, for a review). Additionally, research concentrating on evaluating writing abilities among secondary school students often prioritizes objectives such as determining course placement, ensuring program effectiveness, or driving curriculum reform (Bishop et al., 2015; Graham et al., 2005; McKeown et al., 2020), rather than identifying students who may be struggling with writing.
These scholarly observations suggest that the available literature on evaluating and addressing writing difficulties among secondary students is lacking compared to the extensive literature focused on reading, narrative writing tasks, and writing assessment for purposes other than problem identification. Another gap in the extant research is the predominant focus of existing diagnostic measures on assessing microstructural elements of writing, such as spelling, punctuation, capitalization, and grammar, rather than macrostructural elements such as organization and text content. Specifically, in the realm of persuasive writing, prior studies have primarily concentrated on evaluating the presence or absence of main elements according to Toulmin's model of argument (F. I. B. A. Aziz & Said, 2020; Crammond, 1998; Januin, 2021; Sundari & Febriyanti, 2021) without adequately addressing whether these elements are effectively articulated. Researchers have noted instances where students may include main elements but fail to do so in a manner that enhances persuasiveness (P. Stapleton & Wu, 2015). Moreover, existing research tends to neglect the examination of genre-specific features of text, such as appropriate persuasive tone and content richness. This oversight is significant given that writing standards and admissions tests for college students, such as the ACT, SAT, and GRE, place greater emphasis on macrostructural skills than microstructural skills. Consequently, the current diagnostic measures in secondary writing inadequately assess the types of writing skills that hold utmost importance in university-level writing (Richards, 2013).

There is a dearth of research delineating the characteristics of typical and atypical writing in secondary school students. While some studies offer norms for various aspects of typical secondary-level writing (R. Andrews et al., 2009; Uccelli et al., 2013), these norms often exclude students who receive special education services. As a result, it becomes challenging to utilize existing literature to discern differences between secondary school students with and without writing difficulties. A differentiated approach to writing remediation is essential for addressing the diverse needs of learners with varying learning profiles. In this context, GenAI has the potential to customize remediation plans tailored to the specific levels of individual students. Current research also neglects to investigate whether students' sociocultural backgrounds, such as English learner status, might affect the association between writing constructs and essay scores. In terms of measurement domain, much prior research has concentrated on directly examining the raw writing constructs derived from measurement tools and mathematical algorithms. However, this approach may not comprehensively address all writing constructs in a single investigation and may potentially lead to issues such as multicollinearity, overfitting, and unreliability. It is crucial to mitigate these challenges by reducing the multidimensional nature of writing constructs and categorizing them into latent structures that can be validated through statistical testing methods such as structural equation modeling or machine-based clustering algorithms.
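To make this dimension-reduction step concrete, the following minimal sketch (in Python, using the open-source factor_analyzer package) illustrates how a matrix of observed writing measures might be screened for factorability and submitted to an exploratory factor analysis with an oblique rotation. The file name, variable set, and six-factor solution are illustrative assumptions, not the study's final model.

    import pandas as pd
    from factor_analyzer import FactorAnalyzer, calculate_kmo

    # X: rows = essays, columns = observed writing measures (e.g., Coh-Metrix indices)
    X = pd.read_csv("writing_measures.csv")  # hypothetical file name

    # Kaiser-Meyer-Olkin test of sampling adequacy; values above ~.60 are
    # conventionally considered acceptable for factor analysis
    kmo_per_item, kmo_overall = calculate_kmo(X)
    print(f"Overall KMO = {kmo_overall:.2f}")

    # Exploratory factor analysis with a promax (oblique) rotation, which
    # allows the latent writing constructs to correlate with one another
    efa = FactorAnalyzer(n_factors=6, rotation="promax")
    efa.fit(X)
    loadings = pd.DataFrame(efa.loadings_, index=X.columns)
    print(loadings.round(2))

An oblique rotation is used here because micro- and macrostructural writing skills are expected to be correlated rather than orthogonal.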
In this study, this aspect will be addressed by condensing the numerous writing measures into several latent writing constructs related to microstructural elements (word, sentence, discourse) and macrostructural elements (content, tone/style, structure) through exploratory and confirmatory factor analyses. This process will further elucidate the significance of these measures specifically tailored for persuasive writing.

Finally, it is also worth noting that the current zeitgeist of integrating GenAI into schooling has prompted educators to carefully consider its ethical and effective use within educational settings. Given the paramount importance of writing in both learning and assessment processes, and the strong alignment of writing tasks with GenAI capabilities, there is a critical need to make thoughtful and well-informed decisions regarding how GenAI should be leveraged to support the development of students' writing skills (Escalante et al., 2023). In today's context, with the widespread growth of GenAI across domains, the landscape can be chaotic: GenAI and its subfields (e.g., prompt engineering) typically demand substantial resources, often bewilder users because of their inherent flaws, and may undermine public trust. Many scholars are endeavoring to explore how best to harness the capabilities of GenAI (Chan & Hu, 2023). The study objectives align with discourse on this topic by aiming to contribute insights on structuring prompts and optimizing the efficacy of GenAI in educational contexts. This endeavor encompasses considerations of ethical usage, effective integration into pedagogical strategies, and optimization of its potential benefits for students' writing development through organizing input prompts based on the derived MMFP.

2.5.2 Educational Significance

The current study holds significant implications for both researchers and educators in advancing the literature on persuasive writing skills among diverse student populations, considering factors such as gender, English language proficiency, and receipt of special education services. Furthermore, this study aims to develop an automated feedback system to inform writing performance for diverse student groups. The anticipated findings are poised to contribute in the following key ways.

Firstly, insights into the writing characteristics influencing student writing quality will deepen understanding of secondary school students' writing abilities, particularly concerning common writing activities at this educational level. Additionally, organizing discrete writing measures into multidimensional latent writing constructs using exploratory and confirmatory methods will bolster subsequent model development and applications, such as validating an automated feedback system for delivering differentiated and individualized feedback and instruction within a formative assessment context. It is also important to consider how feedback is delivered through various practices (e.g., process, strategy, vocabulary, and grammar instruction) and pedagogical tools (e.g., graphic organizers, mentor texts). The discussion section will further explore how GenAI-based learning systems could be utilized to promote writing and examine the potential achievements of our students with this support.
Establishing evidence of the importance of writing measures is a crucial validation step, as formative writing assessments play a pivotal role in furnishing educators with actionable information about students' current and projected levels of achievement. Secondly, the study's comparisons between typical secondary school students and those who receive special education services aim to shed light on the similarities and differences between these groups, aiding in the identification of atypical writing patterns at the middle and high school levels. Furthermore, the study's exploration of persuasive writing features linked to overall writing quality and differences in writing performance across groups seeks to guide educators in targeting specific features during intervention to yield significant improvements in overall writing proficiency.

Chapter 3: Methodology

Chapter Three provides a comprehensive overview of the study's objectives and thorough descriptions of the persuasive writing corpus to be used and the analysis methods to be employed to achieve the objectives. The mechanisms used to prepare the corpus and extract data for analysis are described in detail.

3.1 OBJECTIVES AND RESEARCH QUESTIONS

Building on theoretical frameworks and empirical evidence discussed in Chapter Two, this study investigates the complex dynamics of the existence and effectiveness of language use at both microstructural and macrostructural levels, and its significant impact on the persuasive writing performance of secondary students. The study is descriptive in nature and employs secondary data analysis of an extant large-scale dataset—the PERSUADE 2.0 corpus. The study examines the underlying writing constructs within this corpus, which are then employed to develop and fine-tune a large language model–Bidirectional Encoder Representations from Transformers (BERT)–aimed at predicting human-generated holistic scores of the corpus persuasive essays. Additionally, the study addresses the variability in writing constructs and persuasive performance among diverse student populations. The resulting language model is designed to predict the holistic scores of GPT-revised essays based on specific prompts informed by insights from the MMFP (i.e., Microstructural and Macrostructural Features that underpin Persuasive written composition) system. These scores are subsequently evaluated to ascertain any improvement compared to the original student essay scores. The study addresses the following research questions:

1) What textual attributes serve as optimal indicators of persuasive essay quality in secondary school students?

2) To what extent do secondary students with different special education statuses (i.e., students with and without an Individualized Education Plan [IEP]) exhibit significant differences in their holistic writing scores across latent writing attributes?

3) Do essays revised by GPT, a Generative AI application, utilizing prompts derived from factor analysis, demonstrate enhanced performance compared to the original essays written by students?

3.2 DATASET

The K-12 CCSS for ELA highlight the cultivation of argumentation skills in writing instruction, emphasizing the need for students to achieve proficiency in using valid reasoning and providing relevant and sufficient evidence to support claims. Additionally, more sophisticated components such as counterarguments and qualifiers are deemed essential in the compositions of proficient writers.
However, the 2012 NAEP Writing Report Card reveals that only about 25% of students' argumentative essays were deemed competent. In alignment with the current educational paradigm, which acknowledges the potential of machine-based algorithms and Natural Language Processing (NLP) tools to analyze large-scale corpora and gain nuanced insights into students' writing performance and capabilities, Dr. Scott Crossley and his research team developed and curated the PERSUADE 2.0 (Persuasive Essays for Rating, Selecting, Analyzing, and Understanding Discourse Elements) corpus. This corpus provides educational researchers with the opportunity to conduct quantitative analyses aimed at understanding the connections between the production of argumentative elements, their effectiveness, and quality ratings, with the expectation that such investigations will offer valuable insights informing classroom practices and assessment.

In recent years, approximately 80% of the PERSUADE 2.0 corpus has been made publicly accessible on the Kaggle website (https://www.kaggle.com/datasets/nbroad/persaude-corpus-2), specifically for two NLP-related competitions as part of the Feedback Prize series. The corpus comprises a collection of ~25,000 argumentative essays composed by students in grades 6 through 12 from various regions across the United States. Baffour, Saxberg, and Crossley (2023) provided a comprehensive summary of the outcomes stemming from the application of large language model solutions within the Feedback Prize competition series. Their analysis highlighted the identification of potential biases present in the distribution of labels representing outcomes, agreement levels, and demographic representation within the corpus. Such insights underscore the ongoing commitment of educational assessment researchers to refine and enhance algorithmic models based on the PERSUADE 2.0 corpus. Of particular emphasis in these endeavors are considerations related to student demographic factors, including but not limited to race/ethnicity, SES, English language proficiency, and participation in special education. This study is driven, in part, by the imperative to explore and mitigate such considerations within algorithmic modeling frameworks. This study conducts a secondary data analysis of the PERSUADE 2.0 database, which contains annotated essays including evaluations of argumentative elements, argumentation effectiveness, and holistic writing quality (ranging from 1 to 6).

3.3 PARTICIPANTS

More than 25,000 persuasive essays (both source text independent and dependent essays) were gathered for this corpus. Within the dataset, a specific subset of 20,823 textual records incorporates detailed demographic information for individual student writers. The primary focus of this study centers on this particular subset of 20,823 essays for secondary analysis, given the critical importance of demographic information in achieving the research objectives. It is important to note that within this subset of 20,823 essays, a total of 2,979 essays, which contain all the necessary features for this investigation, were utilized to conduct factor analysis and multigroup structural equation modeling. The remaining essays were used to validate the large language model for score prediction and to prompt GPT for essay revision.
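As one illustration of how such a score-prediction model can be configured, the minimal sketch below fine-tunes a BERT encoder to predict holistic scores as a regression target using the Hugging Face Transformers library. The column names follow the corpus variable labels (see Table 3-2); the local file name and training hyperparameters are placeholder assumptions rather than the settings used in this study.

    import pandas as pd
    import torch
    from torch.utils.data import Dataset
    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)

    class EssayDataset(Dataset):
        """Pairs tokenized essay text with a continuous holistic score."""
        def __init__(self, texts, scores, tokenizer, max_len=512):
            self.enc = tokenizer(list(texts), truncation=True,
                                 padding="max_length", max_length=max_len)
            self.scores = list(scores)
        def __len__(self):
            return len(self.scores)
        def __getitem__(self, i):
            item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
            item["labels"] = torch.tensor(self.scores[i], dtype=torch.float)
            return item

    df = pd.read_csv("persuade_2.0.csv")  # hypothetical local copy of the corpus
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    # num_labels=1 with problem_type="regression" trains with mean squared error
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=1, problem_type="regression")

    train = EssayDataset(df["full_text"], df["holistic_essay_score"], tok)
    args = TrainingArguments(output_dir="bert_holistic", num_train_epochs=3,
                             per_device_train_batch_size=8)
    Trainer(model=model, args=args, train_dataset=train).train()

Predicted scores from such a model can then be compared against human ratings on a held-out set before being applied to GPT-revised essays.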
The demographic data encompass variables including gender (categorized as male or female), grade level (categorized as grade 6, 8, 9, 10, 11, or 12), English language learning status (categorized as native English speaker or non-native English speaker), race/ethnicity group (categorized as White, Black/African American, Hispanic/Latino, Asian/Pacific Islander, American Indian/Alaskan Native, or Multiracial/Other), socioeconomic disadvantage status (categorized as economically disadvantaged or not economically disadvantaged; per Crossley, Baffour et al., 2022, economic disadvantage refers to student eligibility for federal assistance through programs such as free or reduced-price school meals, Temporary Assistance for Needy Families [TANF], or Supplemental Nutrition Assistance Programs [SNAP]), and student special education status (categorized as having an IEP or not). The exploration of group differences grounded in these demographic factors aims to provide valuable insights into distinct sub-populations, thereby contributing to a nuanced understanding of the persuasive essays within the dataset.

Table 3-1 presents a pivot table delineating the demographic characteristics of students included in the corpus disaggregated by grade level. Male and female students are nearly evenly distributed, with male students comprising 48.8% (n = 10,170) and female students constituting 51.2% (n = 10,653) of the cohort. The students are predominantly native speakers of English (89.4%). In terms of racial/ethnic composition, 45.1% identify as White, while the remainder constitutes various Black, Indigenous, and People of Color (BIPOC) groups. A majority of students, 53.5%, are identified as not economically disadvantaged. Additionally, 12.9% of students were identified as having a disability and received corresponding special education services during the data collection period. In terms of grade distribution, the cohort includes 1,372 sixth graders (6.6%), 9,629 eighth graders (46.2%), 20 ninth graders (0.1%), 6,315 tenth graders (30.3%), 3,083 eleventh graders (14.8%), and 404 twelfth graders (1.9%). Acknowledging the insufficient representation of grade 9 students, which may potentially compromise the reliability of findings, the decision was made to exclude all pertinent data and corresponding information pertaining to grade 9 to mitigate this issue. Chi-square tests were conducted to compare demographics across the remaining grade levels (see Table 3-1). The results indicated significant differences in gender, English language learner status, race/ethnicity, socioeconomic status, and special education status among the five grade levels. The null hypotheses were rejected, and we can reasonably infer that the distribution of students across grades in terms of demographics did not adhere to a uniform pattern.
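These tests are standard contingency-table analyses; for transparency, the sketch below (in Python, using SciPy) applies the gender-by-grade counts from Table 3-1 and should approximately recover the reported statistic of χ²(4) = 23.563.

    from scipy.stats import chi2_contingency

    # Rows: male, female; columns: grades 6, 8, 10, 11, 12 (counts from Table 3-1)
    counts = [[633, 4573, 3158, 1588, 203],
              [739, 5056, 3157, 1495, 201]]
    chi2, p, dof, expected = chi2_contingency(counts)
    print(f"chi2({dof}) = {chi2:.3f}, p = {p:.4g}")  # chi2(4) is approximately 23.56, p < .001

The same call, with the appropriate panel of counts, reproduces the tests for ELL status, race/ethnicity (binary recode), SES, and SPED status.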
It is important to note that, to address the first research question, which aims to understand the underlying constructs of persuasive writing essays, a subset of 2,977 essays was selected after excluding grade 9 data (n = 2 in the subset). This subset includes all the observed language features necessary for conducting factor analysis. No missing data were detected. The remaining subset will be utilized to address the third research question, which focuses on validating the efficacy of providing improved/revised essays. These essays are derived from those written by peers who scored one point higher on the 6-point scale and from GPT-generated essays prompted based on the MMFP results.

Table 3-1
Demographics and Descriptive Statistics for the Full Sample

                                  Grade 6  Grade 8  Grade 10  Grade 11  Grade 12  Grand Total  Chi-square test
Gender                                                                                         χ²(4) = 23.563, p < .001
  Male                                633     4573      3158      1588       203        10170
  Female                              739     5056      3157      1495       201        10653
  Grand Total                        1372     9629      6315      3083       404        20823
ELL Status                                                                                     χ²(4) = 3761.4, p < .001
  Native English Speakers            1338     9163      5207      2835        24        18584
  Non-native English Speakers          30      455       990       248       380         2106
  Grand Total                        1372     9629      6274      3083       404        20782
Race/Ethnicity Group                                                                           χ²(4) = 2390, p < .001
  White                               732     4741      2600      1290        22         9388
  Black/African American              205     1532      1337       538        55         3682
  Hispanic/Latino                     332     2230      1701       662       272         5198
  Asian/Pacific Islander               40      722       338       440        50         1591
  American Indian/Alaskan Native        9       55        50         6         0          120
  Multiracial/Other                    54      349       289       147         5          844
  Grand Total                        1372     9629      6315      3083       404        20823
SES Status                                                                                     χ²(4) = 8033, p < .001
  Not Economically Disadvantaged      672     5507      2691      2113       121        11116
  Economically Disadvantaged          700     4058      3624       970       283         9643
  Grand Total                        1372     9565      6315      3083       404        20759
SPED Status                                                                                    χ²(4) = 657.56, p < .001
  Not Identified with Disability     1299     8799      4955      2730       336        18135
  Identified with Disability           73      830      1360       353        68         2688
  Grand Total                        1372     9629      6315      3083       404        20823

Note. The Race/Ethnicity variable was recoded into a binary format (i.e., White as 0 and all other categories as 1) for the chi-square test. This recoding was necessary because the chi-square test results were not valid when some cells had fewer than 10 cases, which is inadequate for a reliable chi-square analysis.

3.4 CORPUS VARIABLES

3.4.1 Task Types

In addition to participant variables, the PERSUADE 2.0 corpus encompasses several variables pertinent to the writing task. Specifically, one such variable concerns the type of writing assignment. The corpus comprises two distinct sub-corpora, consisting of source-based essays (n = 11,953) and independent essays (n = 8,870). Source-based writing requires students to refer to a provided text, whereas independent writing excludes this requirement. While students may not be required to have specific background knowledge for the independent set, their general understanding of the topic could still influence their writing process.

3.4.2 Prompt Types

Concerning prompt types, the source-based set derives from seven distinct writing prompts and related sources, while the independent set is derived from eight unique writing prompts. In total, there are 15 unique writing prompts, and all prompts and sources are accessible within the PERSUADE 2.0 corpus.

3.4.3 Word Count

Word count information is also available in the corpus. The mean word count for all students' persuasive essays is 409.23 words, with a standard deviation of 235.78. The minimum word count is 146 words, and the maximum is 8,922 words. According to Crossley, Baffour et al.'s (2022) description, the PERSUADE corpus was restricted to essays with a minimum of ~150 words, of which 75% of the words had to be correctly spelled American English words. These filters were implemented to ensure adequate coverage of argumentative discourse elements in the texts and to guarantee that the essays contained sufficient recognizable language output for the development of NLP features to inform algorithm development, as outlined by Crossley and Kyle (2018).

3.4.4 Discourse Elements

An integral aspect of the PERSUADE 2.0 corpus lies in its incorporation of annotations for essential discourse elements within each essay.
The process of annotating essays for key argumentative elements is of importance to advancing our understanding of the argumentative strategies employed by student writers. Furthermore, annotations also help discern how these features contribute to successful writing. To facilitate this annotation process, the PERSUADE research team developed an annotation rubric, which addresses three key dimensions: (a) the identification of argumentative elements in essays; (b) the delineation of relations among these elements; and (c) the assessment of the effectiveness of the identified elements. Every essay in the corpus underwent human annotation to identify argumentative discourse elements and their interrelations. Adopting a double-blind rating methodology ensured rigorous evaluation, with each essay independently reviewed by two expert raters, and any disparities adjudicated by a third expert rater. A comprehensive exposition of the rating process is available in Crossley, Baffour et al. (2022). The annotation rubric, specifically designed for the identification and evaluation of discourse elements prevalent in argumentative writing, drew inspiration from the seminal works of Nussbaum, Kardash, and Graham (2005) and Stapleton and Wu (2015). These annotation schemes represent adapted or simplified versions of the Toulmin argumentative framework (1958). The discourse elements that were annotated for each essay are: • Lead – an introduction begins with a statistic, a quotation, a description, or some other device to grab the reader’s attention and point toward the thesis. 72 • Position – an opinion or conclusion on the main question. • Claim – a reason that supports the position. • Counterclaim – a claim that refutes another claim or gives an opposing reason to the position. • Rebuttal – a claim that refutes a counterclaim. • Evidence – ideas or examples that support claims, counterclaims, rebuttals, or the position. • Concluding Statement – a concluding statement that restates the position and claims. 3.4.5 Effectiveness Scores for Discourse Elements In addition to being labeled as distinct rhetorical and argumentative elements (i.e., discourse elements) by expert raters, the corpus also includes effectiveness scores that gauge the quality of these argumentation elements. Raters underwent comprehensive training using a standardized rubric, where they were tasked with assigning each element a score of effective, adequate, or ineffective. A detailed description accompanied each category, providing an illustrative example of a discourse element fitting that specific rating. Crossley and his research team annotated a total of 159,228 elements, ensuring at least a 50% overlap. Among these, ~120,000 elements were scored as adequate, ~32,000 as effective, and ~6,000 as ineffective. Two inter-rater reliability (IRR) statistics were computed: exact agreement and a weighted Cohen’s kappa. The overall exact agreement for all elements stood at .718, with specific elements ranging between .68 and .81. Notably, exact agreement tended to be higher for source-based dependent essays compared to independent essays. Cohen’s kappa of the effectiveness ratings ranged between .17 and .38 for specific elements, demonstrating a comparatively lower level of agreement. Once again, Cohen’s kappa exhibited higher values for 73 dependent essays as opposed to independent essays. The overall Cohen’s kappa for effectiveness across all elements was .32, indicating a fair level of agreement. 
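Both IRR statistics are straightforward to compute. The sketch below (Python, scikit-learn) shows exact agreement and a weighted Cohen's kappa for two hypothetical raters' effectiveness ratings (0 = ineffective, 1 = adequate, 2 = effective); the example vectors are illustrative, and the quadratic weighting is an assumption, since the corpus documentation does not specify the weighting scheme.

    import numpy as np
    from sklearn.metrics import cohen_kappa_score

    # Hypothetical effectiveness ratings from two raters for ten discourse elements
    rater1 = np.array([1, 1, 2, 0, 1, 1, 2, 1, 0, 1])
    rater2 = np.array([1, 2, 2, 0, 1, 1, 1, 1, 1, 1])

    exact_agreement = np.mean(rater1 == rater2)  # proportion of identical ratings
    kappa_w = cohen_kappa_score(rater1, rater2, weights="quadratic")
    print(f"Exact agreement = {exact_agreement:.3f}, weighted kappa = {kappa_w:.3f}")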
In instances of disparities in effectiveness ratings, a third expert rater served as an adjudicator to render a final decision. For detailed IRR statistics pertaining to PERSUADE 2.0 effectiveness ratings, refer to Crossley, Baffour et al.'s preprint paper.

3.4.6 Holistic Essay Score

The primary focus of this investigation centers on the holistic essay score, which serves as the dependent variable for statistical analyses and for subsequent language model development to predict human-assigned scores accurately. The evaluators responsible for annotating discourse elements for the corpus essays and assessing the effectiveness of each element also holistically scored each essay. In the case of independent essays, raters underwent training utilizing a standardized SAT (i.e., Scholastic Aptitude Test) holistic essay scoring rubric with a 1 to 6 scale; a score of 6 signifies an essay demonstrating clear and consistent mastery of writing. This SAT rubric was slightly adapted for source-based essays with the inclusion of evidence from the source as a criterion for writing quality. An example of the wording for an essay scored 6 using the independent essay rubric reads as follows: "A typical essay employs clearly appropriate examples, reasons, and other evidence to support its position." In the case of the dependent rubric, the wording was adjusted to state: "A typical essay utilizes clearly appropriate examples, reasons, and other evidence taken from the source text(s) to support its position." The objective was to align the two rubrics as closely as possible to ensure comparability between scores for independent and source-based essays. As with the effectiveness ratings, expert raters employed a double-blind rating process, with 100 percent adjudication ensuring that each essay received holistic scores from two raters, with a third rater intervening if necessary. Raters underwent training specific to the prompts and essay types used in the PERSUADE 2.0 corpus. Interrater agreement before adjudication indicated robust agreement among expert raters, with a weighted kappa of .75 and a Pearson r value of .75. The holistic scoring rubrics for both independent and source-based essays are provided in the Appendix. An overview of the variables included in the PERSUADE 2.0 corpus, along with detailed descriptions, is presented in Table 3-2.
Table 3-2
Labels and Descriptions of All Corpus Variables

Essay ID (essay_id_comp): Unique identifier assigned to anonymized essays in the corpus.
Discourse Text (full_text): Complete text of each essay composed by participating students.
Holistic Essay Scores (holistic_essay_score): Holistic scores (ranging from 1 to 6) assigned by human raters for each essay.
Word Count (word_count): Total number of written words in each essay.
Task Type (Task): Task type for each essay (categorized as independent or source-based task).
Gender (Gender): Demographic information indicating the gender of students (categorized as male or female).
Grade Level (grade_level): Demographic information indicating the grade level of students (categorized as Grade 6, 8, 9, 10, 11, or 12).
English Language Learning Status (ell_status): Demographic information indicating the non-native English language learner status of students (categorized as Yes or No).
Racial/Ethnicity Group (race_ethnicity): Demographic information indicating the race/ethnic group of students (categorized as White, Black/African American, Hispanic/Latino, Asian/Pacific Islander, American Indian/Alaskan Native, or Multiracial/Other).
Socioeconomic Status (economically_disadvantaged): Demographic information indicating the socioeconomic status of students (categorized as economically disadvantaged or not economically disadvantaged).
Student Disability Status (student_disability_status): Demographic information indicating whether students have a disability requiring an IEP (categorized as having a disability or not having a disability).
Discourse Start Point (discourse_start): Starting point of the discourse within the essay for identifying each discourse type, denoted by a continuous numerical representation indicating the location of the first word occurrence within the entire essay.
Discourse End Point (discourse_end): Ending point of the discourse within the essay for identifying each discourse type, denoted by a continuous numerical representation indicating the location of the last word occurrence within the entire essay.
Discourse Type (discourse_type): Discourse components within the chosen excerpt of the full persuasive essay, classified into categories of lead, position, claim, counterclaim, rebuttal, evidence, concluding statement, or unannotated segments.
Discourse Type Numbers (discourse_type_num): Numerical representation assigned to discourse elements within the chosen excerpt of the entire persuasive essay. For instance, if claims appear more than once in the full essay, they are designated as "claim 1," "claim 2," and so forth.
Discourse Effectiveness (discourse_effectiveness): The effectiveness of discourse elements, as evaluated by human raters, categorized into three classifications: ineffective, adequate, and effective.

3.6 DERIVED VARIABLES

The investigation of varied linguistic features across diverse student populations and the optimization of automated essay scoring systems to improve modeling performance serve as the impetus for this investigation. In this section, surface-level (i.e., microstructural) and higher meaning-level (i.e., macrostructural) attributes are introduced as derived variables. These attributes may play a pivotal role in discerning both shared characteristics and differences among student groups. Moreover, they may contribute significantly to the development and refinement of large language models for precise score prediction to evaluate GPT-revised essays.
The models are expected to facilitate the implementation of AI revision systems that offer insightful revision suggestions to students on their original drafts.

3.6.1 Microstructural Elements

In this study, the primary NLP tool employed for textual analysis is the Coh-Metrix 3.0 language analysis tool, which was utilized to computationally evaluate each essay within the corpus. This tool facilitates the extraction of a diverse set of linguistic and textual features, as elucidated by McNamara and Graesser (2012). The software computes 108 linguistic and text-based features, which can be grouped into 11 broad categories reflecting various aspects of written language according to Graesser et al.'s (2004) manual. These categories encompass descriptive information, text easability principal component scores, referential cohesion, latent semantic analysis, lexical diversity, connectives, situation model, syntactic complexity, syntactic pattern density, word information, and readability.

It is important to note that the computation of these features is not arbitrary but rather grounded in educational assessment theories. These theories, rooted in discourse comprehension, provide substantive evidence that different dimensions of discourse, such as representations, structures, strategies, and processes, can be systematically measured from surface to deeper levels of meaning (McNamara et al., 2010). For example, cohesion and coherence in written language can play a significant role in text comprehension. Texts with high cohesion are generally more easily understood than those with low cohesion; accordingly, including cohesive elements or discourse markers in low-cohesion texts can enhance comprehension, particularly for less proficient readers (McCarthy et al., 2019). Numerous studies have demonstrated that Coh-Metrix indices, including referential cohesion, latent semantic analysis, and incidence of connectives, can be used to evaluate cohesion and coherence in essays (McNamara et al., 2014; McNamara & Graesser, 2012).

The selection of Coh-Metrix indices is also theoretically justified by the levels-of-language framework, which posits that a writer's proficiency is organized hierarchically across the subword/word, sentence, and discourse levels of language. Previous studies, as exemplified by the work of Wilson, Roscoe, and Ahmed (2017), have confirmed that nine Coh-Metrix measures extracted from middle school students' timed constructed responses align with word-, sentence-, and discourse-level skills, maintaining a consistent latent factor structure across grades. In this study, Coh-Metrix (a research version rather than the openly available online tool) was used to evaluate the written texts from the constructed persuasive essays available in the PERSUADE 2.0 corpus. Each student essay underwent evaluation using the levels-of-language framework (refer to Troia et al., 2019, for details). This helps to establish proof of concept for the application of automated measures, grounded in the levels-of-language approach, to facilitate the assessment of persuasive essays authored by a large student population. In this study, a range of measures was extracted from Coh-Metrix 3.0 to potentially evaluate writing skills across the word, sentence, and discourse levels of language. The selection of these measures was informed by their relevance as demonstrated in Chapter Two on writing measurements and analysis, as well as their plausible connection to the three language levels.
Table 3-3 provides a summary of the selected indices and their interpretations. Word-level indices included nine measures that captured the sophistication and diversity of students' word choice and vocabulary: mean syllables per word, measure of textual lexical diversity, mean word concreteness, mean word familiarity, mean word frequency, mean word hypernymy, mean word imageability, mean word meaningfulness, and mean word polysemy. Sentence-level indices included 11 measures that evaluated the syntactic complexity and diversity of sentences within students' texts and syntactic cohesion: mean sentence length, standard deviation of sentence length, mean number of words before main verb, mean number of modifiers per noun phrase, mean minimal edit distance for all words, mean syntactic similarity for all sentences, incidence of agentless passive voice, noun phrase density, verb phrase density, negation density, and referential cohesion. Discourse-level indices included 20 measures that examined features related to voice and perspective, semantic overlap, discourse cohesion, and coherence: incidence of first-person pronouns, incidence of second-person pronouns, incidence of third-person pronouns, argument overlap for all sentences, argument overlap for adjacent sentences, LSA overlap of adjacent paragraphs, LSA overlap of adjacent sentences, LSA given/new information, narrativity Z-score, deep cohesion, total words written, the incidence of all connectives, the incidence of causal connectives, the incidence of logical connectives, the incidence of adversative and contrastive connectives, the incidence of temporal connectives, the incidence of additive connectives, the incidence of positive connectives, the incidence of negative connectives, and readability.

3.6.3 Macrostructural Elements

3.6.3.1 Argument structure elements and effectiveness

The analysis of argument structure and its effectiveness is integral to the study. As noted above, the PERSUADE 2.0 corpus is characterized by its inclusion of annotations pertaining to argument structure and corresponding effectiveness scores (ineffective, adequate, effective) assigned by human raters to each identified element within a student persuasive essay. These features served as an important cluster of predictors for holistic scores of persuasive essay quality.

3.6.3.2 Content attributes

Constructing persuasive arguments represents a pivotal skill for students as it facilitates the articulation of their thoughts regarding pertinent issues, grounded in their topical knowledge derived either through source material or world knowledge (Duschl & Osborne, 2002). The entirety of their essays' content, particularly the arguments employed to articulate their stance, holds significance within the realm of literacy (Evagorou et al., 2023). This significance stems from the potential to reflect the students' depth of understanding and mastery of the subject matter, as well as their proficiency in synthesizing and analyzing information effectively. Given the persuasive nature of the essays, their content also serves to underscore language proficiency in conveying ideas with clarity and persuasiveness, both of which are imperative competencies in academic and professional domains for secondary-level students. For this study, a text mining approach was utilized to represent essay content: for each essay, the maximum cosine similarity to essays at each score point was computed, both across all essays and within each prompt. This extraction method aligns with the approach utilized in Zupanc and Bosnic (2017) and Attali (2011). Cosine similarity measures the similarity between document pairs by assessing the angle between their vectors, with values ranging from 0 to 1; a higher value signifies greater semantic and content similarity between the documents. Specifically, the cosine similarity of each essay was computed against the essays at each human-assigned holistic score level (1 to 6), and the maximum cosine similarity score at each level was retained as a metric representing the similarity between the target essay and the comparison essays, resulting in six content feature scores. Because there are fifteen different prompts in the corpus (with three of these prompts excluded because their corresponding texts did not contain all necessary features), cosine similarity scores were also calculated separately within each prompt. This approach yielded six content feature scores, which depict the content similarity between the target essay and the remaining essays across various score levels within the same prompt.
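A minimal sketch of this content-feature extraction follows. It assumes a TF-IDF representation of the essays (the specific vectorization used in the study is not claimed here) and returns, for each essay, the maximum cosine similarity to the other essays at each holistic score level.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def content_scores(essays, scores):
        """Return an (n_essays, 6) array: column s holds each essay's maximum
        cosine similarity to the other essays with holistic score s + 1."""
        X = TfidfVectorizer().fit_transform(essays)
        sim = cosine_similarity(X)
        np.fill_diagonal(sim, -1.0)  # exclude each essay's similarity to itself
        out = np.zeros((len(essays), 6))
        scores = np.asarray(scores)
        for s in range(1, 7):
            mask = scores == s
            out[:, s - 1] = sim[:, mask].max(axis=1) if mask.any() else np.nan
        return out

Applying the same function separately to the essays within each prompt produces the within-prompt version of the six content feature scores.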
3.6.3.3 Style/Tone

Sentiment analysis is a field of measurement that investigates attitudes, opinions, and emotions directed towards a specific entity (Mostafa, 2017). It delves into the unique and evolving tones adopted by individual writers in relation to their subjects or topics (Shermis et al., 2013). For instance, when a writer intends to express disagreement with a scenario, sentiment analysis sheds light on the negativity, positivity, or neutrality inherent in the language used (Janda et al., 2019). In the context of automated essay scoring, sentiment analysis assumes a significant role, particularly in persuasive essays. Writers, especially in this genre, must articulate and substantiate their viewpoints on a subject, wherein the tone and manner of expressing textual sentiment significantly impact their writing.

When analyzing students' texts in the PERSUADE 2.0 corpus, well-crafted persuasive essays clearly articulate the opinion of the student writer and build support for their standpoint by expressing relevant ideas and concepts for their argument. Sentiment expressions unveil a writer's judgments, evaluations, and feelings, likely used to express a preference for a particular position or to highlight the shortcomings of an alternative position. Furthermore, a considerable body of prior work has indicated variations in persuasiveness and tone across diverse student cohorts. Notably, students with disabilities encounter persistent difficulties in the domain of persuasive writing (Gleason, 1999; Nippold et al., 2005). For example, a persuasive essay may involve choosing a particular stance, developing structured paragraphs that substantiate facets of an opinion or argument, and persuading the reader to align with one's viewpoint (P.-L. Chuang & Yan, 2023; Englert et al., 2009). Specific difficulties exhibited by students with disabilities in crafting persuasive (argumentative or opinion) essays include adopting an inappropriate narrative writing style, using unsupported or nonexistent evidence, overlooking opposing perspectives, and presenting an argument that is consistent with the contrary standpoint (De La Paz et al., 2012; Gleason, 1999; Wissinger & De La Paz, 2020).

The SEntiment ANalysis And Cognition Engine (hereafter, SEANCE) is a sentiment analysis tool for academic discourse that relies on multiple preexisting sentiment, social-positioning, and cognition dictionaries.
SEANCE offers a user-friendly interface, featuring 254 core indices and 20 component indices based on recent advancements in NLP sentiment analysis. Beyond the core indices, SEANCE allows for customization, enabling the inclusion of specific parts of speech and control for instances of negation. Notably, in SEANCE, any negated target word is disregarded within the relevant category. For instance, in processing the sentence "He is not happy," the term "happy" would not be counted as a positive emotion word. This methodology has demonstrated efficacy in excluding approximately 90% of negated words (Hutto & Gilbert, 2014).

SEANCE also incorporates the Stanford part-of-speech (POS) tagger (Toutanova et al., 2003) as implemented in Stanford CoreNLP (Manning et al., 2014). The POS tagger facilitates POS-tagged specific indices for nouns, verbs, and adjectives. This is crucial in sentiment analysis, as adjectives, verbs, and adverbs may convey unique aspects of sentiment more emphatically (Hatzivassiloglou & McKeown, 1997; Hu & Liu, 2004; Subrahmanian & Reforgiato, 2008). SEANCE provides reports on both POS and non-POS variables. Significantly, many vectors in SEANCE remain neutral regarding POS, allowing accurate processing of poorly formatted texts that may pose challenges for a POS tagger.

In this study, component scores extracted from the SEANCE analysis tool were used to measure aspects of emotion, cognition, and social order within the essays. These scores included the component score of negative adjectives (NRC negative adjectives, NRC disgust adjectives, NRC anger adjectives, GI negative adjectives, Lu Hui negative adjectives), the component score of social order (RC ethics verbs, GI need verbs, and RC rectitude words), the component score of positive adjectives (Lu Hui positive adjectives, Bader positive adjectives, GI positive adjectives, and Laswell positive affect adjectives), the component score of joy (NRC joy adjectives, NRC anticipation adjectives, NRC surprise adjectives), and the component score of trust verbs (NRC trust verbs, NRC joy verbs, and NRC positive verbs). These scores collectively represent the tone of the essays.

Additionally, effective persuasive writing is accessible to readers because writers adhere to shared conventions for organization and signal their stance on a particular topic. In their research, Uccelli, Dobbs, and Scott (2013) discovered that the frequency of organizational markers and epistemic hedges significantly predicted high school students' persuasive essay writing quality. In academic persuasive writing, students are typically expected to employ language markers that are more precise and suitable for the communicative context than those used in everyday interactions. However, these markers should not be so specialized as to be discipline-specific (Bailey, 2007). To address this requirement, this study draws inspiration from the work of Islam, Xiao, and Mercer (2020), who compiled separate lists of hedge words, hedging phrases, and booster words. They also developed a rule-based algorithm that detects sentence-level hedges using these lexicons. Their NLP package and language marker lists are accessible on GitHub and align with the methodology employed in this study. Specifically, the stance markers were categorized into booster words (~128), discourse markers (~360), and hedge words (~80). Collectively, these stance markers were identified to enrich the analysis of students' persuasive writing.
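As a simplified illustration of how such lexicon-based stance features can be extracted (Islam, Xiao, and Mercer's released package implements a more sophisticated, rule-based sentence-level detector), the sketch below counts hedge and booster tokens against small hypothetical word lists and normalizes them as incidence per 1,000 words.

    import re

    # Tiny illustrative lexicons; the study uses Islam, Xiao, and Mercer's (2020)
    # full lists (~80 hedge words, ~128 booster words, ~360 discourse markers)
    HEDGES = {"may", "might", "could", "possibly", "perhaps", "suggests", "appears"}
    BOOSTERS = {"clearly", "certainly", "definitely", "undoubtedly", "always"}

    def stance_incidence(text):
        """Return hedge and booster incidence per 1,000 words for one essay."""
        tokens = re.findall(r"[a-z']+", text.lower())
        n = max(len(tokens), 1)
        return {
            "hedges_per_1000": 1000 * sum(t in HEDGES for t in tokens) / n,
            "boosters_per_1000": 1000 * sum(t in BOOSTERS for t in tokens) / n,
        }

    print(stance_incidence("This could possibly help, and it is clearly important."))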
The labels and descriptions of all the derived variables can be found in Table 3-3.

Table 3-3 Labels and Descriptions of the Study Variables

No. | Measure | Label | Brief Description

Microstructural Elements
Word Level
1 | Mean syllables per word | DESWLsy | Word length, number of syllables, mean
2 | MTLD | LDMTLD | Lexical diversity, MTLD, all words
3 | Mean word concreteness | PCCNCz | Text easability PC word concreteness, z score
4 | Mean word familiarity | WRDFAMc | Familiarity for content words, mean
5 | Mean word frequency | WRDFRQc | Average word frequency for content words
6 | Mean word hypernymy | WRDHYPnv | Hypernymy for nouns and verbs, mean
7 | Mean word imageability | WRDIMGc | Imageability for content words, mean
8 | Mean word meaningfulness | WRDMEAc | Meaningfulness, Colorado norms, content words, mean
9 | Mean word polysemy | WRDPOLc | Polysemy for content words, mean
Sentence Level
10 | Mean sentence length | DESSL | Sentence length, number of words, mean
11 | Standard deviation of sentence length | DESSLd | Sentence length, number of words, standard deviation
12 | Mean number of words before main verb | SYNLE | Left embeddedness, words before main verb, mean
13 | Mean number of modifiers per noun phrase | SYNNP | Number of modifiers per noun phrase, mean
14 | Mean minimal edit distance for all words | SYNMEDwrd | Minimal edit distance, all words
15 | Mean syntactic similarity for all sentences | SYNSTRUTt | Sentence syntax similarity, all combinations, across paragraphs
16 | Incidence of agentless passive voice | DRPVAL | Agentless passive voice density, incidence
17 | Noun phrase density | DRNP | Noun phrase density, incidence
18 | Verb phrase density | DRVP | Verb phrase density, incidence
19 | Negation density | DRNEG | Negation density, incidence
Discourse Level
20 | Referential cohesion | PCREFz | Text easability PC referential cohesion, z score
21 | Incidence of first-person pronouns | WRDPRP1s | Use of first-person pronouns (personalization)
22 | Incidence of second-person pronouns | WRDPRP2 | Use of second-person pronouns (informality)
23 | Incidence of third-person pronouns | WRDPRP3s | Use of third-person pronouns
24 | Argument overlap for all sentences | CRFAOa | Argument overlap, all sentences, binary, mean
25 | Argument overlap for adjacent sentences | CRFAO1 | Argument overlap, adjacent sentences, binary, mean
26 | LSA overlap of adjacent paragraphs | LSAPP1 | Semantic similarity across paragraphs
27 | LSA overlap of adjacent sentences | LSASS1 | Semantic similarity across sentences
28 | LSA given/new information | LSAGN | Semantic similarity of new text to prior text
29 | Narrativity z-score | PCNARz | Text easability PC narrativity, z score
30 | Deep cohesion | PCDCz | Underlying conceptual cohesion
31 | TWW | DESWC | Total words written
32 | All connectives | CNCAll | All connectives incidence
33 | Causal connectives | CNCCaus | Causal connectives incidence
34 | Logical connectives | CNCLogic | Logical connectives incidence
35 | Adversative and contrastive connectives | CNCADC | Adversative and contrastive connectives incidence
36 | Temporal connectives | CNCTemp | Temporal connectives incidence
37 | Additive connectives | CNCAdd | Additive connectives incidence
38 | Positive connectives | CNCPos | Positive connectives incidence
39 | Negative connectives | CNCNeg | Negative connectives incidence
40 | Readability | RDFRE | Flesch Reading Ease

Macrostructural Elements
Structure
41 | All elements | All_elements | Occurrences of all annotated elements
42 | Lead | Lead | Occurrences of annotated leads
43 | Position | Position | Occurrences of annotated positions
44 | Claim | Claim | Occurrences of annotated claims
45 | Counterclaim | Counterclaim | Occurrences of annotated counterclaims
46 | Rebuttal | Rebuttal | Occurrences of annotated rebuttals
47 | Evidence | Evidence | Occurrences of annotated evidence
48 | Concluding statement | Concluding Statement | Occurrences of annotated concluding statements
49 | Total effectiveness score for all elements | All_effective_score | Effectiveness scores of all annotated elements
50 | Effectiveness score for lead | Lead_effective | Effectiveness scores of annotated leads
51 | Effectiveness score for position | Position_effective | Effectiveness scores of annotated positions
52 | Effectiveness score for claim | Claim_effective | Effectiveness scores of annotated claims
53 | Effectiveness score for counterclaim | Counterclaim_effective | Effectiveness scores of annotated counterclaims
54 | Effectiveness score for rebuttal | Rebuttal_effective | Effectiveness scores of annotated rebuttals
55 | Effectiveness score for evidence | Evidence_effective | Effectiveness scores of annotated evidence
56 | Effectiveness score for concluding statement | Concluding_Statement_effective | Effectiveness scores of annotated concluding statements
Content
57 | Content score 1 | C1 | Maximum similarity score between target essay and other essays scored as 1
58 | Content score 2 | C2 | Maximum similarity score between target essay and other essays scored as 2
59 | Content score 3 | C3 | Maximum similarity score between target essay and other essays scored as 3
60 | Content score 4 | C4 | Maximum similarity score between target essay and other essays scored as 4
61 | Content score 5 | C5 | Maximum similarity score between target essay and other essays scored as 5
62 | Content score 6 | C6 | Maximum similarity score between target essay and other essays scored as 6
Tone/Style
63 | All stance markers | all_markers | All stance markers incidence
64 | Booster words | booster_words | Booster words incidence
65 | Discourse markers | discourse_words | Discourse markers incidence
66 | Hedge words | hedge_words | Hedge words incidence
67 | Negative adjectives component | Negative_adjectives_component | Component score of negative adjectives
68 | Social order component | Social_order_component | Component score of social order
69 | Positive adjectives component | Positive_adjectives_component | Component score of positive adjectives
70 | Joy component | Joy_component | Component score of joy
71 | Trust verbs component | Trust_verbs_component | Component score of trust verbs

3.7 RESEARCH DESIGN

3.7.1 Data Preparation

3.7.1.1 Subsample for Factor Analysis

To ensure data readiness and the accurate extraction of language features, a subset of 2,977 essays underwent screening and correction for common grammatical and mechanical errors by two trained human raters. Interrater reliability for this subset was evaluated using a two-way random-effects model with absolute-agreement intraclass correlation (ICC); the resulting reliability estimates for all identified errors are reported in the error corrections section of Chapter Four. No missing data were detected in the subset. Normality assessments were performed using the absolute values of skewness and kurtosis (|z-skew| and/or |z-kurtosis| ≥ 3.29, corresponding to an alpha level of .001, as suggested by Tabachnick & Fidell, 2013), alongside P-P plots. Twenty-seven of the 71 variables exhibited a clear departure from normality: LDMTLD, PCCNCz, DESSL, DESSLd, SYNLE, SYNMEDwrd, DRPVAL, DRNEG, PCREFz, WRDPRP1s, WRDPRP2, WRDPRP3s, LSASS1, LSAGN, PCDCz, DESWC, CNCCaus, CNCADC, CNCTemp, CNCNeg, RDFRE, the negative adjectives component score, the joy component score, boosters, hedges, discourse markers, and all markers. It is worth noting that transforming these variables may result in the loss of their inherent scaling properties.
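To illustrate the screening rule, the sketch below flags variables whose skewness or kurtosis z-statistics exceed |3.29| (alpha = .001). It uses scipy's z-approximations of the skewness and kurtosis tests, which operationalize the same criterion as dividing each moment by its standard error; `writing_measures` is an assumed DataFrame of the 71 study variables.

import pandas as pd
from scipy.stats import skewtest, kurtosistest

def flag_nonnormal(df, cutoff=3.29):
    """Flag variables whose skewness/kurtosis z-statistics exceed |cutoff|."""
    flagged = []
    for col in df.columns:
        values = df[col].dropna()
        z_skew, _ = skewtest(values)       # z-approximation of the skewness test
        z_kurt, _ = kurtosistest(values)   # z-approximation of the kurtosis test
        if abs(z_skew) >= cutoff or abs(z_kurt) >= cutoff:
            flagged.append(col)
    return flagged

# Example usage: nonnormal_vars = flag_nonnormal(writing_measures)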
Furthermore, these features are used exclusively for factor analysis, and an estimator robust to non-normality (i.e., maximum likelihood with robust standard errors) was selected. Hence, it was decided not to transform these variables to achieve normality.

3.7.1.3 Subsample for Constructing and Validating the Large Language Model

The remaining subset of the data (n ≈ 22,000) was dedicated to the subsequent phases of large language model construction and validation. Specifically, two columns within this subset were used: the first contains the full essay text, which serves as the foundation for constructing the large language model, and the second contains the holistic essay score, which serves as the ground truth for validating model performance. No missing data were identified within this subset. This subset is important for addressing RQ3. The fine-tuned model resulting from this process is used to predict scores for GPT-revised versions of students' original essays. To produce the revisions, students' original persuasive essays are input into GPT, and the system is prompted to revise the writing while considering research-based traits at the levels of language, tone/style, organization, and content. More specific and informative traits are elaborated in Chapter Four based on findings from the exploratory and confirmatory factor analyses. The predicted scores are then compared with the students' original scores to evaluate any gain in performance.

3.7.2 Analytic Procedure

The current study involves a secondary data analysis primarily employing social science quantitative research methodologies to investigate three key research questions. This study follows the principles of Design-Based Research (DBR; Barab & Squire, 2004). The ultimate goal is to examine the effectiveness of machine intelligence techniques within the realm of secondary writing assessment design. DBR is characterized by relating learning theories to practice and demonstrating improvements in student learning in real-world settings (DBR Collective, 2003). While this study follows the DBR framework for its core processes, determining whether the collected and analyzed data lead to improved student learning through the instructional tools will be a future endeavor; this aspect is discussed further in the future directions section of Chapter Five. Reeves and McKenney (2012) outlined three core processes of DBR: (a) analysis and exploration, (b) design and construction, and (c) evaluation and reflection. This study follows these three processes, and the implementation workflow is presented in Figure 3-1.
Stage 1: Initial Assessment Design. Analyses: exploratory factor analysis for feature reduction; confirmatory factor analysis for model validation; multigroup SEM analysis for demographic differences. Products: microstructural (word, sentence, discourse) and macrostructural (content, structure, tone/style) writing features identified for two purposes: (1) building and validating an essay scoring model and (2) crafting GPT prompts for feedback provision. Research questions: RQ1. What textual attributes serve as optimal indicators of persuasive essay quality in secondary school students? RQ2. To what extent do secondary students with varying special needs status (i.e., students with an Individualized Education Plan [IEP]) exhibit significant differences in their holistic writing scores across latent writing attributes? Methods: exploratory and confirmatory factor analysis; multigroup SEM analysis. Software: SPSS, R.

Stage 2: Automated Essay Scoring. Analyses: develop a scoring model based on BERT by incorporating the identified writing features; validate the scoring model on a separate dataset not used in the first stage. Product: a persuasive essay scoring model based on key writing dimensions, designed for future use without requiring any feature extraction on new datasets.

Stage 3: Generative AI Feedback System. Analyses: draft a GPT prompt for revising students' persuasive essays; use the validated scoring model to evaluate GPT-revised essays. Products: a refined GPT prompt for revising students' persuasive essays aligned with the expected writing constructs, and an evaluation of the effectiveness of the GPT prompt in enhancing essay revisions. Research question: RQ3. Do essays revised by GPT, a Generative AI application, utilizing prompts derived from factor analysis, demonstrate enhanced performance compared to the original essays written by students? Methods: BERT, prompt engineering, and model evaluation. Software: PyTorch, GPT API.

Figure 3-1 Implementation Workflow and Evaluation Methodology

3.7.2.1 RQ1: What textual attributes serve as optimal indicators of persuasive essay quality in secondary school students?

To address RQ1, I begin by employing exploratory factor analysis (EFA) to uncover underlying latent structures. This method allows me to discern the number and nature of latent variables that explain shared variability among observed indicators (C. D. Stapleton, 1997). In the context of the PERSUADE 2.0 corpus, six dimensions of writing are identified: language skills at the word, sentence, and discourse levels, as well as content, structure, and tone/style. These constructs encapsulate aspects of students' language use in response to specific prompts, thereby reflecting their writing ability through various writing-related measures in their essays. In this study, EFA with varimax rotation is applied to (1) explain the common variance in a group of variables (e.g., DESWLsy, LDMTLD, PCCNCz, WRDFAMc, WRDFRQc, WRDHYPnv, WRDIMGc, WRDMEAc, WRDPOLc) by associating their observed scores with a reduced number of underlying latent factors (e.g., word-level complexity) and (2) derive a loading matrix that demonstrates how each variable can be expressed as a linear combination of the common factors. These loadings (ranging from -1 to 1) are akin to correlations between each variable and the latent factor it represents. The study adheres to established guidelines (Costello & Osborne, 2019; Osborne, 2014) recommending that items with absolute loadings of 0.5 or greater be attributed to the respective factor, with higher loadings indicating stronger associations. The varimax rotation method is employed to enhance the interpretability of the factors. Additionally, Kaiser-Meyer-Olkin (KMO) values for each category are reported to evaluate the adequacy of the data for EFA.
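For readers who wish to reproduce this step outside SPSS or R, the following is a minimal sketch using the Python factor_analyzer package. The six-factor solution and the |.50| salience threshold mirror the guidelines above, while `X` is an assumed DataFrame of the retained writing measures; this is an illustration, not the study's actual analysis script.

import pandas as pd
from factor_analyzer import FactorAnalyzer

def run_efa(X, n_factors=6):
    """EFA with varimax rotation; retain only salient loadings (|loading| >= .50)."""
    fa = FactorAnalyzer(n_factors=n_factors, rotation="varimax")
    fa.fit(X)
    loadings = pd.DataFrame(fa.loadings_, index=X.columns,
                            columns=[f"F{i + 1}" for i in range(n_factors)])
    return loadings.where(loadings.abs() >= 0.50)  # non-salient loadings -> NaN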
Following the EFA, confirmatory factor analysis (CFA) is employed to establish evidence of construct validity within the hypothesized structural equation model. CFA provides proof of concept for discriminant validity (Wilson et al., 2017), which assesses the differentiation among hypothesized levels of factors. In this study, CFA is applied across three distinct models, aligning with recommendations by Thompson (2000) and Rodgers et al. (2020). The first model (Model 1) is a baseline model in which all writing variables load onto a single factor (i.e., overall writing). The second model (Model 2) examines a two-factor structure in which all Coh-Metrix writing variables load onto a microstructural factor, while the other writing measures load onto a macrostructural factor. The third model (Model 3) is a higher-order model that explores the possibility of six first-order factors (i.e., word, sentence, discourse, content, structure, tone/style) explained by the microstructural and macrostructural factors, respectively. Diagrams depicting each tested model are provided in Figure 3-2 for clarity and reference.

Figure 3-2 Diagrams for the One-factor, Two-factor, and Higher-order Models Evaluated

3.7.2.2 RQ2: To what extent do secondary students with different special needs status (i.e., students with and without an Individualized Education Plan [IEP]) exhibit significant differences in their holistic writing scores across latent writing attributes?

To address RQ2, multigroup structural equation modeling (multigroup SEM; Byrne, 2016) facilitates the comparison of multiple samples across various population groups within the CFA model identified for RQ1. This analytical method tests for significant differences among groups by constraining their parameters to equality and examining measurement invariance. In the present study, the comparison focuses on group differences based on special education status.

3.7.2.3 RQ3: Do essays revised by GPT, a Generative AI application, utilizing prompts derived from factor analysis, demonstrate enhanced performance compared to original essays written by students?

To address RQ3, the study follows a two-step approach. The first step involves developing and fine-tuning a large language model customized specifically for this study. This model combines features derived from the MMFP with contextualized word embeddings from the Bidirectional Encoder Representations from Transformers (BERT) model. BERT has been widely acknowledged in prior research for its effectiveness in assessing students' responses to essay prompts and questions (Cochran et al., 2024; Z. Liu et al., 2023) and has emerged as an industry standard for a variety of downstream NLP tasks, such as score prediction. In addition, BERT excels as a contextualized embedding model for domain-specific tasks, providing dynamic contextual representations for words within an essay that preserve both semantic and syntactic information (Wang et al., 2024). By fusing BERT-derived features with MMFP-derived features, the constructed language model aims to reflect students' comprehension of designated topics, thereby enhancing model performance and advancing our understanding of students' writing proficiency in persuasive writing tasks. This resulting language model is then utilized to predict students' persuasive writing scores.
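The following schematic PyTorch sketch shows one way such a fusion model can be organized: BERT's [CLS] representation is concatenated with the handcrafted (MMFP-derived) features before a small regression head predicts the holistic score. The dimensions and layer choices are illustrative assumptions, not the study's exact architecture.

import torch
import torch.nn as nn
from transformers import AutoModel

class FusionScorer(nn.Module):
    """BERT embeddings fused with handcrafted writing features for score prediction."""
    def __init__(self, n_handcrafted, bert_name="bert-base-uncased"):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size            # 768 for bert-base
        self.head = nn.Sequential(
            nn.Linear(hidden + n_handcrafted, 256),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(256, 1),                           # predicted holistic score
        )

    def forward(self, input_ids, attention_mask, handcrafted):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls_vec = outputs.last_hidden_state[:, 0]        # [CLS] token representation
        fused = torch.cat([cls_vec, handcrafted], dim=-1)  # fuse text and feature views
        return self.head(fused).squeeze(-1)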
In the second step, the GPT model is prompted to revise students' essays based on their original writing. A specific prompt, informed by insights from the MMFP model, is designed and supplied to GPT to generate revised essays. It is hypothesized that the GPT-revised essays will demonstrate improvements across dimensions such as word usage, sentence structure, discourse organization, content coherence, structural clarity, and tone/style, all derived from the MMFP. These revised essays are then scored using the large language model developed in Step 1. Subsequently, the scores of the students' original essays are compared with the scores of the GPT-revised essays to evaluate any observed improvements.

3.7.3 Model Fit

To evaluate the fit of the estimated EFA, CFA, and multigroup SEM models, two measures of absolute fit, the Chi-square test of model fit and the Standardized Root Mean Square Residual (SRMR), are employed. These measures gauge the extent to which the model's parameter estimates align with those derived from the observed data. The SRMR quantifies discrepancies between the model's implied correlations and the actual correlations observed in the raw data; it ranges from 0 to 1, with a value of 0 indicating perfect fit and values below 0.08 suggesting good fit. Additionally, I evaluate model fit using two incremental/comparative measures: the Comparative Fit Index (CFI) and the Tucker-Lewis Index (TLI). These metrics assess how well the postulated model fits relative to a more constrained baseline model. CFI and TLI values range from 0 to 1, with values approaching 0.90 indicating relatively good model fit (Bentler, 1990); acceptable fit values lie between 0.90 and 0.95, while values above 0.95 are considered indicative of excellent fit. Lastly, a parsimony-adjusted measure of fit, the Root Mean Square Error of Approximation (RMSEA), is utilized, which takes into account the number of estimated parameters in the model. RMSEA values below 0.05 indicate well-fitting models, while values exceeding 0.10 suggest poor fit.

To validate the constructed language model, performance was assessed using mean absolute error (MAE), the standard deviation of MAE, and R-squared (R²), each averaged over tenfold cross-validation. MAE measured the average magnitude of absolute differences between predicted and actual values, while the standard deviation of MAE indicated model stability. Low MAE values suggest that predicted values closely align with actual values, indicating a high degree of accuracy. R², as a goodness-of-fit measure in regression models, represented the proportion of variance in the outcome variable (holistic essay scores) explained by the model.

3.7.4 Software

The analyses were conducted in RStudio (R Core Team) using the lavaan (Rosseel, 2012) and psych (Revelle, 2015) packages for exploratory data analysis, EFA, CFA, and multigroup SEM with maximum likelihood estimation. The Python library PyTorch was utilized for training and evaluating the BERT deep learning models; the pretrained BERT model can be accessed through the Hugging Face library. GPT-3.5, accessed via the OpenAI API, is employed to revise students' essays.
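A hypothetical sketch of the revision call through the OpenAI Python client is shown below. The prompt wording is illustrative only; the study's actual prompt is derived from the factor-analytic dimensions reported in Chapter Four.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative prompt covering the six writing dimensions; not the study's exact text.
REVISION_PROMPT = (
    "Revise the following persuasive essay. Improve word choice, sentence "
    "structure, discourse organization, content development, argumentative "
    "structure (lead, position, claims, counterclaims, rebuttals, evidence, "
    "concluding statement), and tone/style, while preserving the writer's "
    "position.\n\nEssay:\n"
)

def revise_essay(essay):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": REVISION_PROMPT + essay}],
        temperature=0.7,
    )
    return response.choices[0].message.content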
Chapter 4: Findings and Discussion

In this chapter, I conduct a thorough analysis of the findings pertaining to the three research questions and interpret these results in relation to the existing literature. This chapter aims to offer an in-depth exploration of the results, emphasizing significant patterns and themes that emerged from the data.

4.1 DESCRIPTIVE STATISTICS

Figure 4-1 illustrates the distribution of students across the grade levels sampled within each holistic score level (ranging from 1 to 6) of the PERSUADE 2.0 corpus. The observed data exhibit statistically significant departures from the expected values, as evidenced by a Chi-square test of independence for grade level and holistic score, χ²(20) = 6339.3, p < 0.001. Hence, the null hypothesis is rejected, indicating that the data are not equally distributed and that a relationship exists between grade level and holistic essay score. Grade level therefore was not used as a grouping variable in the subsequent multigroup analysis but was controlled as a covariate in further data analyses.

Figure 4-1 Distribution of Study Sample Across Holistic Score Levels of the PERSUADE 2.0 Corpus
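The independence test reported above can be computed directly from a grade-by-score contingency table, as in the following sketch (the column names are illustrative):

import pandas as pd
from scipy.stats import chi2_contingency

def grade_score_independence(df):
    """Chi-square test of independence for grade level and holistic score."""
    table = pd.crosstab(df["grade_level"], df["holistic_score"])
    chi2, p, dof, _ = chi2_contingency(table)   # expected counts discarded here
    return chi2, dof, p                         # here: chi2(20) = 6339.3, p < .001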
Table 4-1 presents basic descriptive statistics pertaining to the writing measures under investigation in the present study.

Table 4-1 Means and (Standard Deviations) for Writing-Related Variables

Variable | Grade 6 | Grade 8 | Grade 10 | Grade 11 | Grade 12
Holistic Score | 1.92 (0.82) | 3.06 (1.18) | 2.59 (1.29) | 4.85 (1.12) | 2.76 (0.72)
DESWLsy | 1.33 (0.08) | 1.36 (0.10) | 1.46 (0.11) | 1.48 (0.12) | 1.36 (0.07)
LDMTLD | 68.0 (18.05) | 66.9 (17.9) | 75.2 (19.5) | 73.2 (16.8) | 54.7 (15.0)
PCCNCz | 0.17 (1.05) | -0.35 (0.94) | -0.25 (0.855) | 0.03 (0.62) | -0.03 (0.80)
WRDFAMc | 582 (6.67) | 583 (6.26) | 581 (8.78) | 584 (5.66) | 590 (3.91)
WRDFRQc | 2.58 (0.17) | 2.60 (0.15) | 2.46 (0.20) | 2.45 (0.15) | 2.71 (0.15)
WRDHYPnv | 1.46 (0.17) | 1.52 (0.18) | 1.54 (0.18) | 1.52 (0.12) | 1.43 (0.16)
WRDIMGc | 404 (23.67) | 393 (21.3) | 399 (18.7) | 405 (13.6) | 400 (19.8)
WRDMEAc | 431 (13.2) | 425 (15.9) | 426 (15.4) | 433 (11.3) | 433 (13.7)
WRDPOLc | 4.65 (0.44) | 4.81 (0.57) | 4.26 (0.51) | 4.36 (0.47) | 4.92 (0.51)
DESSL | 20.0 (20.7) | 22.1 (15.7) | 24.1 (15.4) | 24.4 (6.02) | 29.1 (14.6)
DESSLd | 8.9 (4.45) | 11.5 (8.6) | 11.4 (10.4) | 10.2 (5.31) | 16.6 (11.8)
SYNLE | 3.54 (3.50) | 4.28 (8.48) | 4.40 (3.74) | 5.14 (1.66) | 4.85 (3.02)
SYNNP | 0.67 (0.15) | 0.65 (0.16) | 0.75 (0.19) | 0.74 (0.14) | 0.57 (0.14)
SYNMEDwrd | 0.85 (0.15) | 0.82 (0.19) | 0.86 (0.13) | 0.85 (0.12) | 0.85 (0.05)
SYNSTRUTt | 0.10 (0.03) | 0.07 (0.03) | 0.08 (0.03) | 0.08 (0.02) | 0.07 (0.03)
DRPVAL | 3.53 (4.61) | 5.02 (5.73) | 5.48 (5.08) | 7.07 (5.02) | 1.65 (2.48)
DRNP | 358 (34.6) | 342 (31.6) | 337 (33.2) | 332 (23.6) | 352 (27.6)
DRVP | 269 (34.9) | 272 (41.3) | 255 (40.6) | 272 (32.7) | 287 (40.2)
DRNEG | 8.7 (8.6) | 16.7 (10.3) | 11.9 (9.1) | 13.5 (6.9) | 19.6 (10.1)
PCREFz | 0.64 (1.18) | 1.14 (1.30) | 0.69 (1.19) | 1.33 (1.09) | 2.12 (1.43)
WRDPRP1s | 21.65 (27.46) | 17.91 (18.14) | 8.45 (12.4) | 7.37 (12.7) | 14.4 (14.0)
WRDPRP2 | 34.95 (31.94) | 22.7 (24.3) | 11.3 (15.9) | 6.52 (12.7) | 23.9 (25.5)
WRDPRP3s | 22.05 (29.11) | 3.14 (8.03) | 4.08 (8.93) | 4.17 (10.1) | 5.79 (13.5)
CRFAOa | 0.56 (0.20) | 0.60 (0.19) | 0.61 (0.20) | 0.72 (0.17) | 0.74 (0.20)
CRFAO1 | 0.64 (0.20) | 0.68 (0.19) | 0.68 (0.20) | 0.80 (0.15) | 0.79 (0.18)
LSAPP1 | 0.29 (0.16) | 0.34 (0.19) | 0.40 (0.21) | 0.54 (0.15) | 0.46 (0.18)
LSASS1 | 0.17 (0.08) | 0.24 (0.09) | 0.26 (0.10) | 0.29 (0.08) | 0.30 (0.12)
LSAGN | 0.30 (0.06) | 0.34 (0.06) | 0.34 (0.06) | 0.37 (0.04) | 0.39 (0.05)
PCNARz | 1.12 (0.66) | 1.10 (0.71) | 0.56 (0.77) | 0.60 (0.62) | 1.45 (0.60)
PCDCz | 0.87 (1.19) | 1.72 (1.51) | 1.10 (1.23) | 1.67 (1.06) | 2.67 (1.88)
DESWC | 296 (102) | 384 (199) | 362 (174) | 694 (216) | 490 (216)
CNCAll | 94 (21.6) | 101 (23.1) | 92.7 (20.5) | 99.1 (15.5) | 110 (24.2)
CNCCaus | 28.6 (13.9) | 36.4 (15.6) | 30.7 (11.9) | 36.7 (11.4) | 45.1 (16.5)
CNCLogic | 46.4 (17.8) | 57.2 (18.3) | 51.0 (17.4) | 53.8 (12.9) | 67.6 (18.4)
CNCADC | 9.9 (8.5) | 19.8 (11.0) | 20.1 (9.87) | 18.1 (7.01) | 20.6 (10.0)
CNCTemp | 13.6 (9.26) | 14.9 (9.19) | 13.7 (8.61) | 16.0 (7.35) | 11.9 (6.64)
CNCAdd | 50.6 (16.1) | 46.4 (15.5) | 45.8 (14.7) | 46.7 (12.1) | 52.7 (13.1)
CNCPos | 87.5 (22.3) | 88.1 (22.6) | 79.7 (18.9) | 87.3 (15.3) | 95.1 (23.0)
CNCNeg | 8.6 (8.05) | 15.3 (9.38) | 15.0 (8.66) | 13.8 (6.00) | 18.3 (9.22)
RDFRE | 75.1 (12.0) | 69.9 (12.1) | 59.7 (13.6) | 56.8 (11.3) | 62.7 (13.9)
Lead | 0.30 (0.46) | 0.53 (0.50) | 0.51 (0.50) | 0.92 (0.28) | 0.54 (0.50)
Position | 0.77 (0.42) | 1.00 (0.13) | 0.93 (0.27) | 1.00 (0.00) | 1.00 (0.17)
Claim | 2.34 (2.11) | 2.80 (1.61) | 2.53 (1.61) | 4.31 (1.64) | 3.87 (1.86)
Counterclaim | 0.18 (0.50) | 0.36 (0.63) | 0.23 (0.57) | 0.86 (0.86) | 0.96 (1.34)
Rebuttal | 0.13 (0.43) | 0.27 (0.54) | 0.15 (0.45) | 0.69 (0.81) | 0.25 (0.47)
Evidence | 2.32 (1.42) | 2.91 (1.24) | 2.22 (1.25) | 3.40 (1.15) | 3.94 (1.61)
Concluding statement | 0.56 (0.51) | 0.83 (0.38) | 0.78 (0.42) | 0.98 (0.23) | 0.82 (0.39)
Lead_effective | 0.23 (0.47) | 0.60 (0.73) | 0.50 (0.67) | 1.38 (0.71) | 0.43 (0.50)
Position_effective | 0.67 (0.54) | 1.08 (0.58) | 0.81 (0.62) | 1.59 (0.48) | 0.83 (0.31)
Claim_effective | 0.70 (0.58) | 1.38 (0.88) | 0.85 (0.62) | 1.59 (0.48) | 0.83 (0.31)
Counterclaim_effective | 0.13 (0.33) | 0.30 (0.54) | 0.20 (0.49) | 0.96 (0.84) | 0.45 (0.49)
Rebuttal_effective | 0.09 (0.28) | 0.23 (0.50) | 0.14 (0.43) | 0.78 (0.85) | 0.22 (0.41)
Evidence_effective | 1.01 (0.61) | 0.43 (0.44) | 1.58 (0.53) | 0.75 (0.60) | 0.70 (0.31)
Concluding_statement_effective | 0.83 (0.70) | 0.50 (0.56) | 1.54 (0.60) | 0.77 (0.70) | 0.63 (0.52)
C1 | 0.27 (0.01) | 0.29 (0.06) | 0.26 (0.04) | 0.30 (0.08) | 0.29 (0.04)
C2 | 0.33 (0.01) | 0.33 (0.06) | 0.37 (0.07) | 0.32 (0.08) | 0.36 (0.06)
C3 | 0.34 (0.02) | 0.36 (0.06) | 0.39 (0.07) | 0.35 (0.09) | 0.37 (0.06)
C4 | 0.36 (0.00) | 0.37 (0.07) | 0.42 (0.08) | 0.37 (0.10) | 0.36 (0.06)
C5 | 0.34 (0.10) | 0.38 (0.08) | 0.44 (0.07) | 0.38 (0.11) | 0.34 (0.06)
C6 | 0.34 (0.10) | 0.39 (0.09) | 0.45 (0.07) | 0.37 (0.10) | 0.34 (0.06)
All_markers | 18.4 (12.2) | 10.2 (7.6) | 31.1 (12.3) | 16.3 (10.3) | 18.3 (9.33)
Booster_words | 2.71 (3.00) | 1.99 (2.11) | 4.51 (3.52) | 2.63 (2.40) | 2.37 (2.30)
Discourse_markers | 1.51 (2.06) | 0.62 (1.11) | 4.39 (3.15) | 1.51 (1.83) | 1.06 (1.65)
Hedge_words | 14.2 (9.51) | 7.6 (6.25) | 22.2 (9.60) | 12.2 (8.16) | 14.9 (7.83)
Negative_adjectives_component | -0.40 (0.89) | -0.58 (0.94) | -0.60 (0.61) | -0.07 (0.87) | -0.67 (0.60)
Social_order_component | 0.55 (0.18) | 0.51 (0.15) | 0.50 (0.13) | 0.50 (0.15) | 0.60 (0.21)
Positive_adjectives_component | 0.24 (0.30) | 0.13 (0.27) | 0.21 (0.20) | 0.10 (0.23) | 0.20 (0.24)
Joy_component | 1.08 (0.75) | 1.24 (0.80) | 0.70 (0.42) | 0.93 (0.65) | 0.91 (0.67)
Trust_verbs_component | 0.16 (0.09) | 0.15 (0.07) | 0.25 (0.09) | 0.19 (0.10) | 0.18 (0.06)

Figure 4-2 displays a heatmap showing the unadjusted bivariate correlations between student demographic variables, essay scores, and writing measures. The X-axis represents various student demographic variables, including gender, grade level, English Language Learner (ELL) status, race/ethnicity, income background, disability status, and students' writing scores. The Y-axis includes writing measures at the microstructural and macrostructural levels. Spearman or point-biserial correlation analyses were used for the categorical variables (e.g., gender, grade, ELL status, race/ethnicity, income background, and disability status) in relation to the writing elements, while Pearson correlation analysis was employed for continuous variables, such as essay scores and writing measures. Only correlation coefficients with absolute values greater than 0.30 are presented in Figure 4-2, as they indicate at least a moderate positive or negative linear relationship, consistent with standards in social science research (Bujang & Baharum, 2016; S. Crossley et al., 2023). P-values for these coefficients are denoted with asterisks to indicate statistically significant correlations.
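This mixed correlation strategy can be sketched as follows; the demographic column names are illustrative assumptions, and only coefficients exceeding |0.30| are retained, mirroring the display rule in Figure 4-2.

import pandas as pd
from scipy.stats import pearsonr, spearmanr, pointbiserialr

def demographic_correlations(df, writing_vars):
    """Correlations between demographics/scores and writing measures;
    cells with |r| <= .30 are masked to NaN, as in Figure 4-2."""
    rows = {}
    for var in writing_vars:
        r_iep, _ = pointbiserialr(df["iep_status"], df[var])    # binary demographic
        r_grade, _ = spearmanr(df["grade_level"], df[var])      # ordinal demographic
        r_score, _ = pearsonr(df["holistic_score"], df[var])    # continuous variable
        rows[var] = {"iep_status": r_iep, "grade_level": r_grade,
                     "holistic_score": r_score}
    corr = pd.DataFrame(rows).T
    return corr.where(corr.abs() > 0.30)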
According to Figure 4-2, grade level demonstrated significant correlations with several writing features, including mean syllables per word (DESWLsy), mean word frequency (WRDFRQc), mean word polysemy (WRDPOLc), incidence of first-person pronouns (WRDPRP1s), incidence of second-person pronouns (WRDPRP2), LSA overlap of adjacent paragraphs (LSAPP1), and readability (RDFRE). Notably, these correlations are all positive. For instance, the positive and moderately strong correlation (r = 0.39, p < 0.001) between grade level and DESWLsy indicates that progression from lower grade levels (i.e., Grade 6) to higher grade levels (i.e., Grade 12) was moderately associated with an increase in mean syllables per word. These findings align with existing literature suggesting that as students advance through secondary education and develop strategies for applying discourse-level knowledge, they tend to utilize more diverse lexical and syntactic choices (A. Truckenmiller et al., 2021), enhance their discourse cohesion and coherence (Sarmiento et al., 2024), and adopt genre-related rhetorical structures (W. Qin et al., 2024).

Regarding holistic essay scores, most of the structural and content-level features were significantly and strongly correlated with paper quality. Some language features, such as mean syllables per word (DESWLsy), mean number of words before the main verb (SYNLE), LSA overlap of adjacent paragraphs (LSAPP1), LSA given/new information (LSAGN), and total word count (DESWC), along with some tone/style features (e.g., booster words, discourse markers, and hedge words), exhibited moderate to strong correlations with essay quality. These findings align with the PERSUADE corpus holistic rating rubric used to evaluate student essays, which emphasizes the importance of presenting clear claims, utilizing organized evidence to support positions, and demonstrating diverse and appropriate vocabulary choices and sentence structures. Existing literature suggests that the organizational features of argumentation and the quality of content are strong predictors of writing quality at the secondary level, as these attributes enhance the clarity and persuasiveness of students' written work (S. A. Crossley et al., 2022; Taylor et al., 2019; Uccelli et al., 2013).

Figure 4-2 Correlation Heatmap of Writing-Related Variables, Demographic Information, and Essay Scores

Bivariate correlation coefficients among writing features at both the microstructural and macrostructural levels were examined (see Table 4-2). As mentioned in Chapter 3 under the Research Design subsection, normality assessments were conducted for all study variables, revealing that 27 writing-related variables exhibited non-normal distributions. To account for the non-normality, a weighted least squares (WLS) estimator was employed for the analyses to extract factors and compute robust standard errors (Kline, 2015). This approach is recommended when the assumption of multivariate normality is violated (Fabrigar et al., 1999), as was the case in this dataset, where many variables demonstrated skewed distributions.

Table 4-2 Correlations Among All Study Writing-Related Variables in the Corpus 1. DESWLsy 2. LDMTLD 3. PCCNCz 4. WRDFAMc 5. WRDFRQc 6. WRDHYPnv 7. WRDIMGc 8. WRDMEAc 9. WRDPOLc 10. DESSL 11. DESSLd 12. SYNLE 13. SYNNP 14. SYNMEDwrd 15. SYNSTRUTt 16. DRPVAL 17. DRNP 18. DRVP 19. DRNEG 20. PCREFz 21. WRDPRP1s 22. WRDPRP2 23. WRDPRP3s 24. CRFAOa 25. CRFAO1 26. LSAPP1 27. LSASS1 28. LSAGN 29.
PCNARz 30. PCDCz 31. DESWC 32. CNCAll 33. CNCCaus 34. CNCLogic 35. CNCADC 36. CNCTemp 37. CNCAdd 38. CNCPos 39. CNCNeg 40. RDFRE 41. Lead 42. Position 43. Claim 44. Counterclaim 45. Rebuttal 46. Evidence 47. Concluding_Statement 48. Lead_effective 49. Position_effective 50. Claim_effective 51.Counterclaim_effective 52. Rebuttal_effective 53. Evidence_effective 54.Concl_Sttm_effective 55. C1 56. C2 57. C3 58. C4 59. C5 60. C6 61. all_markers 62. booster_words 63. discourse_markers 64. hedge_words 65. neg_adj_comp 66. social_order_comp 67. positive_adj_comp 68. joy_comp 69. trust_verbs_comp 70. all_elements 71. all_effective_score 1 1.00 0.45 0.03 -0.46 -0.73 0.47 0.22 0.12 -0.51 -0.10 -0.19 0.03 0.57 0.08 0.20 0.23 0.03 -0.37 -0.29 -0.26 -0.38 -0.32 -0.04 -0.05 -0.05 0.29 0.09 0.09 -0.73 -0.22 0.25 -0.23 -0.08 -0.32 -0.10 -0.03 -0.10 -0.21 -0.18 -0.65 0.20 -0.08 0.12 0.02 0.07 0.06 0.12 0.28 0.18 0.24 0.13 0.17 0.28 0.33 -0.04 0.04 0.10 0.18 0.23 0.22 0.14 0.13 0.31 0.06 -0.08 -0.24 -0.15 -0.28 0.51 0.13 0.32 2 0.45 1.00 0.09 -0.38 -0.53 0.20 0.16 0.03 -0.37 -0.07 -0.08 0.02 0.38 0.19 -0.02 0.13 0.02 -0.34 -0.16 -0.65 -0.12 -0.16 0.00 -0.36 -0.37 -0.10 -0.41 -0.41 -0.46 -0.19 0.09 -0.15 -0.19 -0.21 0.09 0.11 -0.05 -0.21 0.02 -0.27 0.07 -0.07 -0.06 0.03 0.07 0.04 0.03 0.13 0.07 0.07 0.08 0.11 0.10 0.11 -0.06 -0.17 -0.10 -0.07 -0.02 0.02 0.05 0.12 0.21 -0.03 0.01 -0.17 -0.14 -0.14 0.20 0.01 0.13 3 0.03 0.09 1.00 0.03 -0.30 0.43 0.72 0.61 -0.04 0.39 0.14 0.26 0.20 -0.28 -0.20 0.03 0.06 -0.12 -0.34 0.16 -0.14 -0.06 0.05 0.17 0.16 -0.06 0.10 -0.16 -0.22 0.03 0.02 0.18 -0.05 -0.02 0.06 0.09 0.25 0.11 0.11 -0.29 -0.05 -0.09 -0.02 0.02 0.01 -0.04 -0.13 -0.01 -0.06 -0.03 0.05 0.03 -0.02 -0.03 0.00 -0.04 -0.01 -0.02 0.02 0.03 -0.07 -0.08 -0.04 -0.05 -0.01 -0.08 -0.02 0.00 0.14 -0.03 -0.01 4 -0.46 -0.38 0.03 1.00 0.66 -0.23 -0.17 0.30 0.39 0.05 0.09 -0.01 -0.35 -0.08 -0.05 -0.26 -0.30 0.50 0.15 0.30 0.22 0.25 0.04 0.15 0.16 -0.01 0.06 0.13 0.48 0.32 0.05 0.28 0.23 0.34 0.03 0.03 0.10 0.29 0.13 0.29 0.01 0.13 0.13 0.06 0.01 0.02 0.01 -0.03 0.01 0.06 0.03 0.00 0.00 -0.02 -0.26 -0.03 -0.11 -0.16 -0.17 -0.21 0.18 0.01 -0.09 0.25 -0.03 0.17 0.26 0.19 -0.25 0.11 0.01 5 -0.73 -0.53 -0.30 0.66 1.00 -0.54 -0.48 -0.15 0.61 0.06 0.13 -0.04 -0.60 -0.12 -0.13 -0.28 -0.12 0.51 0.34 0.37 0.35 0.29 0.01 0.13 0.12 -0.16 0.00 0.08 0.71 0.31 -0.17 0.25 0.22 0.37 0.06 -0.06 0.09 0.26 0.14 0.48 -0.10 0.13 -0.02 0.02 -0.03 -0.02 -0.02 -0.19 -0.07 -0.11 -0.08 -0.10 -0.17 -0.20 -0.07 0.01 -0.06 -0.14 -0.19 -0.21 -0.01 -0.07 -0.23 0.07 -0.01 0.29 0.22 0.25 -0.44 -0.03 -0.19 6 0.47 0.20 0.43 -0.23 -0.54 1.00 0.60 0.37 -0.16 -0.05 -0.07 0.01 0.46 -0.01 0.09 0.17 0.09 -0.21 -0.18 -0.05 -0.26 -0.24 -0.04 0.02 0.03 0.17 0.15 0.10 -0.53 -0.10 0.17 -0.13 -0.07 -0.17 -0.13 -0.02 -0.06 -0.12 -0.13 -0.30 0.08 -0.04 0.16 0.08 0.08 0.08 0.06 0.16 0.15 0.24 0.16 0.16 0.24 0.22 0.04 0.19 0.23 0.26 0.34 0.33 0.05 -0.04 0.11 0.05 -0.09 -0.06 -0.01 -0.15 0.31 0.16 0.26 7 0.22 0.16 0.72 -0.17 -0.48 0.60 1.00 0.55 -0.29 -0.05 -0.07 0.00 0.34 0.04 0.10 0.17 0.16 -0.29 -0.22 -0.12 -0.21 -0.23 -0.01 -0.04 -0.04 0.05 0.03 -0.05 -0.41 -0.22 0.02 -0.17 -0.19 -0.23 -0.12 -0.02 -0.06 -0.19 -0.10 -0.11 0.02 -0.11 0.00 0.04 0.04 0.03 -0.06 0.04 -0.06 -0.02 0.08 0.07 0.01 0.02 0.09 0.05 0.11 0.12 0.16 0.18 -0.10 -0.10 -0.02 -0.10 0.03 -0.14 -0.15 -0.07 0.20 0.02 0.03 8 0.12 0.03 0.61 0.30 -0.15 0.37 0.55 1.00 0.00 -0.01 -0.04 0.00 0.08 -0.05 0.04 -0.06 -0.07 0.15 -0.24 0.03 -0.02 0.09 0.05 0.02 0.06 0.03 0.03 0.01 -0.12 0.04 0.14 0.03 0.10 -0.07 -0.14 
-0.08 0.02 0.07 -0.06 -0.08 0.02 0.06 0.16 0.04 0.03 0.07 0.04 0.09 0.11 0.17 0.09 0.07 0.17 0.12 -0.19 -0.06 -0.07 -0.06 -0.03 -0.07 0.17 0.04 0.01 0.20 -0.10 -0.04 0.27 0.13 0.26 0.14 0.16 9 -0.51 -0.37 -0.04 0.39 0.61 -0.16 -0.29 0.00 1.00 0.05 0.11 -0.05 -0.43 -0.12 -0.14 -0.15 -0.10 0.48 0.16 0.30 0.21 0.26 -0.01 0.08 0.10 -0.13 0.05 0.08 0.45 0.35 -0.08 0.27 0.32 0.17 -0.04 -0.04 0.05 0.30 0.01 0.33 -0.08 0.08 0.04 0.00 -0.02 0.01 -0.03 -0.11 0.01 -0.01 -0.06 -0.07 -0.04 -0.09 0.08 0.10 0.06 0.00 -0.03 -0.06 -0.02 -0.05 -0.17 0.04 -0.05 0.33 0.29 0.22 -0.32 0.01 -0.08 10 -0.10 -0.07 0.39 0.05 0.06 -0.05 -0.05 -0.01 0.05 1.00 0.55 0.62 -0.04 -0.27 -0.50 -0.08 -0.01 0.01 -0.03 0.33 0.03 -0.01 0.07 0.28 0.21 -0.10 0.18 -0.23 0.15 0.22 0.02 0.19 0.09 0.14 0.08 -0.01 0.18 0.16 0.10 -0.61 -0.10 -0.04 -0.10 0.00 -0.02 -0.10 -0.17 -0.08 -0.08 -0.10 0.00 -0.02 -0.10 -0.11 -0.01 -0.06 -0.10 -0.10 -0.11 -0.10 -0.01 0.00 0.00 -0.01 0.02 -0.02 -0.01 0.00 -0.02 -0.11 -0.10 11 -0.19 -0.08 0.14 0.09 0.13 -0.07 -0.07 -0.04 0.11 0.55 1.00 0.08 -0.10 -0.09 -0.44 -0.10 -0.04 0.04 0.07 0.23 0.10 0.04 0.06 0.27 0.23 -0.18 0.16 -0.17 0.32 0.26 -0.05 0.20 0.07 0.19 0.13 -0.01 0.21 0.16 0.15 -0.41 -0.12 -0.04 -0.11 0.00 -0.04 -0.09 -0.15 -0.15 -0.09 -0.12 -0.01 -0.05 -0.15 -0.18 0.00 -0.06 -0.10 -0.13 -0.14 -0.13 -0.06 -0.03 -0.07 -0.06 0.06 0.03 0.03 0.07 -0.11 -0.12 -0.15 12 0.03 0.02 0.26 -0.01 -0.04 0.01 0.00 0.00 -0.05 0.62 0.08 1.00 0.05 -0.13 -0.15 0.02 0.02 -0.04 -0.04 0.10 -0.01 -0.04 0.04 0.03 0.01 -0.01 0.02 -0.11 -0.08 0.04 0.08 0.05 0.01 0.01 0.00 0.06 0.02 0.04 0.00 -0.27 0.00 0.00 0.00 0.00 0.01 -0.02 -0.05 0.03 0.00 0.01 0.01 0.02 0.02 0.02 -0.01 -0.03 -0.04 -0.02 -0.01 -0.01 0.03 0.02 0.05 0.02 -0.02 -0.03 -0.03 -0.05 0.04 -0.01 0.03 13 0.57 0.38 0.20 -0.35 -0.60 0.46 0.34 0.08 -0.43 -0.04 -0.10 0.05 1.00 0.08 0.10 0.19 -0.08 -0.57 -0.27 -0.32 -0.33 -0.27 -0.01 -0.17 -0.15 0.10 -0.05 -0.07 -0.72 -0.24 0.12 -0.23 -0.22 -0.27 -0.02 0.08 -0.09 -0.26 -0.07 -0.39 0.10 -0.14 -0.01 -0.02 0.02 -0.01 0.01 0.15 0.07 0.08 0.04 0.08 0.11 0.17 -0.06 -0.10 -0.03 0.03 0.11 0.12 -0.03 0.06 0.15 -0.09 -0.04 -0.22 -0.19 -0.17 0.28 0.00 0.14 14 0.08 0.19 -0.28 -0.08 -0.12 -0.01 0.04 -0.05 -0.12 -0.27 -0.09 -0.13 0.08 1.00 0.10 0.04 0.03 -0.11 -0.04 -0.54 -0.05 -0.06 0.02 -0.15 -0.14 0.08 -0.15 0.00 0.03 -0.11 0.06 -0.08 -0.10 -0.10 -0.01 0.05 -0.03 -0.09 -0.02 0.11 0.08 -0.01 0.00 0.02 0.03 0.04 0.04 0.08 0.02 0.03 0.03 0.05 0.05 0.06 -0.01 -0.04 -0.03 -0.01 -0.01 0.02 0.02 0.08 0.06 -0.01 0.02 -0.07 -0.07 -0.04 0.01 0.04 0.06 15 0.20 -0.02 -0.20 -0.05 -0.13 0.09 0.10 0.04 -0.14 -0.50 -0.44 -0.15 0.10 0.10 1.00 0.08 0.10 -0.06 -0.06 -0.21 -0.10 -0.10 -0.02 -0.29 -0.25 0.17 -0.12 0.20 -0.31 -0.25 -0.02 -0.28 -0.10 -0.21 -0.20 0.02 -0.28 -0.22 -0.22 0.32 0.07 -0.01 0.10 -0.03 -0.03 0.11 0.11 0.06 0.01 0.00 -0.03 -0.02 0.03 0.11 -0.05 -0.04 0.00 0.02 0.03 0.04 0.00 -0.02 0.01 0.01 -0.03 -0.05 -0.09 -0.04 0.11 0.10 0.04 16 0.23 0.13 0.03 -0.26 -0.28 0.17 0.17 -0.06 -0.15 -0.08 -0.10 0.02 0.19 0.04 0.08 1.00 -0.02 -0.07 -0.04 -0.12 -0.18 -0.24 -0.09 -0.05 -0.05 0.11 -0.01 0.01 -0.32 -0.09 0.11 -0.09 -0.05 -0.16 -0.02 0.07 -0.12 -0.09 -0.08 -0.10 0.14 0.02 0.05 0.06 0.11 0.05 0.07 0.17 0.12 0.15 0.10 0.12 0.15 0.19 0.03 0.10 0.16 0.21 0.25 0.29 0.05 0.05 0.17 0.00 0.00 -0.15 -0.11 -0.10 0.16 0.10 0.20 17 0.03 0.02 0.06 -0.30 -0.12 0.09 0.16 -0.07 -0.10 -0.01 -0.04 0.02 -0.08 0.03 0.10 -0.02 1.00 -0.49 -0.19 -0.04 0.01 -0.08 0.02 -0.04 -0.04 -0.03 -0.02 -0.03 -0.16 -0.18 -0.12 -0.29 -0.07 -0.28 -0.17 -0.12 -0.22 
-0.26 -0.18 0.00 0.03 -0.10 -0.13 -0.04 -0.05 -0.03 -0.05 -0.02 -0.13 -0.19 -0.08 -0.06 -0.17 -0.13 0.16 -0.01 0.00 -0.01 -0.03 0.02 -0.28 -0.12 -0.11 -0.29 -0.03 -0.03 -0.27 -0.10 -0.03 -0.10 -0.15 18 -0.37 -0.34 -0.12 0.50 0.51 -0.21 -0.29 0.15 0.48 0.01 0.04 -0.04 -0.57 -0.11 -0.06 -0.07 -0.49 1.00 0.23 0.29 0.20 0.22 -0.03 0.13 0.14 -0.01 0.07 0.12 0.54 0.22 0.03 0.21 0.20 0.22 -0.03 -0.08 0.10 0.27 0.05 0.26 -0.05 0.16 0.17 0.05 0.02 0.05 0.05 -0.04 0.09 0.14 0.04 0.01 0.10 0.02 -0.12 0.10 0.03 0.00 -0.03 -0.07 0.25 0.00 -0.06 0.33 -0.03 0.27 0.36 0.17 -0.12 0.13 0.07 19 -0.29 -0.16 -0.34 0.15 0.34 -0.18 -0.22 -0.24 0.16 -0.03 0.07 -0.04 -0.27 -0.04 -0.06 -0.04 -0.19 0.23 1.00 0.11 0.16 0.05 -0.05 -0.03 -0.03 -0.07 -0.04 0.03 0.40 0.11 -0.05 0.08 0.10 0.24 0.16 -0.07 0.02 0.05 0.17 0.24 -0.05 0.11 -0.01 0.10 0.07 0.02 0.03 -0.07 0.03 0.00 0.04 0.02 -0.03 -0.06 0.00 0.02 0.00 -0.04 -0.04 -0.01 -0.01 -0.05 -0.05 0.02 0.16 0.11 0.03 0.08 -0.19 0.04 -0.02 20 -0.26 -0.65 0.16 0.30 0.37 -0.05 -0.12 0.03 0.30 0.33 0.23 0.10 -0.32 -0.54 -0.21 -0.12 -0.04 0.29 0.11 1.00 0.07 0.12 0.01 0.79 0.76 0.25 0.73 0.55 0.45 0.34 0.07 0.23 0.28 0.26 -0.03 -0.09 0.07 0.27 0.01 -0.13 -0.04 0.06 0.12 0.02 -0.02 -0.02 -0.01 -0.06 -0.01 0.02 0.00 -0.03 0.00 -0.01 0.06 0.27 0.21 0.18 0.14 0.09 0.10 -0.03 -0.05 0.16 -0.03 0.14 0.10 0.04 -0.10 0.06 -0.02 21 -0.38 -0.12 -0.14 0.22 0.35 -0.26 -0.21 -0.02 0.21 0.03 0.10 -0.01 -0.33 -0.05 -0.10 -0.18 0.01 0.20 0.16 0.07 1.00 0.15 0.08 -0.05 0.00 -0.25 -0.14 -0.11 0.46 0.08 -0.09 0.08 0.07 0.14 0.02 0.00 0.02 0.09 0.06 0.26 -0.10 0.10 -0.10 -0.03 -0.03 -0.07 -0.04 -0.13 -0.02 -0.05 -0.05 -0.06 -0.08 -0.16 -0.10 -0.11 -0.16 -0.20 -0.20 -0.21 -0.02 0.00 -0.10 0.00 -0.03 0.07 0.17 0.14 -0.18 -0.10 -0.11 22 -0.32 -0.16 -0.06 0.25 0.29 -0.24 -0.23 0.09 0.26 -0.01 0.04 -0.04 -0.27 -0.06 -0.10 -0.24 -0.08 0.22 0.05 0.12 0.15 1.00 -0.01 0.02 0.07 -0.13 -0.06 0.00 0.35 0.19 -0.04 0.19 0.14 0.19 0.00 0.08 0.04 0.22 0.05 0.24 -0.02 0.03 0.02 -0.06 -0.09 0.00 -0.02 -0.06 0.00 -0.02 -0.11 -0.14 -0.04 -0.10 -0.04 -0.03 -0.11 -0.16 -0.22 -0.26 0.06 0.05 -0.12 0.09 -0.02 0.12 0.20 0.13 -0.16 -0.02 -0.09 23 -0.04 0.00 0.05 0.04 0.01 -0.04 -0.01 0.05 -0.01 0.07 0.06 0.04 -0.01 0.02 -0.02 -0.09 0.02 -0.03 -0.05 0.01 0.08 -0.01 1.00 -0.01 0.04 -0.04 -0.01 -0.01 0.10 0.00 0.10 -0.01 -0.02 -0.02 -0.03 0.03 0.03 -0.01 -0.01 -0.03 0.00 -0.04 0.01 -0.03 -0.04 -0.03 -0.03 0.00 -0.01 -0.02 -0.03 -0.03 0.01 -0.01 -0.08 -0.16 -0.18 -0.18 -0.16 -0.14 0.01 0.05 0.01 0.00 0.01 -0.02 -0.02 0.03 0.02 -0.02 -0.02 103 Table 4-2 (cont’d) 1. DESWLsy 2. LDMTLD 3. PCCNCz 4. WRDFAMc 5. WRDFRQc 6. WRDHYPnv 7. WRDIMGc 8. WRDMEAc 9. WRDPOLc 10. DESSL 11. DESSLd 12. SYNLE 13. SYNNP 14. SYNMEDwrd 15. SYNSTRUTt 16. DRPVAL 17. DRNP 18. DRVP 19. DRNEG 20. PCREFz 21. WRDPRP1s 22. WRDPRP2 23. WRDPRP3s 24. CRFAOa 25. CRFAO1 26. LSAPP1 27. LSASS1 28. LSAGN 29. PCNARz 30. PCDCz 31. DESWC 32. CNCAll 33. CNCCaus 34. CNCLogic 35. CNCADC 36. CNCTemp 37. CNCAdd 38. CNCPos 39. CNCNeg 40. RDFRE 41. Lead 42. Position 43. Claim 44. Counterclaim 45. Rebuttal 46. Evidence 47. Concluding_Statement 48. Lead_effective 49. Position_effective 50. Claim_effective 51.Counterclaim_effective 52. Rebuttal_effective 53. Evidence_effective 54.Concl_Sttm_effective 55. C1 56. C2 57. C3 58. C4 59. C5 60. C6 61. all_markers 62. booster_words 63. discourse_markers 64. hedge_words 65. neg_adj_comp 66. social_order_comp 67. positive_adj_comp 68. joy_comp 69. trust_verbs_comp 70. all_elements 71. 
all_effective_score 24 -0.05 -0.36 0.17 0.15 0.13 0.02 -0.04 0.02 0.08 0.28 0.27 0.03 -0.17 -0.15 -0.29 -0.05 -0.04 0.13 -0.03 0.79 -0.05 0.02 -0.01 1.00 0.87 0.32 0.68 0.49 0.37 0.27 0.14 0.20 0.21 0.18 0.00 -0.04 0.10 0.21 0.01 -0.31 0.02 0.01 0.11 0.04 0.03 -0.03 0.01 0.02 0.02 0.07 0.05 0.04 0.07 0.06 0.07 0.27 0.23 0.22 0.19 0.15 0.15 0.04 0.07 0.16 -0.01 0.04 0.00 -0.07 -0.01 0.06 0.07 25 -0.05 -0.37 0.16 0.16 0.12 0.03 -0.04 0.06 0.10 0.21 0.23 0.01 -0.15 -0.14 -0.25 -0.05 -0.04 0.14 -0.03 0.76 0.00 0.07 0.04 0.87 1.00 0.28 0.68 0.48 0.36 0.26 0.19 0.19 0.20 0.16 -0.01 -0.02 0.08 0.21 0.00 -0.26 0.04 0.02 0.13 0.05 0.03 0.01 0.02 0.06 0.05 0.11 0.07 0.05 0.12 0.09 0.03 0.23 0.18 0.18 0.16 0.12 0.18 0.08 0.08 0.19 -0.04 0.04 0.02 -0.05 0.01 0.10 0.11 26 0.29 -0.10 -0.06 -0.01 -0.16 0.17 0.05 0.03 -0.13 -0.10 -0.18 -0.01 0.10 0.08 0.17 0.11 -0.03 -0.01 -0.07 0.25 -0.25 -0.13 -0.04 0.32 0.28 1.00 0.43 0.61 -0.16 0.01 0.41 -0.04 0.07 -0.07 -0.09 0.03 -0.09 0.00 -0.12 -0.16 0.26 0.09 0.30 0.10 0.10 0.17 0.21 0.31 0.18 0.29 0.16 0.15 0.30 0.35 0.09 0.34 0.35 0.37 0.39 0.37 0.33 0.22 0.27 0.30 -0.06 -0.06 -0.09 -0.18 0.17 0.32 0.35 27 0.09 -0.41 0.10 0.06 0.00 0.15 0.03 0.03 0.05 0.18 0.16 0.02 -0.05 -0.15 -0.12 -0.01 -0.02 0.07 -0.04 0.73 -0.14 -0.06 -0.01 0.68 0.68 0.43 1.00 0.71 0.14 0.23 0.11 0.12 0.20 0.11 -0.04 -0.06 0.02 0.16 -0.06 -0.31 0.01 0.00 0.11 -0.02 -0.04 -0.03 -0.01 0.02 -0.02 0.05 0.00 -0.01 0.04 0.06 0.15 0.32 0.28 0.28 0.25 0.20 0.10 0.00 0.05 0.12 -0.04 0.03 -0.01 -0.05 0.04 0.04 0.03 28 0.09 -0.41 -0.16 0.13 0.08 0.10 -0.05 0.01 0.08 -0.23 -0.17 -0.11 -0.07 0.00 0.20 0.01 -0.03 0.12 0.03 0.55 -0.11 0.00 -0.01 0.49 0.48 0.61 0.71 1.00 0.09 0.17 0.36 0.08 0.21 0.07 -0.10 0.01 -0.08 0.13 -0.11 0.05 0.18 0.09 0.33 0.07 0.05 0.20 0.21 0.18 0.12 0.23 0.08 0.06 0.22 0.27 0.13 0.41 0.39 0.39 0.37 0.33 0.31 0.16 0.16 0.32 -0.09 0.08 0.00 -0.10 0.05 0.32 0.24 29 -0.73 -0.46 -0.22 0.48 0.71 -0.53 -0.41 -0.12 0.45 0.15 0.32 -0.08 -0.72 0.03 -0.31 -0.32 -0.16 0.54 0.40 0.45 0.46 0.35 0.10 0.37 0.36 -0.16 0.14 0.09 1.00 0.37 -0.12 0.32 0.23 0.42 0.09 -0.02 0.14 0.33 0.16 0.34 -0.16 0.13 -0.04 0.01 -0.03 -0.05 -0.07 -0.21 -0.07 -0.10 -0.06 -0.09 -0.14 -0.23 -0.03 0.02 -0.07 -0.14 -0.20 -0.21 0.04 -0.02 -0.17 0.10 0.05 0.20 0.23 0.23 -0.36 -0.06 -0.19 30 -0.22 -0.19 0.03 0.32 0.31 -0.10 -0.22 0.04 0.35 0.22 0.26 0.04 -0.24 -0.11 -0.25 -0.09 -0.18 0.22 0.11 0.34 0.08 0.19 0.00 0.27 0.26 0.01 0.23 0.17 0.37 1.00 0.08 0.74 0.82 0.72 0.13 0.16 0.19 0.75 0.18 -0.07 -0.05 0.09 0.09 0.05 0.01 0.04 -0.01 -0.03 0.04 0.07 0.02 0.00 0.03 -0.01 -0.07 0.08 0.01 -0.02 -0.05 -0.06 0.12 0.00 0.04 0.15 -0.04 0.11 0.21 0.09 -0.12 0.07 0.02 31 0.25 0.09 0.02 0.05 -0.17 0.17 0.02 0.14 -0.08 0.02 -0.05 0.08 0.12 0.06 -0.02 0.11 -0.12 0.03 -0.05 0.07 -0.09 -0.04 0.10 0.14 0.19 0.41 0.11 0.36 -0.12 0.08 1.00 0.07 0.07 -0.07 -0.08 0.16 0.02 0.10 -0.09 -0.23 0.40 0.12 0.49 0.26 0.28 0.43 0.30 0.57 0.38 0.57 0.36 0.34 0.64 0.60 -0.04 0.26 0.27 0.31 0.38 0.39 0.76 0.55 0.58 0.68 -0.12 -0.01 0.02 -0.21 0.23 0.61 0.69 32 -0.23 -0.15 0.18 0.28 0.25 -0.13 -0.17 0.03 0.27 0.19 0.20 0.05 -0.23 -0.08 -0.28 -0.09 -0.29 0.21 0.08 0.23 0.08 0.19 -0.01 0.20 0.19 -0.04 0.12 0.08 0.32 0.74 0.07 1.00 0.52 0.72 0.31 0.30 0.66 0.89 0.37 -0.02 -0.07 0.08 0.07 0.05 0.01 0.02 -0.01 -0.03 0.05 0.09 0.03 0.00 0.05 0.01 -0.08 0.04 -0.01 -0.04 -0.05 -0.07 0.13 0.00 0.03 0.16 -0.01 0.09 0.20 0.09 -0.10 0.05 0.04 33 -0.08 -0.19 -0.05 0.23 0.22 -0.07 -0.19 0.10 0.32 0.09 0.07 0.01 -0.22 -0.10 -0.10 -0.05 -0.07 0.20 0.10 0.28 0.07 0.14 
-0.02 0.21 0.20 0.07 0.20 0.21 0.23 0.82 0.07 0.52 1.00 0.45 -0.04 -0.10 -0.01 0.63 -0.02 -0.03 -0.01 0.08 0.08 0.04 0.01 0.03 0.03 0.01 0.06 0.08 0.02 0.01 0.05 0.04 -0.04 0.13 0.06 0.04 0.00 -0.02 0.11 0.01 0.05 0.13 -0.09 0.11 0.19 0.05 -0.04 0.07 0.05 34 -0.32 -0.21 -0.02 0.34 0.37 -0.17 -0.23 -0.07 0.17 0.14 0.19 0.01 -0.27 -0.10 -0.21 -0.16 -0.28 0.22 0.24 0.26 0.14 0.19 -0.02 0.18 0.16 -0.07 0.11 0.07 0.42 0.72 -0.07 0.72 0.45 1.00 0.46 0.12 0.38 0.57 0.52 0.09 -0.12 0.11 0.00 0.06 0.01 -0.02 -0.04 -0.12 -0.02 -0.01 0.01 -0.02 -0.06 -0.11 -0.09 0.03 -0.04 -0.09 -0.11 -0.12 0.06 -0.06 -0.03 0.10 0.02 0.11 0.19 0.14 -0.18 0.00 -0.07 35 -0.10 0.09 0.06 0.03 0.06 -0.13 -0.12 -0.14 -0.04 0.08 0.13 0.00 -0.02 -0.01 -0.20 -0.02 -0.17 -0.03 0.16 -0.03 0.02 0.00 -0.03 0.00 -0.01 -0.09 -0.04 -0.10 0.09 0.13 -0.08 0.31 -0.04 0.46 1.00 -0.01 0.42 -0.05 0.89 -0.02 -0.07 0.03 -0.12 0.10 0.08 -0.07 -0.04 -0.07 -0.03 -0.06 0.06 0.04 -0.08 -0.08 0.01 -0.03 -0.03 -0.06 -0.05 -0.04 -0.01 0.02 0.10 -0.04 0.06 0.01 -0.02 0.03 -0.09 -0.07 -0.05 36 -0.03 0.11 0.09 0.03 -0.06 -0.02 -0.02 -0.08 -0.04 -0.01 -0.01 0.06 0.08 0.05 0.02 0.07 -0.12 -0.08 -0.07 -0.09 0.00 0.08 0.03 -0.04 -0.02 0.03 -0.06 0.01 -0.02 0.16 0.16 0.30 -0.10 0.12 -0.01 1.00 0.03 0.30 -0.01 0.03 0.10 0.01 0.08 0.02 0.04 0.06 0.02 0.14 0.07 0.13 0.04 0.05 0.13 0.13 -0.07 -0.03 -0.02 0.00 0.03 0.05 0.09 0.08 0.13 0.05 -0.04 -0.04 0.01 -0.03 0.01 0.10 0.14 37 -0.10 -0.05 0.25 0.10 0.09 -0.06 -0.06 0.02 0.05 0.18 0.21 0.02 -0.09 -0.03 -0.28 -0.12 -0.22 0.10 0.02 0.07 0.02 0.04 0.03 0.10 0.08 -0.09 0.02 -0.08 0.14 0.19 0.02 0.66 -0.01 0.38 0.42 0.03 1.00 0.49 0.49 -0.10 -0.08 0.01 0.02 0.04 0.00 -0.02 -0.02 -0.06 -0.01 0.02 0.03 0.00 0.00 -0.04 -0.01 -0.01 -0.03 -0.05 -0.04 -0.08 0.03 -0.03 -0.03 0.05 0.02 0.04 0.07 0.05 -0.03 0.00 -0.01 38 -0.21 -0.21 0.11 0.29 0.26 -0.12 -0.19 0.07 0.30 0.16 0.16 0.04 -0.26 -0.09 -0.22 -0.09 -0.26 0.27 0.05 0.27 0.09 0.22 -0.01 0.21 0.21 0.00 0.16 0.13 0.33 0.75 0.10 0.89 0.63 0.57 -0.05 0.30 0.49 1.00 -0.02 0.00 -0.06 0.07 0.12 0.00 -0.03 0.04 0.01 -0.01 0.07 0.11 0.00 -0.02 0.08 0.04 -0.06 0.09 0.03 0.01 -0.01 -0.05 0.14 0.01 0.01 0.17 -0.06 0.10 0.23 0.09 -0.07 0.08 0.05 39 -0.18 0.02 0.11 0.13 0.14 -0.13 -0.10 -0.06 0.01 0.10 0.15 0.00 -0.07 -0.02 -0.22 -0.08 -0.18 0.05 0.17 0.01 0.06 0.05 -0.01 0.01 0.00 -0.12 -0.06 -0.11 0.16 0.18 -0.09 0.37 -0.02 0.52 0.89 -0.01 0.49 -0.02 1.00 0.03 -0.07 0.05 -0.12 0.11 0.07 -0.07 -0.05 -0.08 -0.04 -0.07 0.06 0.03 -0.09 -0.11 -0.05 -0.08 -0.09 -0.13 -0.13 -0.12 -0.01 -0.02 0.02 -0.01 0.08 0.04 0.03 0.08 -0.13 -0.06 -0.06 40 -0.65 -0.27 -0.29 0.29 0.48 -0.30 -0.11 -0.08 0.33 -0.61 -0.41 -0.27 -0.39 0.11 0.32 -0.10 0.00 0.26 0.24 -0.13 0.26 0.24 -0.03 -0.31 -0.26 -0.16 -0.31 0.05 0.34 -0.07 -0.23 -0.02 -0.03 0.09 -0.02 0.03 -0.10 0.00 0.03 1.00 -0.07 0.10 -0.02 -0.03 -0.04 0.04 0.04 -0.14 -0.08 -0.11 -0.11 -0.12 -0.13 -0.16 0.05 0.01 0.00 -0.05 -0.09 -0.08 -0.11 -0.11 -0.24 -0.05 0.03 0.20 0.12 0.20 -0.34 -0.02 -0.17 41 0.20 0.07 -0.05 0.01 -0.10 0.08 0.02 0.02 -0.08 -0.10 -0.12 0.00 0.10 0.08 0.07 0.14 0.03 -0.05 -0.05 -0.04 -0.10 -0.02 0.00 0.02 0.04 0.26 0.01 0.18 -0.16 -0.05 0.40 -0.07 -0.01 -0.12 -0.07 0.10 -0.08 -0.06 -0.07 -0.07 1.00 0.02 0.17 0.13 0.14 0.08 0.20 0.75 0.20 0.27 0.19 0.18 0.31 0.35 -0.09 0.11 0.13 0.16 0.18 0.21 0.28 0.24 0.27 0.22 -0.07 -0.05 -0.08 -0.16 0.12 0.33 0.46 42 -0.08 -0.07 -0.09 0.13 0.13 -0.04 -0.11 0.06 0.08 -0.04 -0.04 0.00 -0.14 -0.01 -0.01 0.02 -0.10 0.16 0.11 0.06 0.10 0.03 -0.04 0.01 0.02 0.09 0.00 0.09 0.13 0.09 0.12 0.08 
0.08 0.11 0.03 0.01 0.01 0.07 0.05 0.10 0.02 1.00 0.24 0.09 0.07 0.19 0.20 0.11 0.31 0.26 0.09 0.07 0.25 0.19 -0.08 0.08 0.09 0.09 0.11 0.11 0.15 0.05 0.09 0.16 -0.06 0.03 0.15 0.03 0.02 0.30 0.25 43 0.12 -0.06 -0.02 0.13 -0.02 0.16 0.00 0.16 0.04 -0.10 -0.11 0.00 -0.01 0.00 0.10 0.05 -0.13 0.17 -0.01 0.12 -0.10 0.02 0.01 0.11 0.13 0.30 0.11 0.33 -0.04 0.09 0.49 0.07 0.08 0.00 -0.12 0.08 0.02 0.12 -0.12 -0.02 0.17 0.24 1.00 0.08 0.09 0.49 0.25 0.27 0.29 0.43 0.12 0.12 0.41 0.39 -0.08 0.20 0.21 0.24 0.28 0.26 0.41 0.22 0.25 0.40 -0.10 0.03 0.08 -0.11 0.15 0.81 0.41 44 0.02 0.03 0.02 0.06 0.02 0.08 0.04 0.04 0.00 0.00 0.00 0.00 -0.02 0.02 -0.03 0.06 -0.04 0.05 0.10 0.02 -0.03 -0.06 -0.03 0.04 0.05 0.10 -0.02 0.07 0.01 0.05 0.26 0.05 0.04 0.06 0.10 0.02 0.04 0.00 0.11 -0.03 0.13 0.09 0.08 1.00 0.80 0.22 0.13 0.15 0.12 0.18 0.73 0.61 0.18 0.15 -0.03 0.11 0.13 0.12 0.14 0.19 0.23 0.12 0.25 0.19 -0.01 0.02 0.04 -0.03 0.06 0.51 0.41 45 0.07 0.07 0.01 0.01 -0.03 0.08 0.04 0.03 -0.02 -0.02 -0.04 0.01 0.02 0.03 -0.03 0.11 -0.05 0.02 0.07 -0.02 -0.03 -0.09 -0.04 0.03 0.03 0.10 -0.04 0.05 -0.03 0.01 0.28 0.01 0.01 0.01 0.08 0.04 0.00 -0.03 0.07 -0.04 0.14 0.07 0.09 0.80 1.00 0.16 0.15 0.19 0.17 0.23 0.69 0.77 0.23 0.22 -0.03 0.10 0.13 0.14 0.17 0.22 0.23 0.15 0.30 0.19 -0.01 -0.01 0.04 -0.05 0.10 0.48 0.48 46 0.06 0.04 -0.04 0.02 -0.02 0.08 0.03 0.07 0.01 -0.10 -0.09 -0.02 -0.01 0.04 0.11 0.05 -0.03 0.05 0.02 -0.02 -0.07 0.00 -0.03 -0.03 0.01 0.17 -0.03 0.20 -0.05 0.04 0.43 0.02 0.03 -0.02 -0.07 0.06 -0.02 0.04 -0.07 0.04 0.08 0.19 0.49 0.22 0.16 1.00 0.17 0.15 0.21 0.27 0.16 0.13 0.25 0.25 0.05 0.18 0.21 0.21 0.22 0.24 0.34 0.22 0.19 0.33 -0.06 0.01 0.03 -0.06 0.09 0.75 0.28 104 Table 4-2 (cont’d) 1. DESWLsy 2. LDMTLD 3. PCCNCz 4. WRDFAMc 5. WRDFRQc 6. WRDHYPnv 7. WRDIMGc 8. WRDMEAc 9. WRDPOLc 10. DESSL 11. DESSLd 12. SYNLE 13. SYNNP 14. SYNMEDwrd 15. SYNSTRUTt 16. DRPVAL 17. DRNP 18. DRVP 19. DRNEG 20. PCREFz 21. WRDPRP1s 22. WRDPRP2 23. WRDPRP3s 24. CRFAOa 25. CRFAO1 26. LSAPP1 27. LSASS1 28. LSAGN 29. PCNARz 30. PCDCz 31. DESWC 32. CNCAll 33. CNCCaus 34. CNCLogic 35. CNCADC 36. CNCTemp 37. CNCAdd 38. CNCPos 39. CNCNeg 40. RDFRE 41. Lead 42. Position 43. Claim 44. Counterclaim 45. Rebuttal 46. Evidence 47. Concluding_Statement 48. Lead_effective 49. Position_effective 50. Claim_effective 51.Counterclaim_effective 52. Rebuttal_effective 53. Evidence_effective 54.Concl_Sttm_effective 55. C1 56. C2 57. C3 58. C4 59. C5 60. C6 61. all_markers 62. booster_words 63. discourse_markers 64. hedge_words 65. neg_adj_comp 66. social_order_comp 67. positive_adj_comp 68. joy_comp 69. trust_verbs_comp 70. all_elements 71. 
(Pairwise Pearson correlations among the micro- and macrostructural writing features, continued: variables 47–71, including all_effective_score.)

4.2 SAMPLING ADEQUACY

In order to perform subsequent factor analyses, the Kaiser-Meyer-Olkin (KMO) test of sampling adequacy was evaluated. The KMO value was 0.68, and Bartlett's chi-square approximation was χ2(2080) = 156181.4, p < .001. A KMO value close to 1 indicates that the correlation pattern is compact enough to produce distinct and reliable factors. The resulting value of 0.68 was considered mediocre, as it fell slightly under 0.70, a typical minimum. The recommended remedial action is to remove variables with low individual KMO values (typically below 0.50) and recalculate the overall KMO to see whether it improves, because variables with low KMO values do not contribute to the common variance that the other variables share, and retaining them in the factor analysis may introduce noise and inflate factor loadings (Howard, 2016). Thus, SYNLE (KMO = 0.36), SYNMEDwrd (KMO = 0.25), WRDPRP3s (KMO = 0.42), and CNCTemp (KMO = 0.27) were removed.
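The sampling-adequacy checks described above can be sketched in a few lines of Python. The snippet below is a minimal illustration, assuming the factor_analyzer package and a hypothetical DataFrame named features holding one row per essay and one column per NLP-derived variable; it is not the study's actual analysis script.

```python
# A minimal sketch of the sampling-adequacy checks. `features` and the
# file name are hypothetical placeholders, not the study's code.
import pandas as pd
from factor_analyzer.factor_analyzer import (
    calculate_bartlett_sphericity,
    calculate_kmo,
)

features = pd.read_csv("writing_features.csv")  # hypothetical file name

# Bartlett's test of sphericity: does the correlation matrix differ
# from an identity matrix (a prerequisite for factor analysis)?
chi_square, p_value = calculate_bartlett_sphericity(features)

# KMO: per-variable and overall measures of sampling adequacy.
kmo_per_item, kmo_total = calculate_kmo(features)

# Remedial step described in the text: drop variables whose individual
# KMO falls below 0.50, then recompute the overall KMO.
low_kmo = features.columns[kmo_per_item < 0.50]
_, kmo_revised = calculate_kmo(features.drop(columns=low_kmo))
print(f"Overall KMO: {kmo_total:.2f} -> revised: {kmo_revised:.2f}")
```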
The revised KMO value was 0.79, which is considered an average level of sampling adequacy for performing factor analysis on the dataset. The accompanying Bartlett's sphericity test, χ2(1830) = 143358.6, p < .001, confirmed that the revised correlation matrix was suitable for factor analysis.

4.3 ERROR CORRECTIONS

To ensure the generation of the most accurate writing features using the language analysis tools, it is essential to identify and correct spelling and grammatical errors in each writing sample. This correction process allows the samples to be accurately processed by Coh-Metrix, SEANCE, and semantic analysis tools, thereby enhancing the reliability of the extracted writing features. The total number of errors, encompassing both spelling and grammar, was recorded for each writing sample, and these combined error counts were used to calculate inter-rater reliability between two raters.

The types of errors that require careful attention, as they may affect the ability of NLP tools to parse the text, are as follows: (a) misspellings include obvious spelling mistakes (e.g., "becuase" should be corrected to "because"), omissions of parts of words (e.g., "in conclu" should be corrected to "in conclusion"), inappropriate acronyms (e.g., "u" should be corrected to "you"), and the incorrect separation of compound words (e.g., "class room" should be corrected to "classroom"); (b) mechanics errors include misplaced punctuation or the omission of necessary punctuation, which can affect the ability of Coh-Metrix to accurately count the number of sentences and related features within the text (e.g., "By asking multiple friends peers and even family members their thoughts for something that may or may not be important, it could help save someone from making a bad choice in the future." should be corrected to "By asking multiple friends, peers, and even family members their thoughts on something that may or may not be important, it could help save someone from making a bad choice in the future."); (c) messy codes are a frequent issue encountered by raters, as students typing on computers may inadvertently produce nonsensical characters that are difficult to interpret (e.g., "Ü Ü Ü Ü, @?@"). Raters systematically removed these extraneous codes. It is noteworthy that errors in other aspects of sentence structure, such as capitalization errors, were not considered, as Coh-Metrix does not differentiate between lowercase and uppercase letters.

Two independent raters conducted the corrections on the writing samples. I reviewed a total of 2,977 student essays, while a second rater, a doctoral student and former ESL teacher, marked corrections on 25% of randomly selected papers (n = 745) from the set I reviewed, according to established guidelines. The percentage agreement regarding the identification of errors for correction was 91%. This figure was calculated by dividing the number of errors marked by the first rater by the average total number of errors identified by both raters, as suggested by Polio (1997). A consensus was subsequently reached on the corrected versions of the papers. The finalized corrections were then input into Coh-Metrix, SEANCE, and content and semantic analysis tools to produce the writing-related variables.

4.4 MAIN ANALYSES

4.4.1 RQ1a. Persuasive Writing Features in Secondary School Students' Papers
Parallel analysis (PA; Horn, 1965) and the scree plot (Cattell, 1966) were used to determine the appropriate number of factors to retain. PA compares the observed eigenvalues from the correlation matrix with those generated from uncorrelated normal variables. In this process, a factor is considered significant if its associated eigenvalue exceeds the 95th percentile of those obtained from random uncorrelated data (Glorfeld, 1995). The scree plot method provides a visual representation of where the eigenvalues sharply decline. Using both methods together helps avoid the risks of over- or under-extracting factors.

To address part of RQ1, all microstructural-level and macrostructural-level variables were loaded into PA software to determine the number of factors to retain. The scree plot from PA for the micro- and macrostructural levels is presented in Figure 4-3. According to the PA results, sixteen factors should be retained, as these sixteen factors explained a substantial portion of the variance compared to random data. Collectively, the 16-factor solution accounted for the entirety of the variance in the Pearson correlation matrix, with the individual factors explaining 16%, 11%, 11%, 8%, 8%, 7%, 6%, 6%, 6%, 5%, 4%, 3%, 3%, 3%, 2%, and 2% of the variance, respectively. However, this result may indicate an overfitting issue, in which the algorithm identified statistically significant patterns that are not practically meaningful. Based on the strong theoretical foundation outlined in Chapter 2 Literature Review – particularly models such as the levels of language framework, which supported a 3-factor model for microstructural features, along with literature that categorized macrostructural features into content, structure, and tone/style – a 6-factor solution was imposed. This decision was made despite the scree plot and parallel analysis suggesting a higher number of factors, following the recommendations of Sürücü et al. (2024).

Figure 4-3 Scree Plot of Parallel Analysis for Microstructural Writing Features

After completing the preliminary factor analysis and determining the appropriate number of factors to retain, a Promax rotation was applied to the 6-factor solution, allowing the factors to correlate with each other. This solution accounted for the entire variance in the Pearson correlation matrix, with the six factors explaining 24%, 16%, 16%, 10%, 25%, and 10% of the total variance, respectively. Figure 4-4 presents the results of the EFA, visually displaying the interpretation of the six-factor model. Only loadings with an absolute value greater than 0.50 were represented as edges, while those below the 0.50 threshold were excluded from further analyses.

Figure 4-4 Exploratory Factor Analysis Plot
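To make the extraction workflow concrete, the following is a minimal sketch of Horn's parallel analysis followed by the 6-factor Promax rotation, again assuming the factor_analyzer package and the hypothetical features DataFrame from the earlier sketch; the 95th-percentile criterion and rotation choice mirror the procedure described above.

```python
# A minimal sketch of parallel analysis plus a 6-factor Promax EFA.
# `features` is the hypothetical DataFrame from the previous sketch.
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

features = pd.read_csv("writing_features.csv")  # hypothetical file name
X = features.to_numpy()
n_obs, n_vars = X.shape

# Observed eigenvalues of the correlation matrix (descending order).
observed_eig = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]

# Eigenvalues from uncorrelated normal data; retain a factor when its
# observed eigenvalue exceeds the 95th percentile of the random ones.
rng = np.random.default_rng(0)
random_eig = np.array([
    np.linalg.eigvalsh(
        np.corrcoef(rng.standard_normal((n_obs, n_vars)), rowvar=False)
    )[::-1]
    for _ in range(100)
])
threshold = np.percentile(random_eig, 95, axis=0)
n_retain = int(np.sum(observed_eig > threshold))
print(f"Parallel analysis suggests retaining {n_retain} factors")

# Theory-driven 6-factor solution with an oblique (Promax) rotation.
efa = FactorAnalyzer(n_factors=6, rotation="promax")
efa.fit(X)
loadings = efa.loadings_  # factor loading matrix (cf. Table 4-3)
```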
Table 4-3 presents the factor structure coefficients for the writing features examined in this study. The first factor comprised the following variables: DESWLsy, LDMTLD, WRDFAMc, WRDFRQc, WRDPOLc, SYNNP, DRPVAL, DRVP, WRDPRP1s, WRDPRP2, PCNARz, Readability, social_order_component, positive_adjectives_component, joy_component, and trust_verbs_component. Of these 16 variables, seven were excluded due to factor loadings below 0.50, indicating insufficient convergent validity as measured by average variance extracted. Retaining only variables with higher loadings was deemed beneficial for improving estimates of internal consistency reliability. The retained variables were DESWLsy, LDMTLD, WRDFAMc, WRDFRQc, WRDPOLc, SYNNP, DRVP, PCNARz, and Readability. Factor 1 was labeled Lexical Proficiency, as it captured both lexical complexity (e.g., mean syllables per word, mean word frequency, mean number of modifiers per noun phrase) and vocabulary diversity and depth (e.g., lexical diversity, mean word polysemy, word meaningfulness, mean word familiarity). While some variables, such as Readability and Narrativity Z-score, might not seem directly aligned with Lexical Proficiency, they in fact underscore the role of lexical features in shaping word choice, text complexity, and discourse coherence, thus enhancing narrative effectiveness. Empirically, this outcome aligns with previous research (Aryadoust & Liu, 2015; Lu, 2017; Wilson et al., 2017) employing structural equation modeling to examine and validate NLP-derived writing features, thus informing nuanced analyses of text quality using hypothesized constructs. Our results showed that specific lexical features – especially those related to word information and vocabulary depth – had high loadings, indicating strong associations with the latent construct. Collectively, these lexical features were foundational to persuasive writing (i.e., explaining 24% of the variance). However, in contrast to findings from other studies, readability and narrativity demonstrated substantial loadings of 0.64 and 0.76, respectively, under the Lexical Proficiency factor. This differs from previous research (Graesser et al., 2011; Nkhobo & Chaka, 2023; Plakans & Gebril, 2013), where these features are often associated with discourse-level elements. This discrepancy suggests that vocabulary richness and diversity notably impact readability and narrative depth in persuasive essays, supporting a more refined use of language to improve textual flow (McKeown et al., 2020).

The second factor included DESSL, DESSLd, SYNSTRUTt, DRNP, PCDCz, CNCAll, CNCCaus, CNCLogic, CNCADC, CNCAdd, CNCPos, and CNCNeg. After excluding variables with low factor loadings, the retained variables were PCDCz, CNCAll, CNCLogic, CNCAdd, CNCPos, and CNCNeg. This factor was labeled Cohesive Devices, capturing the integration of grammatical and lexical relationships both within and across sentences. This factor reflects cohesive device use, including the repetition of related words and concepts, that enables readers to connect ideas at the sentence-to-sentence level (Graesser et al., 2011). Tortorelli (2020) similarly characterized cohesion as the effective linkage of ideas through referential and lexical ties. Our findings indicate that (1) cohesive devices such as reference, substitution, ellipsis, and conjunction are crucial for constructing persuasive discourse in writing and (2) the use of these cohesive devices collectively accounted for 16% of the variance in persuasive writing. The Cohesive Devices factor thus represents explicit linguistic markers that serve to link individual sentences, maintain consistent references, reinforce thematic continuity, and facilitate a smooth flow of ideas (Kuo, 1995). The use of logical (e.g., thus, as a result, then, hence), additive (e.g., in fact, besides, in addition, furthermore), negative (e.g., neither/nor, but, however, though, conversely), and positive connectives (e.g., and, or, either/or, moreover, likewise, also) in persuasive writing provides linguistic leverage for signaling arguments and ultimately enhancing writing quality (Taylor et al., 2019).
Overall, this factor emphasizes the strategic use of discourse markers to connect ideas effectively.

The third factor included PCREFz, CRFAOa, CRFAO1, LSAPP1, LSASS1, LSAGN, C1, C2, and C6. After removing variables with low factor loadings, the retained variables were PCREFz, CRFAOa, CRFAO1, LSASS1, and LSAGN. The factor was labeled Textual Coherence, incorporating both micro- and macro-level cohesion. The key variables in this latent construct included LSA given/new information, LSA overlap of adjacent sentences/paragraphs, and deep cohesion, all of which contributed to coherence by establishing contextual ties rooted in shared knowledge between writer and reader, invoking familiar or possible conceptual worlds (Widdowson, 1983). Distinct from the Cohesive Devices factor, the Textual Coherence factor focuses on linking elements based on "thematic development, organization of information, or communicative purpose of the particular discourse" (Kuo, 1995, p. 48). This factor emphasizes maintaining clarity across both small- and large-scale text segments. For example, LSA scores for adjacent/all sentences capture cohesion at a local or superficial level, such as repeating the same word choices across successive sentences (Roscoe et al., 2015). LSA given/new information evaluates the continuity of topic development and the avoidance of abrupt thematic shifts. This factor promotes reader comprehension of each sentence within the broader discourse. This finding is consistent with prior research in writing assessment, which posits that cohesion is not only syntactic but also semantic, each essential to effective essay composition (Berzlánovich & Redeker, 2012; Witte & Faigley, 1981). An essay can be cohesively constructed yet lack coherence if it fails to represent shared frames of reference. Conversely, an essay may be coherent yet lack cohesive devices, limiting its ability to effectively communicate connections between ideas. Hence, these findings underscore the complexity of cohesion and coherence, which together facilitate a deeper understanding of the interrelationships among ideas within persuasive writing.

The fourth factor consisted of PCCNCz, WRDHYPnv, WRDIMGc, and WRDMEAc, with DRNEG excluded due to a low loading value. This factor is challenging to categorize, yet I propose the label Semantic Richness to reflect its focus on qualities that enhance reader visualization and comprehension. Unlike Lexical Proficiency, which captures vocabulary complexity and diversity, Semantic Richness emphasizes the vivid, concrete, and meaningful aspects of word choice that facilitate sensory engagement and conceptual clarity. This factor underscores the ability of words to convey imagery and meaning, as well as their lexical hierarchy (such as hypernymy), which is essential in persuasive writing for grounding abstract concepts in relatable, vivid language.

The fifth factor included evidence_effective, claim_effective, concluding_statement_effective, lead_effective, position_effective, all_markers, claim, lead, position, evidence, concluding_statement, DESWC, booster_words, discourse_markers, and negative_adjectives_component. After removing the variables with low loadings, evidence_effective, claim_effective, concluding_statement_effective, lead_effective, position_effective, all_markers, claim, and DESWC were retained. These retained variables distinctly represent the structural components of persuasive writing along with their respective effectiveness scores.
Accordingly, this factor was labeled Structural Effectiveness to reflect its emphasis on the organization and quality of key argumentative elements within the text. This factor accounted for the largest proportion (i.e., 25%) of variance in persuasive writing, indicating that the variables in this category make a relatively large contribution to persuasive writing. This finding is consistent with previous research (Kamimura, 2011; Tasya, 2022; Uccelli et al., 2013), which highlights that academic writing necessitates not only the mastery of language conventions but also advanced language forms and functions. De La Paz and colleagues (2012) noted that when composing argumentative essays, students often employed relevant evidence and argumentation strategies, effectively contextualizing and corroborating their evidence to enhance the overall quality of their writing. Likewise, Dobbs (2014) found that writers of all skill levels utilized organizational and stance markers to make claims and construct arguments. These findings indicate that effective persuasive writing is bolstered by the strategic use of structural and linguistic markers, as well as the effectiveness of discourse elements, all of which contribute to clarity, coherence, and engagement in the text.

The last factor, labeled Refutation Quality, included four variables: rebuttal, counterclaim, rebuttal_effective, and counterclaim_effective. These variables were distinguished from other elements of argumentation and accounted for 10% of the variance in persuasive writing. This finding underscores the significance of argumentation skills in persuasive writing, particularly in how effectively writers address opposing viewpoints and bolster their arguments through rebuttals and counterclaims. Prior research (S. A. Crossley et al., 2014; F. Liu & Stapleton, 2014; J. Qin & Karabacak, 2010) has indicated that the incorporation of counterclaims and rebuttals in writing is often positively correlated with higher holistic scores for persuasive quality. Therefore, the ability to effectively utilize these components is critical for enhancing the overall effectiveness of persuasive writing.

The six-factor structure derived from the EFA aligns closely with theoretical models of persuasive writing that emphasize hierarchical and interdependent competencies (Sanders & Wijk, 1996; Wilson et al., 2017). Specifically, the identified factors – Lexical Proficiency, Cohesive Devices, Textual Coherence, Semantic Richness, Structural Effectiveness, and Refutation Quality – reflect the multidimensional nature of writing posited by cognitive and sociocognitive frameworks, where mastery evolves from foundational linguistic skills (or microstructural elements) to advanced rhetorical strategies (or macrostructural elements). For example, the first three derived factors (i.e., Lexical Proficiency, Cohesive Devices, Textual Coherence) directly map onto the three-level language framework, representing word-level features (e.g., vocabulary diversity, polysemy), sentence-level syntactic cohesion (e.g., logical connectives, referential ties), and discourse-level thematic continuity (e.g., LSA-based coherence metrics). The prominence of Lexical Proficiency (24% of variance explained) as a primary factor corroborates theoretical claims that vocabulary depth and complexity are foundational to persuasive efficacy, as lexical choices shape both local clarity and global argumentative force (Maamuujav, 2022; MacArthur et al., 2019).
This structure also suggests developmental trajectories in writing proficiency. Early writers may rely heavily on Structural Effectiveness and Cohesive Devices to scaffold foundational argumentation, while advanced writers increasingly emphasize Textual Coherence and Refutation Quality to refine rhetorical nuance and audience engagement (L. L. Aull & Lancaster, 2014; Rowe & Wilson, 2015). For example, younger students might focus on formulaic templates (e.g., thesis statements, evidence placement) to meet structural expectations, whereas older learners integrate Refutation Quality to persuade through critical engagement with opposing views. This progression mirrors stage-based theories of writing development (Flower & Hayes, 1981; Scardamalia & Bereiter, 1987; Scardamalia & Paris, 1985), where novices prioritize "knowledge-telling" and experts shift toward "knowledge-transforming".

Overall, the results derived from the EFA provide a productive framework of writing assessment for guiding instructional diagnosis and feedback in a straightforward manner (Berninger et al., 1994; N. W. Nelson & Van Meter, 2007; Scott, 2009). This framework not only facilitates the identification of key components that contribute to effective writing but also allows educators to tailor their feedback to address specific areas for improvement. By focusing on these elements, instructors can enhance their teaching strategies, ultimately fostering better writing skills among students.

Statistically, the EFA revealed six empirically grounded dimensions reflecting the multifaceted nature of persuasive writing. Subsequently, CFA tested a theoretically informed, redesigned model aligned with established argumentative frameworks (Newell et al., 2011). The CFA model refined the structure by: (1) combining Structural Effectiveness and Refutation Quality into a unified construct, reflecting their shared role in the organizational architecture of persuasion; (2) introducing Tone and Content as two separate factors, aligned with their theoretical recognition as critical proxies for persuasive efficacy; and (3) simplifying four EFA-derived microstructural dimensions (Lexical Proficiency, Cohesive Devices, Textual Coherence, and Semantic Richness) into three parsimonious categories (word-level, cohesion, and discourse-level features) to optimize theoretical clarity and operational coherence.
Table 4-3 Factor Structure Coefficients for Micro- and Macro-structural Writing Features

(Table 4-3 lists, for each of the 61 micro- and macrostructural writing features, its structure coefficients on the six extracted factors together with its communality, uniqueness, and complexity values.)

Note. Factor loadings indicate the strength of the relationship between variables and factors. Communality represents the proportion of variance in a variable explained by the extracted factors. Uniqueness reflects the variance in a variable not explained by common factors. Complexity indicates how many factors are associated with each variable. Bolded values indicate the highest absolute factor loading for each variable, while italicized values indicate absolute factor loadings below 0.50, which were omitted from further analyses.

4.4.2 RQ1b. Latent Structure of Persuasive Writing Data

As discussed in Chapter 2 Literature Review, various writing frameworks, such as the (Not) Simple View of Writing and the Levels of Language Framework, represent a spectrum of multidimensional elements. However, these models often incorporate writing-related elements that have been insufficiently investigated within realistic educational writing datasets, particularly those specializing in persuasive writing data (Rodgers et al., 2020). To address the second part of my RQ1, I employed CFA to establish evidence of construct validity within the hypothesized structural equation model. I hypothesized that a higher-order model comprising six first-order factors – word, cohesion, discourse, content, structure, tone/style – explained by both microstructural and macrostructural second-order factors, would be the most appropriate representation of the data. It should be noted that my hypothesized CFA model differs slightly from the EFA findings presented for RQ1a; nevertheless, I chose to utilize my hypothesized model because it is more theoretically grounded and would inform the subsequent design of the GPT prompts and automated essay scoring system. I compared this higher-order model to two alternative models: a one-factor model as the baseline CFA model and a two-factor CFA model. Model fit statistics for each model are presented in Table 4-4. It is important to note that a reduced version of the higher-order model was utilized because the full model did not converge. The non-convergence may be attributed to the ratio of observations to the number of parameters, as well as ambitious model specifications involving complex paths and correlated factors. Consequently, I opted to simplify the higher-order model by excluding the effectiveness scores of each element at the structural level and the SEANCE sentiment composite scores. For consistency in model comparison, these variables were also removed from the alternative models to ensure comparable numbers of features across models. The rationale for removing these variables from all models is twofold.
First, these variables are either correlated with or derived from other variables, which could lead to multicollinearity issues. Such multicollinearity can inflate the standard errors of estimates and obscure the true relationships between writing constructs. Second, statistically, the inclusion of highly correlated variables may result in overfitting, which may complicate the model without significantly improving its explanatory power. By simplifying the model and focusing on primary variables that directly contribute to the constructs of interest, I aimed to improve model stability and interpretability. Therefore, excluding these variables aligns with both theoretical considerations and the need for a more parsimonious model that can effectively converge during estimation.

Table 4-4 Comparison of CFA Model Fit Indices

Fit Index                 One-Factor   Two-Factor   Higher-Order
Free parameters           60           61           67
Chi-Square                43527.447    72162.619    12261.106
Degrees of freedom (df)   405          404          398
p-value                   p < 0.001    p < 0.001    p < 0.001
CFI                       0.399        0.528        0.853
TLI                       0.354        0.492        0.839
RMSEA                     0.194        0.172        0.103
SRMR                      0.191        0.162        0.107

Note. CFI = Comparative fit index; TLI = Tucker-Lewis index; RMSEA = Root mean square error of approximation; SRMR = Standardized root mean square residual.

The goodness-of-fit statistics for the one-factor (baseline) model were statistically significant, χ2(405) = 43527.447, p < .001, with both the CFI and TLI indicating poor fit at 0.399 and 0.354, respectively. Similarly, the RMSEA and SRMR were high at 0.194 and 0.191, respectively, neither of which fell below the desirable cutoff of 0.05, indicating poor model fit. The two-factor model yielded marginal improvements over the one-factor model. The goodness-of-fit statistics were statistically significant, χ2(404) = 72162.619, p < .001, with CFI and TLI values increasing slightly to 0.528 and 0.492, respectively. Additionally, the RMSEA and SRMR values were lower for the two-factor model than for the one-factor model. The higher-order model showed a marked improvement in fit indices compared to the previous two models, as evidenced by a significant χ2(398) = 12261.106, p < .001. The CFI and TLI values substantially increased, both exceeding the 0.80 level, although remaining below the 0.90 threshold that is considered minimally acceptable in psychometric research. The RMSEA and SRMR values were 0.103 and 0.107, respectively. Although these values still indicated an undesirable fit, they moved closer to the desired cutoff of 0.05. Consequently, I argue that the higher-order CFA model was the most suitable based on these criteria. Statistically, the higher-order model outperformed the other models across all fit indices (i.e., chi-square, CFI, TLI, RMSEA, SRMR). The weak fit indices for the hypothesized higher-order model (e.g., CFI = 0.853, RMSEA = 0.103) suggest potential misalignment between the theoretical framework and empirical data, likely due to the complexity of modeling higher-order latent constructs. This constrained the robustness of invariance testing for RQ2, as poor baseline model fit undermines the validity of group comparisons. Future work should prioritize refining the measurement model (e.g., testing bifactor structures or simplifying hierarchical assumptions) to strengthen psychometric foundations before advancing invariance claims. In this study, the hypothesized higher-order structure was retained for theoretical consistency, despite its constraints on invariance testing.
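For readers who wish to see how such a higher-order model can be specified, the sketch below uses lavaan-style syntax with the Python semopy package, with factor definitions taken from Table 4-5; it is an illustrative specification under those assumptions, not the estimation script used in the study.

```python
# A minimal sketch of the higher-order CFA, assuming the semopy package.
# Factor definitions follow Table 4-5; illustrative only.
import pandas as pd
import semopy

MODEL_DESC = """
word =~ WRDFAMc + PCNARz + DESWLsy + RDFRE + SYNNP + WRDHYPnv + WRDPOLc + DRVP + WRDIMGc
sentence =~ LSAGN + LSASS1 + PCREFz + CRFAOa + CRFAO1 + LSAPP1
text =~ CNCAll + CNCAdd + CNCLogic + DESSL + PCDCz + SYNSTRUTt + CNCPos
content =~ C1 + C2 + C3 + C4 + C5 + C6
structure =~ Lead + Position + Claim + Counterclaim + Rebuttal + Evidence + Concluding_Statement
tone =~ all_markers + booster_words + discourse_markers
micro =~ word + sentence + text
macro =~ content + structure + tone
"""

features = pd.read_csv("writing_features.csv")  # hypothetical file name
model = semopy.Model(MODEL_DESC)
model.fit(features)  # columns must match the indicator names above

# Fit indices (chi-square, CFI, TLI, RMSEA, AIC, ...) as in Table 4-4.
print(semopy.calc_stats(model).T)
```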
The standardized loadings for the higher-order CFA model are provided in Table 4-5. Figure 4-5 illustrates the loading estimates for the higher-order model, including both microstructural and macrostructural writing features.

Table 4-5 Loading Estimate, Standard Error, Z-Value, and P-Value for the Higher-Order CFA Model

(Table 4-5 reports the loading estimate, standard error, z-value, and p-value for each first-order indicator – word =~ WRDFAMc, PCNARz, DESWLsy, RDFRE, SYNNP, WRDHYPnv, WRDPOLc, DRVP, WRDIMGc; sentence =~ LSAGN, LSASS1, PCREFz, CRFAOa, CRFAO1, LSAPP1; text =~ CNCAll, CNCAdd, CNCLogic, DESSL, PCDCz, SYNSTRUTt, CNCPos; content =~ C1–C6; structure =~ Lead, Position, Claim, Counterclaim, Rebuttal, Evidence, Concluding_Statement; tone =~ all_markers, booster_words, discourse_markers – together with the higher-order loadings (micro =~ word, sentence, text; macro =~ content, structure, tone), the residual variances, the micro–macro covariance, and the regressions of micro and macro on essay score.)

Note. SE = standard error.
Figure 4-5 Diagram of the Higher-Order CFA Model

4.4.3 RQ2a. Tests of Measurement Invariance Among SPED Groups

The study employed the four-step approach for testing measurement invariance outlined by Widaman and Reise (1997), comprising tests of (1) configural, (2) metric, (3) scalar, and (4) residual invariance. This framework was used to assess whether the reduced higher-order CFA model could detect differences in persuasive writing performance among students with different special education (SPED) statuses. Table 4-6 presents the fit indices for the models testing measurement invariance.

The analysis began by examining configural invariance, in which the same factor structure was specified across SPED groups while all other parameters were allowed to vary. The fit indices for this model were suboptimal, χ2(1246) = 26220.821, RMSEA = 0.136, CFI = 0.648, AIC = 232549.253, suggesting that students with different SPED statuses did not uniformly conform to the higher-order structure of persuasive writing constructs. Nevertheless, this model served as the baseline for the subsequent invariance tests.

Next, metric invariance was assessed by retaining the same factor structure and constraining the factor loadings to be equal across groups, while allowing all other parameters to vary. The fit indices for this model still reflected a suboptimal fit, χ2(1281) = 26365.663, RMSEA = 0.135, CFI = 0.646, AIC = 232724.053. The changes in CFI (less than 0.01) and RMSEA (less than 0.015) between models (as suggested by Cheung & Rensvold, 2002) supported the acceptance of metric invariance. This indicates that the relationships between the latent variables and the observed indicators are equivalent across SPED groups.

Scalar invariance was then tested by constraining the item intercepts to be equivalent across the two SPED groups. The fit indices were χ2(1310) = 27151.004, RMSEA = 0.134, CFI = 0.638, AIC = 233625.797. While the changes in CFI and RMSEA remained within acceptable cutoffs, the decrease in CFI of 0.008 suggests a need for caution. Nonetheless, scalar invariance was considered established.

Finally, residual invariance was evaluated by constraining the item residuals to be equal across the groups. The fit indices, χ2(1346) = 28174.077, RMSEA = 0.136, CFI = 0.616, AIC = 235719.929, showed a change in CFI (0.022) that exceeded the required threshold, indicating significant differences in item residuals across SPED groups; thus, residual invariance was not achieved.

In conclusion, our analysis offers a novel investigation into the measurement invariance of microstructural and macrostructural writing constructs across SPED groups, which, to our knowledge, has not been previously explored. Although the four tests of measurement invariance did not meet the suggested cutoff values for fit indices, the results provide emerging evidence of construct and discriminant validity. Specifically, the number of factors and the relationships between the latent variables and their indicators remained consistent across SPED groups. This suggests that the higher-order CFA model derived from RQ1b is invariant across groups, indicating that any observed group differences in writing constructs likely reflect actual performance differences rather than variations in how participants interpret the instrument.
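The step-by-step decision rule applied above can be summarized in a few lines; the sketch below applies the change-in-fit cutoffs cited in the text (|ΔCFI| < 0.01 and |ΔRMSEA| < 0.015) to the values reported in Table 4-6, which follows.

```python
# A small sketch of the invariance decision rule described above, using
# the change-in-fit cutoffs cited in the text (Cheung & Rensvold, 2002)
# and the delta values from Table 4-6.
STEPS = [
    # (model, delta CFI, delta RMSEA) relative to the previous,
    # less constrained model
    ("metric",   -0.002, -0.001),
    ("scalar",   -0.008, -0.001),
    ("residual", -0.022, +0.002),
]

for name, d_cfi, d_rmsea in STEPS:
    supported = abs(d_cfi) < 0.01 and abs(d_rmsea) < 0.015
    print(f"{name:>8}: dCFI={d_cfi:+.3f}, dRMSEA={d_rmsea:+.3f} "
          f"-> {'supported' if supported else 'not supported'}")
```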
Table 4-6 Fit Indices for the Models Testing Measurement Invariance

Model                      Compared Model   χ2          df     p       △χ2 (p)            CFI     △CFI     RMSEA   △RMSEA   AIC
A. Configural invariance   –                26220.821   1246   0.000   –                  0.648   –        0.136   –        232549.253
B. Metric invariance       B vs. A          26365.663   1281   0.000   144.842 (0.000)    0.646   -0.002   0.135   -0.001   232724.053
C. Scalar invariance       C vs. B          27151.004   1310   0.000   785.341 (0.000)    0.638   -0.008   0.134   -0.001   233625.797
D. Residual invariance     D vs. C          28174.077   1346   0.000   1023.073 (0.000)   0.616   -0.022   0.136   +0.002   235719.929

4.4.4 RQ2b. Differences Among the Two SPED Groups

Table 4-7 presents the values of all measurement parameters, including factor loadings, item intercepts, and error variances, for the two SPED groups. In cases where parameters were invariant across both groups, a single parameter value is reported in the table; conversely, differing parameter values are provided where significant variations across groups were observed. This analysis examined the validity of the common assumption of population homogeneity within the microstructural and macrostructural writing construct model.

The factor loadings demonstrated full metric invariance across the SPED groups: both groups utilized the same metric, with the entire set of factor loadings remaining invariant. This suggests that the same underlying constructs were being measured in the same way in each SPED group. Scalar invariance was also assessed; while item intercepts were generally consistent across groups, a slight decline in model fit (e.g., CFI) was observed when scalar invariance was imposed. This indicates that, while the factors maintain equivalent meanings across groups, there may be minor baseline differences in item responses that slightly affected the overall model fit.

According to Table 4-7, the item intercepts and error variances revealed substantial variability. The presence of differing item intercepts across the SPED groups suggests that, while the overall factor structure remains consistent, the starting points (or baselines) of the relationships between the latent variables and the observed indicators vary. Notably, all writing-related variables, with the exception of LSASS1 and CRFAOa, exhibited opposite trends, indicating that students receiving special education services had lower average response levels for the same writing variables. This pattern may reflect lower writing capabilities or less writing experience among students in this group. This finding aligns with previous research indicating that students with disabilities often encounter challenges in various aspects of writing, including vocabulary use (Bryant et al., 2003; O'Connor et al., 2019), cohesion and coherence (Koutsoftas & Petersen, 2017), content development (Graham & Harris, 2010; Monroe & Troia, 2006), structural organization (Golley, 2015; G. A. Troia, 2002), and tone/style (Reed et al., 2023). The finding that students receiving special education services demonstrated higher measurement errors on specific indicators (e.g., RDFRE, LSAPP1, DESSL, SYNSTRUTt, position, and concluding statement) implies that these indicators may be less reliable for this student population.
This variability may indicate that these indicators do not effectively capture the constructs for students with special education needs, potentially due to factors such as learning differences, instructional practices, or the complexity of the writing tasks.

To sum up, the variability observed in the item intercepts and error variances suggests that, while both education groups adhere to a shared framework for the writing constructs, significant differences exist in how these constructs are manifested in the writing of students with varying special education statuses. This finding underscores the necessity for tailored assessment strategies that account for these differences, ensuring that evaluations are equitable and valid for all students. Recognizing that certain writing indicators may operate differently for students in special education can help educators provide targeted instruction and support. This understanding can inform curriculum design, intervention strategies, and the development of more effective writing assessment tools that better accommodate the needs of students with diverse learning profiles.

Table 4-7 Invariant and Non-Invariant Factor Loadings, Item Intercepts, and Error Variances in Two SPED Groups

Latent Variable   Item                   Factor Loading   Intercept (TAP)   Intercept (SPED)   Error Var. (TAP)   Error Var. (SPED)
Word              WRDFAMc                0.446            -0.002            0.007              0.662              0.814
                  PCNARz                 0.726            -0.015            0.063              0.175              0.294
                  DESWLsy                -0.645           0.03              -0.128             0.363              0.367
                  RDFRE                  0.413            0.02              -0.085             0.635              1.269
                  SYNNP                  -0.634           0.019             -0.082             0.363              0.488
                  WRDHYPnv               -0.456           0.064             -0.27              0.643              0.775
                  WRDPOLc                0.46             0.026             -0.11              0.659              0.74
                  DRVP                   0.497            0.007             -0.028             0.602              0.693
                  WRDIMGc                -0.354           0.004             -0.015             0.805              0.821
Cohesion          LSAGN                  0.584            0.064             -0.271             0.45               0.997
                  LSASS1                 0.708            0                 0.001              0.311              0.512
                  PCREFz                 0.77             0.006             -0.024             0.21               0.271
                  LDTTRc                 -0.499           -0.103            0.435              0.6                0.79
                  CRFAOa                 0.779            0                 0.001              0.185              0.281
                  CRFAO1                 0.767            0.001             -0.005             0.209              0.31
                  LSAPP1                 0.359            0.067             -0.285             0.748              1.145
Text              CNCAll                 0.872            0.022             -0.093             -0.032             -0.079
                  CNCAdd                 0.589            0.003             -0.014             0.517              0.552
                  CNCLogic               0.605            -0.003            0.012              0.465              0.645
                  DESSL                  0.121            -0.052            0.221              0.537              2.755
                  PCDCz                  0.61             0.01              -0.042             0.452              0.693
                  SYNSTRUTt              -0.219           0.01              -0.044             0.822              1.401
                  CNCPos                 0.749            0.021             -0.091             0.218              0.295
Content           C1                     0.605            0.007             -0.032             0.522              0.682
                  C2                     0.852            0.093             -0.392             0.103              0.08
                  C3                     0.883            0.108             -0.456             0.021              0.024
                  C4                     0.874            0.116             -0.489             0.037              0.02
                  C5                     0.828            0.137             -0.581             0.113              0.048
                  C6                     0.74             0.132             -0.557             0.303              0.122
Structure         Lead                   0.14             0.078             -0.329             0.95               0.973
                  Position               0.076            0.063             -0.268             0.724              2.037
                  Claim                  0.099            0.094             -0.397             0.96               0.921
                  Counterclaim           0.891            0.058             -0.246             0.184              0.196
                  Rebuttal               0.862            0.065             -0.274             0.261              0.094
                  Evidence               0.217            0.063             -0.265             0.92               0.998
                  Concluding_Statement   0.146            0.078             -0.33              0.849              1.383
Tone/style        all_markers            0.694            0.094             -0.397             0.287              0.146
                  booster_words          0.975            0.134             -0.283             0.298              0.123
                  discourse_markers      0.556            0.106             -0.362             0.512              0.530

Note. TAP = typically achieving peers; SPED = students who receive special education services. Factor loadings were invariant across groups, so a single value is reported per item.

4.4.5 RQ3a. Automated Persuasive Writing Scoring Algorithm

After addressing the first two research questions, I gained valuable insights into the writing features most indicative of persuasive writing quality among secondary school students. To investigate RQ3, the first step involved building and validating an automated writing scoring system that can provide machine-generated scores for GPT-revised essays based on the identified writing constructs.
Specifically, the six-level writing features identified in RQ1 represent salient characteristics that influence persuasive writing quality. These features were subsequently integrated into a pretrained language model, Bidirectional Encoder Representations from Transformers (BERT), to facilitate the prediction of persuasive essay scores.

There are three primary reasons for employing BERT as a baseline model. First, BERT is a transformer-based natural language processing model that has become the de facto industry standard for a variety of downstream tasks, including text classification and score prediction (Cochran et al., 2022). The effectiveness of BERT-based transformers for evaluating student responses has been consistently documented (Poulton & Eliens, 2022; Wulff et al., 2023; Zhu et al., 2022). Second, BERT's architecture generates contextualized word embeddings by training on extensive unlabeled corpora, allowing it to capture task-specific nuances and contextual details effectively (Wang et al., 2024). Third, once the BERT model is successfully trained and validated, future datasets can be processed without the need to reintroduce features from external sources. Unlike other supervised machine learning models, the BERT model can directly utilize the embedded contextual information for various tasks.

To develop and validate the automated essay scoring model, I utilized the pre-trained BERT model from the Huggingface library and configured it to align with the parameters of the basic version of Google BERT (Devlin et al., 2019). The model included 12 self-attention layers, 12 attention heads, and a hidden dimension of 768 for the embedding vectors. During fine-tuning, the pre-trained model was concatenated with a text classification head. In the training phase, essays (n = ~25,000) from the PERSUADE corpus were input into BERT to generate feature vectors. The mean of these vectors was forwarded to the text classification layer to generate predictions across six scoring levels. The cross-entropy loss was then computed by comparing the predictions to the ground truth. For these experiments, I employed the Adam optimizer with consistent hyperparameters: a learning rate of 1e-04, a batch size of 10, an input sequence length of 128, 20 training epochs, a weight decay of 0.0005, and a dropout rate of 0.3. The training was performed on an NVIDIA 1080Ti GPU. All experiments involved training, testing, and validating the model on original student essay scores, with a test size of 0.3.
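The following is a minimal sketch of the scoring architecture just described, assuming the Hugging Face transformers library and PyTorch; the feature-augmented variant is simplified here by concatenating a writing-feature vector to the mean-pooled BERT embedding, and names such as EssayScorer and the feature count are illustrative rather than taken from the study's code.

```python
# A minimal sketch of the BERT-based scorer described above; names and
# the feature count are illustrative assumptions, not the study's code.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

class EssayScorer(nn.Module):
    def __init__(self, n_features: int, n_levels: int = 6):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.dropout = nn.Dropout(0.3)
        # 768-dim mean-pooled embedding + EFA-derived writing features
        self.classifier = nn.Linear(768 + n_features, n_levels)

    def forward(self, input_ids, attention_mask, writing_features):
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        pooled = hidden.mean(dim=1)  # mean of the token embeddings
        combined = torch.cat([pooled, writing_features], dim=1)
        return self.classifier(self.dropout(combined))

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = EssayScorer(n_features=39)  # hypothetical number of EFA features
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=5e-4)

# One illustrative forward/backward pass on a single essay.
batch = tokenizer(["Sample persuasive essay text ..."], truncation=True,
                  max_length=128, padding="max_length", return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"],
               torch.zeros(1, 39))           # placeholder feature vector
loss = criterion(logits, torch.tensor([3]))  # score level indexed 0-5
loss.backward()
optimizer.step()
```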
In this section, I report key evaluation metrics – MAE, the standard deviation of MAE, and R2 – for evaluating and comparing the performance of the two auto-scoring models in predicting persuasive essay scores: (1) the generic BERT model as the baseline and (2) the BERT model incorporating the EFA-identified writing features. Performance results for both approaches are presented in Table 4-8.

Table 4-8 Results of Two BERT Models on Scoring Prediction Task

Models                    MAE (SD)      R2
BERT                      0.65 (0.67)   0.66
BERT + writing features   0.42 (0.33)   0.82

Note. MAE = Mean Absolute Error; SD = Standard Deviation.

Both BERT models demonstrated strong performance, with the BERT model incorporating writing features consistently outperforming the generic BERT model across all evaluation metrics. The BERT model with EFA-derived writing features achieved higher scores on all metrics, particularly R2, indicating that 82% of the variance in persuasive essay scores was explained by this model. This finding suggests a robust alignment between machine-generated scores and human-rated scores, with the model also demonstrating greater stability, evidenced by a lower standard deviation of the MAE. Given the successful performance of this model on the PERSUADE dataset, I selected it as the final model for further analyses. This model will be used to automatically score GPT-revised essays and compare their performance to the original student essays, evaluating improvements in quality based on key writing features. Figures 4-6 and 4-7 present scatter plots illustrating the predicted and actual results for the generic BERT model and the BERT model enhanced with writing features, respectively.

Figure 4-6 Predicted and True Values in BERT-Generic Model

Figure 4-7 Predicted and True Values in BERT Model Enhanced with Writing Features

4.4.6 RQ3b. Effectiveness of GPT-Revised Essays

The prompt for generating feedback (see Table 4-9) using GPT was developed based on the insights gained from the first two RQs and relevant feedback literature (Hattie & Timperley, 2007; Graham & Sandmel, 2011). Drawing from Meyer et al. (2024), the prompt instructed GPT to provide students with suggestions for improving their text by offering hints and examples. Specifically, building on the RQ1 and RQ2 findings, GPT was directed to focus feedback on six key aspects of persuasive essay quality: lexical proficiency, cohesion and coherence, text complexity, structure, content, and tone/style. To minimize cognitive load for students, the feedback was required to be structured, with short examples from the student's own text for individualized guidance. Additionally, the prompt incorporated a revised version of the student essays to assess whether GPT-revised essays could serve as effective mentor texts aligned with the six writing dimensions identified through factor analysis. The GPT settings included the GPT-3.5-turbo model, a temperature of 0.7, and a maximum length of 1800 (see Figure 4-8). Examples of the generated feedback are provided in Figure 4-9. The phrasing used in the GPT prompts aligns with the CFA-derived latent constructs (see Table 4-5 and Figure 4-5), which are grounded in the hypothesized theoretical framework. Minor terminological divergence (e.g., replacing the factor label "word" with the prompt phrasing "lexical proficiency," or "text" with "discourse complexity") reflects intentional adjustments that maintain the theoretical integrity of the constructs while optimizing clarity for generative AI applications.

Table 4-9 Prompt For GPT

Please read the following instruction step by step: In the following you will read a persuasive essay of a secondary student. Give elaborated feedback on the following text on six aspects: (1) lexical proficiency; (2) cohesion and coherence; (3) discourse complexity; (4) content; (5) structural efficiency; and (6) tone/style. Give feedback in a highly structured manner displayed as a table with one column for the six aspects with one row for each aspect (lexical proficiency; cohesion and coherence; discourse complexity; content; structural efficiency; tone/style), one column with hints for improvement on the relevant aspect, and one column with three examples for improvement on the relevant aspect. The examples should include SHORT aids to revise the essay, such as sentence beginnings or transition words. The examples must NOT contain fully formulated sentences. Provide three examples per aspect. Write hints as fully formulated sentences.
The suitable examples for the corresponding aspect should be presented as key points. Use the following table structure as an example and use the table content as orientation for your own feedback:

Aspect | Hints for improvement | Examples for improvement
lexical proficiency | hint 1 for lexical proficiency / hint 2 for lexical proficiency / hint 3 for lexical proficiency | example lexical proficiency 1 / example lexical proficiency 2 / example lexical proficiency 3
cohesion and coherence | hint 1 for cohesion and coherence / hint 2 for cohesion and coherence / hint 3 for cohesion and coherence | example cohesion and coherence 1 / example cohesion and coherence 2 / example cohesion and coherence 3
discourse complexity | hint 1 for discourse complexity / hint 2 for discourse complexity / hint 3 for discourse complexity | example discourse complexity 1 / example discourse complexity 2 / example discourse complexity 3
content | hint 1 for content / hint 2 for content / hint 3 for content | example content 1 / example content 2 / example content 3
structural efficiency | hint 1 for structural efficiency / hint 2 for structural efficiency / hint 3 for structural efficiency | example structural efficiency 1 / example structural efficiency 2 / example structural efficiency 3
tone/style | hint 1 for tone and style / hint 2 for tone and style / hint 3 for tone and style | example tone and style 1 / example tone and style 2 / example tone and style 3
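As an illustration of how feedback can be generated with the settings reported above (GPT-3.5-turbo, temperature 0.7, maximum length 1800), the sketch below uses the OpenAI Python client; passing the Table 4-9 instructions as the system message and the student essay as the user message is an assumption of this sketch, as is mapping the maximum length setting to max_tokens.

```python
# A minimal sketch of the feedback-generation call, assuming the OpenAI
# Python client. PROMPT would hold the Table 4-9 instruction text; the
# system/user message split and max_tokens mapping are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_feedback(prompt: str, essay: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0.7,
        max_tokens=1800,
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": essay},
        ],
    )
    return response.choices[0].message.content
```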