THE EFFECTS OF TIME CONSTRAINTS, GENRE, AND PROFICIENCY ON L2 WRITING FLUENCY BEHAVIORS AND LINGUISTIC OUTCOMES

By

Jongbong Lee

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Second Language Studies – Doctor of Philosophy

2019

ABSTRACT

Length of writing has been measured to identify development and task and genre effects in second language (L2) writing. Moving beyond a singular focus on assessing writing outcomes (i.e., the length of writing), this study investigates L2 learners’ writing fluency-related behaviors and the cognitive processes behind them by exploring the effects of genre, time constraints, and proficiency. Drawing on Kellogg’s (1996) model of writing, this study adopts a mixed-methods design and uses (1) keystroke logging to capture writing behaviors, such as fluency, pausing, and revision, (2) a syntactic complexity analyzer and Coh-Metrix to investigate linguistic complexity, and (3) stimulated recalls to reveal the cognitive processes used by L2 learners. Participants were 123 English L2 learners studying at a university, with high-intermediate (60 participants) or advanced (63 participants) proficiency according to standardized tests and a cloze test. Their writing behaviors were recorded by Inputlog 7.0, a keystroke logging program. The participants were randomly assigned to the long-timed (60 minutes) or the short-timed (30 minutes) group. Furthermore, each participant was randomly assigned to write either the narrative or the argumentative essay on the first day and the other genre on the second day. Sixteen participants were randomly selected for stimulated recall sessions, in which they recalled their writing processes as prompted by the screen recordings.
To triangulate the data, this study used the stimulated recall comments and the keystroke logs. Additionally, the participants completed an exit survey that captured their perceptions of the genres and the time allotment. Repeated measures MANOVAs revealed that the L2 learners’ writing behaviors, such as fluency, and their linguistic outcomes were affected by differences in time constraints, genre, and proficiency. Time constraints affected writing fluency behaviors in that learners in the short-timed group wrote more fluently (e.g., with longer P-bursts) than those in the long-timed group. The argumentative genre elicited more complex language and less fluent writing behaviors than the narrative genre. The advanced learners showed more syntactically complex language and more fluent writing behaviors than the high-intermediate learners. The stimulated recall data showed that L2 learners’ writing processes, such as planning and translation, differed across time constraints, genre, and proficiency. In addition, a two-way ANOVA showed that the effect of proficiency on writing quality was significant, whereas the different time constraints did not affect writing quality. Writing fluency measures were correlated with linguistic measures and writing quality. A linear regression analysis showed that some writing fluency behavior measures predicted writing quality. Further, the participants’ perceptions of the writing tasks differed depending on proficiency and time allotment. The theoretical, methodological, and pedagogical implications of these findings are discussed.

Copyright by
JONGBONG LEE
2019

ACKNOWLEDGEMENTS

I have been so fortunate to have had so much support from many people. I would like to express gratitude to everyone I met during my graduate studies. First, I would like to express deep thanks to Dr. Charlene Polio, whose exemplary supervision has enabled me to enjoy PhD life.
She always reads my papers with lightning speed and gives me many helpful comments. She is aware of my strengths and weaknesses, and her good advice has helped me to develop my research. She genuinely cares about her students and is an inspiring role model and scholar. I hope to follow in her footsteps and become a scholar devoted to research, teaching, and mentoring students. I also thank Dr. Shawn Loewen for helping me expand my ideas about writing fluency and add an exit questionnaire to this dissertation. In addition to offering his support for my dissertation, he read my two qualifying papers and gave me constructive feedback during my doctoral studies. I thank Dr. Paula Winke for providing me with relevant literature for this dissertation and giving me the opportunity to develop the theoretical background of this dissertation in her Language Assessment class. I would like to thank Dr. Patti Spinner for equipping me with research skills through her Advanced Topic in SLA class, which provided a foundation for my dissertation. Although he was not on my committee, Dr. Peter De Costa has been supportive throughout my doctoral studies, and I am grateful that I have had many opportunities to work with him. I am also thankful to Dr. Ok-Sook Park in the Korean program for giving me the opportunity to teach Korean. I would like to thank the College of Arts and Letters and the Graduate College for a Dissertation Completion Fellowship and the Second Language Studies program for a research grant. Additionally, the AAAL graduate student award provided me with funds to present part of my dissertation. I also have many friends to thank. I am fortunate to have had the members of my cohort, Dan, Hima, Minhye, Jungmin, Stella, and Wendy, to share highs and lows with me during my doctoral studies. I also thank Shinhye, Hyung-Jo, Xiaowan, Michael, Wenyue, Ryo, Kiyo, and Dustin for making my PhD life enjoyable in East Lansing.
I am thankful to Matt and Karolina for helping me to revise the narrative rubric. Thank you, Laura and Amy, for rating all the essays and providing the inter-rater reliability. My sincere appreciation goes out to my professors and friends at Georgetown University. I would like to thank Dr. Alison Mackey for guiding me as I completed my master’s degree and helping me have the opportunity to study at Michigan State University. My experiences working with her inspired me to be a scholar. I am also grateful to Dr. Ronald Leow for including me in his teletandem project. I thank Dr. John Norris for equipping me with TBLT and statistical knowledge. To Dr. Lourdes Ortega, I am grateful for an enlightening introductory course on SLA. I would like to thank Yuka, John, Dong Jin, Sandra, Eunji, Hae In, Mari, Yoonsang, Youngah, Young-A, Sakol, and Tyler for their friendship. I also want to thank my former advisor and professors at Korea University. Dr. Jennifer Yusun Kang inspired me to make pursuing a PhD a lifelong goal. I am thankful to Dr. Inn-Chull Choi for teaching me statistics, and I am indebted to Dr. Myung-Hye Huh for opening a second language writing class and helping me with the data collection for this dissertation. I thank my parents for their endless support and encouragement and my brother for doing chores for me. Thank you to my parents-in-law and sister-in-law for treating me like your own son and brother. Last but not least, I thank my wife, Myeongeun, for reading my dissertation, giving me constructive feedback, and teaching me what true love is. Thank you for all that you have done to help me during this journey.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1. INTRODUCTION
CHAPTER 2. LITERATURE REVIEW
2.1. Definitions of fluency and its relationship to other measures
2.1.1. Fluency
2.1.1.1. Fluency and writing processes
2.1.1.2. Fluency and writing quality
2.1.2. Complexity
2.2. Factors responsible for differences in writing fluency and linguistic outcomes: Time constraints, genre, and proficiency
2.2.1. Time constraints
2.2.2. Genre
2.2.3. Proficiency
2.3. Research questions
CHAPTER 3. METHOD
3.1. Participants
3.2. Materials
3.3. Procedures
3.4. Scoring
3.5. Analysis
3.5.1. Qualitative analysis
3.5.2. Statistical analysis
CHAPTER 4. RESULTS
4.1. Quantitative analysis
4.2. Qualitative analysis
4.3. Exit questionnaire results: L2 writers’ perceptions of the time constraints and genres
CHAPTER 5. DISCUSSION
5.1. Overview of research questions and results
5.2. Research question 1: To what extent do proficiency and time constraints affect writing fluency behaviors and linguistic outcomes of L2 writers’ writing in two genres?
5.3. Research question 2: As evidenced by the stimulated recall data, to what extent do proficiency and time constraints affect L2 writers’ writing process in the two genres?
5.4. Research question 3: How do L2 proficiency and time constraints affect writing quality in two essay genres?
5.5. Research question 4: Which fluency measures are related to text quality and linguistic complexity, and to what extent?
5.6. Research question 5: How do L2 writers perceive the effects of time constraints and genre on their writing?
5.7. Contributions of this dissertation
5.7.1. Understanding time constraints
5.7.2. Understanding fluency
CHAPTER 6. CONCLUSION
6.1. Summary
6.2. Theoretical, methodological, and pedagogical implications
6.3. Limitations and future research
APPENDICES
APPENDIX A: Prompts for the narrative and the argumentative essays
APPENDIX B: Cloze test and answer key
APPENDIX C: Timed key-boarding skill test
APPENDIX D: Language experience and proficiency questionnaire
APPENDIX E: Exit questionnaire
APPENDIX F: Stimulated recall protocol
APPENDIX G: Argumentative essay rubric
APPENDIX H: Narrative rubric
APPENDIX I: Reasons for pausing and revision
REFERENCES

LIST OF TABLES

Table 1 Writing-Process Research Using Keystroke-Logging Techniques and Grouped by Research Focus
Table 2 Syntactic Complexity Measures (Lu, 2010)
Table 3 Demographic Information of High Intermediate and Advanced Proficiency Students
Table 4 Participants
Table 5 Cloze Test Scores
Table 6 Keyboarding Skill Test Scores (Number of Total Characters Typed within 2 Minutes)
Table 7 Fluency Measures (Adapted from Van Waes & Leijten, 2015)
Table 8 Coding Categories (Adapted from Révész et al., 2017)
Table 9 Linguistic Measures as Dependent Variables
Table 10 Descriptive Statistics: Writing Fluency Behaviors and Linguistic Outcomes by Time Constraints, Proficiency, and Genres
Table 11 Repeated Measures MANOVA: Effects of Time Constraints and Proficiency on Writing Fluency Behaviors and Linguistic Outcomes within Genres
Table 12 MANOVA: Effects of Time Constraints and Proficiency on Linguistic Features
Table 13 Descriptive Statistics: Writing Quality by Time Constraints, Proficiency, and Genres
Table 14 Two-Way ANOVA: Effects of Time Constraints and Proficiency on Writing Quality in Narrative Essays
Table 15 Two-Way ANOVA: Effects of Time Constraints and Proficiency on Writing Quality in Argumentative Essays
Table 16 Correlations: Fluency Measures with Total Writing Quality and Linguistic Complexity Measures in Narrative Essays (N = 123)
Table 17 Model Summary: Total Quality as Criterion Variable in Narrative Essays
Table 18 Coefficients: Total Quality as Criterion Variable in Narrative Essays
Table 19 Correlations: Fluency Measures with Total Writing Quality and Linguistic Complexity Measures in Argumentative Essays (N = 123)
Table 20 Model Summary: Total Quality as Criterion Variable in Argumentative Essays
Table 21 Coefficients: Total Quality as Criterion Variable in Argumentative Essays
Table 22 Pausing: Writing Processes, Text Examples, and Stimulated Recall Comments (Participant #7)
Table 23 Revision: Writing Processes, Text Examples, and Stimulated Recall Comments (Participant #4)
Table 24 Questionnaire Responses by Group: “How did you feel about writing narrative and argumentative essays? Is one type of essay writing more difficult than the other?”
Table 25 Questionnaire Responses by Group: “Do you think the time allotted was enough to write the essays (both genres)?”
Table 26 Descriptive Statistics: Writing Difficulty Ratings in the Four Conditions
Table 27 Task Difficulty Ratings in the Four Task Conditions (One-Way ANOVA)
Table 28 Summary of Findings
Table I-1 Number of comments for pausing in stimulated recalls (high intermediate short timed group)
Table I-2 Number of comments for revision in stimulated recalls (high intermediate short timed group)
Table I-3 Number of comments for pausing in stimulated recalls (high intermediate long timed group)
Table I-4 Number of comments for revision in stimulated recalls (high intermediate long timed group)
Table I-5 Number of comments for pausing in stimulated recalls (advanced short timed group)
Table I-6 Number of comments for revision in stimulated recalls (advanced short timed group)
Table I-7 Number of comments for pausing in stimulated recalls (advanced long timed group)
Table I-8 Number of comments for revision in stimulated recalls (advanced long timed group)

LIST OF FIGURES

Figure 1. Complexity (Housen & Kuiken, 2009)
Figure 2. Inputlog 7.0: Screen capture
Figure 3. Means of pauses between words in the two genres
Figure 4. Genre differences in MLC
Figure 5. Genre differences in CN/T
Figure 6. Genre differences in WL
Figure 7. Genre differences in WF
Figure 8. Genre differences in Product: Words per minute
Figure 9. Genre differences in P-burst length
Figure 10. Genre differences in the number of R-bursts
Figure 11. Effects of time constraints on process: words per minute
Figure 12. Effects of time constraints on product: words per minute
Figure 13. Effects of time constraints on p-burst length
Figure 14. Effects of time constraints on pause between words
Figure 15. Effects of time constraints on the number of R-bursts
Figure 16. Effects of proficiency on MLS
Figure 17. Effects of proficiency on MLC
Figure 18. Effects of proficiency on CN/T
Figure 19. Effects of proficiency on VP/T
Figure 20. Effects of proficiency on product: words per minute
Figure 21. Effects of proficiency on process: words per minute
Figure 22. Total writing quality scores in the two time constraints and proficiency levels across the groups
Figure 23. Comments about pausing from stimulated-recall sessions
Figure 24. Comments about revision from stimulated-recall sessions
Figure 25. Comments about pausing in narratives
Figure 26. Comments about pausing in argumentative essays
Figure 27. Comments about revision in narratives
Figure 28. Comments about revision in argumentative essays
Figure 29. Writing processes during pauses between words

CHAPTER 1. INTRODUCTION

Fluency has been used as a measure of second language (L2) performance and development. It has also been used to better understand how specific tasks or genres affect L2 performance. Although there are different definitions of fluency, the term is generally considered to describe the flow and smoothness of language production (Koponen & Riggenbach, 2000; Segalowitz, 2010). For instance, Lennon (1990) considered oral fluency to be a global ability and a temporal aspect of performance. Schmidt (1992) suggested that fluency in speech production is an automatic procedural skill that shows how well learners perform when doing a task in real time.

In the assessment of writing fluency, the definition is narrowed down to specific, measurable characteristics. For example, writing fluency has traditionally been measured by the number of words and structures produced within a limited time, which is equivalent to temporal measures for oral language (Wolfe-Quintero, Inagaki, & Kim, 1998). These measures, however, may not reflect learners’ writing performance perfectly, in part because writing behaviors such as pausing and revision may affect the time learners spend producing language (Abdel Latif, 2013; Kellogg, 1996). Accordingly, the traditional method of assessing writing fluency is controversial (e.g., Abdel Latif, 2013; Van Waes & Leijten, 2015).
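As a rough illustration of the traditional product-based measure described above, the following Python sketch divides the number of words in a finished text by the time on task. The function name, the whitespace-based word count, and the sample values are hypothetical and are not taken from this study; the sketch is meant only to show that the calculation takes no account of pausing or revision.

```python
def words_per_minute(text: str, minutes: float) -> float:
    """Product-based fluency: words in the finished text per minute on task."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    # Counting whitespace-separated tokens is a simplifying assumption.
    return len(text.split()) / minutes


# Hypothetical 300-word essay written in a 30-minute session.
essay = " ".join(["word"] * 300)
print(words_per_minute(essay, 30))  # 10.0
```

Two writers with the same word count and the same time on task would receive the same score here, however differently they paused or revised, which is the kind of limitation raised by Abdel Latif (2013).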
Abdel Latif (2013) questioned the validity of previous studies that used the traditional method of dividing the number of words written by the writing time. One reason for this is that L2 learners may pause in different places and for different reasons when writing, so not all pauses should be considered equal. In short, as writing fluency is affected by different writing processes such as monitoring, a single measure of length (i.e., the number of words) may not fully capture fluency (Abdel Latif, 2013).

In order to measure writing fluency validly, researchers have recently explored new methods such as keystroke logging (e.g., de Smet, Leijten, & Van Waes, 2018; Révész, Kourtali, & Mazgutova, 2017). Keystroke logging is a useful tool for examining specific aspects of L2 writing. Concurrent and unobtrusive, it can record, for example, the length and timing of pauses. However, keystroke logging reveals only some writing behaviors, and so it alone cannot fully capture L2 writers’ processes or their internal cognition during writing. Therefore, other methods such as stimulated recall (e.g., Révész, Kourtali, & Mazgutova, 2017) and think-alouds (e.g., Schrijver, Van Vaerenbergh, & Van Waes, 2012) should be used to complement keystroke logging, thus compensating for some of its shortcomings, especially its inability to take account of the writers’ thought processes (Geisler & Slattery, 2007, p. 197).

One important issue in the measurement of fluency is whether or not the writing is produced in a timed setting, because the construct of fluency is affected by the amount of writing time available. When time limits are used, the question arises of how to set them appropriately for various tasks. In addition, although timed writing is used in many instructional and testing settings, many scholars have suggested that writing under time pressure is unnatural (e.g., Cho, 2003; Weigle, 2002). For instance, Weigle (2002, p. 172) pointed out the limitations of the timed impromptu essays that are widely used in testing and research to indicate L2 learners’ development or production. She suggested that alternatives to short timed essays should be considered and that untimed essay writing causes L2 learners less anxiety and allows them time to generate ideas and to prepare to write about specific topics. While most previous studies investigating L2 writing have employed short timed writing tasks (e.g., 30-minute essays), the use of untimed or longer timed tasks is ecologically valid, and research findings based on such tasks could be extended to real-life and instructional settings (Polio & Friedman, 2017; Polio & Lee, in press). Both timed and untimed essays are considered meaningful tasks, but few studies have investigated how different time constraints affect L2 learners’ fluency, writing behaviors, and linguistic outcomes.

Fluency has also been used as a dependent variable in research that investigates task and genre effects. Researchers have employed a variety of theoretical frameworks to assess constructs of writing such as fluency and complexity while exploring the effects of different tasks and genres on L2 writing. For instance, Robinson’s (2001) cognition hypothesis and Skehan’s (1996) trade-off hypothesis both suggest that the linguistic complexity, accuracy, and fluency (CAF) of L2 learners’ production are influenced by tasks. In addition, some genre-based studies (e.g., Lu, 2011) have shown that genres affect learners’ production in terms of CAF. Although these previous studies used different frameworks, all of them have emphasized fluency as one of the constructs that help researchers find out how different types of writing affect L2 learners’ production.

The primary goal of this dissertation is to delve into the interplay between time constraints, genre, proficiency, and linguistic outcomes.
It also extends previous research by examining writing fluency with a range of measures. To date, most previous studies that have examined how different aspects of writing tasks such as genre and time constraints affect L2 writers’ fluency have used only product-based measures (e.g., the total number of words produced in a given time). Only a few studies (e.g., Kellogg, 1990; Révész, Kourtali, & Mazgutova, 2017) have examined how different task types affect writing fluency behaviors and the underlying cognitive processes of writing; however, these studies have used only short-timed tasks and a single task type. To address this research gap, this study uses a diverse range of writing fluency measures, including process-based measures (e.g., P-bursts), and connects writing fluency behaviors to cognitive processes in two different types of writing. In addition, the study explores how different genres and time constraints affect the processes and products of L2 writers at different proficiency levels. The results of the study’s investigation of L2 writing fluency behaviors should provide theoretical and methodological implications for L2 writing research as well as for L2 writing pedagogy and assessment.

The remainder of this dissertation is organized as follows. Chapter 2 reviews the literature on the relationship of fluency to complexity and on the factors responsible for differences in writing fluency and linguistic outcomes to explain the theoretical background for the study. It also presents the study’s research questions. Chapter 3 describes the study’s methodology. Chapter 4 presents the results of the analysis of the data, and Chapter 5 discusses these results with regard to the research questions. Chapter 6 concludes the dissertation, pointing out limitations of this study and suggesting some directions for future research.

CHAPTER 2. LITERATURE REVIEW

2.1. Definitions of fluency and its relationship to other measures

Fluency is often discussed along with other constructs of production such as complexity and accuracy. Many researchers have investigated second language learners’ production in terms of the three CAF constructs, which are interwoven with one another (Foster & Skehan, 1996). The constructs have been used to measure distinct components of L2 performance that may be manifested by L2 learners under different task conditions (e.g., Housen & Kuiken, 2009; Housen, Kuiken, & Vedder, 2012). An underlying assumption of the three constructs is that L2 learners show development in the target language over time. In other words, proficient L2 learners tend to show more complex, accurate, and fluent writing than novice L2 learners. Another assumption is that the three constructs of CAF are influenced by writing task types. In most second language acquisition (SLA) studies, in addition to being used as indices of L2 development, the three constructs have been used to look for effects of pedagogical treatments and genre differences. In the following sections, the constructs of fluency, complexity, and accuracy, and how they have been used in L2 research, will be further discussed.

2.1.1. Fluency

Since ways of measuring speaking fluency have influenced ways of measuring writing fluency, writing fluency has been defined and operationalized in various ways. For example, Wolfe-Quintero et al. (1998) defined fluency as the number of words or structures included in writing within a limited time. On the other hand, Snellings, Van Gelderen, and De Glopper (2004) defined fluency as the speed of lexical retrieval in writing. Recently, in a study on process-based writing fluency, Van Waes and Leijten (2015) proposed a multidimensional fluency model.
They argued that writing fluency includes production, process variation, revision, and pause behavior, and that these four components can distinguish fluent and less fluent writers. By using principal component analysis, they confirmed that the four components together contribute to the multidimensional fluency model. They suggested that various components of writing fluency be examined in experimental settings for comparison between groups or tasks. In short, fluency has been defined in many ways depending on the components it is taken to include, and the different definitions of writing fluency lead to different measurements for assessing it.

Given that writing fluency involves multiple components, operationalizations of writing fluency differ. The usual measures count the number of production units produced in a given time. According to Wolfe-Quintero et al. (1998), fluency should be measured by the number of words or structural units that a writer can produce in a particular period of time, rather than by the sophistication of the vocabulary or structures produced. In other words, more fluent writers can produce more words and structures in a given time, whereas less fluent writers produce fewer. The most widely used measure is the number of words divided by writing time (e.g., Sasaki & Hirose, 1996). Some studies include quantity of writing (Sasaki, 2004) and words per T-unit (Larsen-Freeman, 2006), but Norris and Ortega (2009) suggested that words per T-unit (i.e., words per main clause plus any clauses dependent on it) should be considered a complexity measure. As this brief summary of the research suggests, there remains some confusion regarding how best to measure writing fluency.
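The most widely used product-based measure described above, the number of words divided by writing time, can be illustrated with a minimal sketch. The function name and the whitespace-based word count are assumptions made for illustration; the studies cited may have counted words differently.

```python
def words_per_minute(text: str, minutes: float) -> float:
    """Product-based fluency: whitespace-delimited words divided by writing time.

    Assumption: a "word" is any whitespace-delimited token; published studies
    may apply other counting rules.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return len(text.split()) / minutes


# Hypothetical example: a nine-word text written in half a minute.
essay = "The quick brown fox jumps over the lazy dog"
rate = words_per_minute(essay, 0.5)
```

A measure of this kind describes only the finished product; it says nothing about how the text was produced within the allotted time.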
Abdel Latif (2013) pointed out the definitional confusion over writing fluency due to its multiple components, and raised a concern about product-based writing fluency assessment; that is, the practice of measuring fluency quantitatively in a finished product. In most previous studies, the researchers have counted words or calculated sentence length (Johnson, Mercado, & Acevedo, 2012) or composition rate (Sasaki, 2000). Few researchers have examined process-based measures such as pausing or length of translating episodes. To assess L2 learners’ pausing, computer-based methods such as keystroke logging can be used, helping researchers assess L2 learners’ real-time writing fluency (e.g., Leijten & Van Waes, 2006; Révész, Kourtali, & Mazgutova, 2017; Révész, Michel, & Lee, 2017, in press; Spelman Miller, Lindgren, & Sullivan, 2008; Van Hell, Verhoeven, & Van Beijsterveldt, 2008; Van Waes & Leijten, 2015). These computer-based methods have been used in both L1 and L2 writing research, and the studies that have used them cover a range of research foci and languages (see Table 1). For instance, the software Inputlog (http://www.inputlog.net/) tracks writing activities by recording pauses, keystrokes, mouse actions, and so on. The software can also calculate P-bursts, which are the units of text produced between pauses; that is, the number of typed characters between pauses. More fluent writers have fewer, longer P-bursts than less fluent writers (Chenoweth & Hayes, 2001; Van Waes & Leijten, 2015). These studies argue that both process-based and product-based measures should be considered in order to assess writing fluency accurately.

Table 1
Writing-Process Research Using Keystroke-Logging Techniques and Grouped by Research Focus

Columns: Study; Research focus; Language

Study: Alves et al. 2008; Baaijen et al. 2012; de Smet et al. 2018; Leijten & Van Waes 2013; Schrijver et al. 2012; Wengelin et al. 2009; Chukharev-Hudilainen 2014; Lindgren & Sullivan 2003; Spelman Miller 2005; Révész, Michel, & Lee 2017; Ranalli et al. 2018; Ranalli et al. 2019; Sullivan & Lindgren 2002; Kowal 2014; New 1999; Scott & New 1999; Chenoweth & Hayes 2003; Eklundh & Kollberg 2003; Eklundh 1994; Deane et al. 2018; Medimorec & Risko 2016; Medimorec & Risko 2017; de Smet et al. 2014; Leijten, Van Waes, & Ransdell 2010; Quinlan et al. 2012; Van Waes & Schellens 2003; Van Waes et al. 2010; Wallot & Grabowski 2013; Barkaoui 2015, 2016*; Khuder & Harwood 2015; Révész, Kourtali, & Mazgutova 2017; Thorson 2000*
Research focus: Writing fluency behaviors (e.g., pausing and revision)
Language: L1 Portuguese; L1 Dutch; L1 Swedish; L1 Russian; L2 English; L2 Swedish; L2 French; L1 English; L1 Dutch; L1 German; L2 English; L2 German
Research focus: Task type comparison (e.g., genre)
Research focus: Proficiency; Writing quality; Learning style
Study: Thorson 2000*; Stevenson et al. 2006; Van Waes & Leijten 2015
Language: L1 English and L2 German; L1 Dutch and L2 English; L1 Dutch and foreign languages; L1 and L2 English
Study: Spelman Miller 2000; Barkaoui 2015, 2016*
Language: L2 English
Study: Ganem-Gutierrez & Gilmore 2018*; Spelman Miller et al. 2008*; Xu 2018; Xu & Ding 2014; Almond et al. 2012; Deane 2014; Zhang & Deane 2015; Guo et al. 2018; Ganem-Gutierrez & Gilmore 2018*; Spelman Miller et al. 2008*; Révész, Michel, & Lee 2017*; Van Waes, Van Weijen, & Leijten 2014
Language: L1 English; L2 English; L1 Dutch

Note. * indicates that the study falls in more than one category.

To understand and assess writing fluency better, it is worthwhile to compare writing fluency with speaking fluency. One of the differences between writing and speaking fluency is related to processing (Abdel Latif, 2013). L2 learners’ production behaviors are different in tasks that are the same except for modality. Speaking is generally faster than writing, and L2 speech can be analyzed by the temporal fluency measures of pausing and speech rate because it needs to be produced in a given time.
On the other hand, L2 learners’ fluency behaviors vary more in writing than in speaking; for example, some learners pause a lot when they are beginning to write, and then speed up, while others might do the opposite. These behaviors can be strategic or inconsistent, and pausing may support or hinder writing. Therefore, pausing while writing may not be a sign of dysfluency, unlike pausing while speaking. In addition, pausing at different locations is often associated with planning or other writing processes (Schilperoord, 1996). According to Abdel Latif (2013), a valid measurement of writing fluency should take account of chunks or spans of text produced; that is, the “bursts” occurring between pauses (i.e., P-bursts). As mentioned above, writing fluency can be defined in different ways, which results in different measurements. Hence, including different measurements increases the validity of assessments of writing fluency. In addition, with the help of keystroke logging software, it is possible to measure how writers write and revise by examining the ratio of process to product.

Fluency can also be used to show L2 development over time (e.g., Spelman Miller et al., 2008; Yoon & Polio, 2017). For example, by examining the number of words produced, Yoon and Polio (2017) did not find significant differences between genres, but they did find a difference over time. As in many other studies (e.g., Knoch, Rouhshad, & Storch, 2014; Godfrey, Treacy, & Tarone, 2014; Knoch, Rouhshad, Oon, & Storch, 2015), the L2 learners in their study showed a significant increase in fluency over the course of one semester but notably did not improve in terms of accuracy. Spelman Miller et al. (2008) investigated writing fluency in a longitudinal study in terms of bursts (typed characters between pauses and/or revisions), and they measured fluency during bursts (writing time between pauses and/or revisions).
Although theirs was a small-scale study, they showed that fluency and the length of writing bursts both increased over time.

2.1.1.1. Fluency and writing processes

Although both speaking and writing modes require productive skills, they differ crucially in processing time. Pausing and speech rate are key temporal elements in speaking, and they affect the product’s comprehensibility, whereas in writing, pausing and writing rate vary depending on a variety of factors, and are not directly visible in the final product. As mentioned briefly above, writing cannot be accurately assessed by product-based measures and pausing alone; such measures do not tell us much about differences in how shorter or longer texts are produced depending on tasks or learner factors (Abdel Latif, 2013). In contrast, process-based measures such as P-bursts, as recorded by keystroke logging, can capture more information about the cognitive processes that L2 learners engage in while performing writing. Therefore, by employing a varied array of measures, it is possible to more accurately examine the construct of fluency in the writing mode.

Nevertheless, keystroke logging cannot reveal L2 learners’ internal cognitive processes. Although it allows a glimpse of where and what learners write quickly and slowly, and how they revise and pause, it does not explain why they do so. Recently, Révész, Kourtali, and Mazgutova (2017) tried to triangulate keystroke logging with other methods. They conducted stimulated-recall sessions with four students to find out where they paused and revised. Their participants were advanced proficiency L2 English users, and the authors used English for the stimulated-recall sessions, but it is worth considering whether the L1 might be more useful to elicit rich data (Gass & Mackey, 2017).
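The notion of a P-burst discussed above (the characters typed between pauses) can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration: the log is a hypothetical list of (timestamp in milliseconds, character) pairs, the 2,000 ms pause threshold is an arbitrary choice, and none of it reproduces Inputlog's own file format or algorithm.

```python
from typing import List, Tuple

# Assumed pause threshold in milliseconds; studies operationalize pauses differently.
PAUSE_THRESHOLD_MS = 2000


def p_bursts(keystrokes: List[Tuple[int, str]],
             threshold_ms: int = PAUSE_THRESHOLD_MS) -> List[str]:
    """Split a keystroke log into P-bursts (text typed between pauses).

    `keystrokes` is a hypothetical log of (timestamp_ms, character) pairs in
    chronological order. A new burst starts whenever the gap since the
    previous keystroke is at least `threshold_ms`.
    """
    bursts: List[str] = []
    current: List[str] = []
    previous_time = None
    for time_ms, char in keystrokes:
        gap_is_pause = (previous_time is not None
                        and time_ms - previous_time >= threshold_ms)
        if gap_is_pause and current:
            bursts.append("".join(current))
            current = []
        current.append(char)
        previous_time = time_ms
    if current:
        bursts.append("".join(current))
    return bursts


def mean_burst_length(bursts: List[str]) -> float:
    """Mean number of characters per P-burst (0.0 for an empty log)."""
    return sum(len(b) for b in bursts) / len(bursts) if bursts else 0.0


# Hypothetical log: the writer types "I go", pauses for 2.55 s, then types " home".
log = [(0, "I"), (150, " "), (300, "g"), (450, "o"),
       (3000, " "), (3150, "h"), (3300, "o"), (3450, "m"), (3600, "e")]
bursts = p_bursts(log)
```

With this hypothetical log the writer produces two bursts; a writer who paused less often would produce fewer, longer bursts over the same text. The sketch ignores revisions and mouse actions, which a real keystroke log also records.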
Several researchers have sought to identify and explain how cognitive processes are involved in writing processes (Flower & Hayes, 1981; Kellogg, 1996; Sasaki, 2000, 2004; Sasaki & Hirose, 1996). Assuming that writing is a complex process, Flower and Hayes (1981) and Kellogg (1996) proposed models of writing. Flower and Hayes broke down the writing process into nonlinear, interactive processes of planning, translating, and revising. For instance, when reading a passage one has written, one may notice and repair errors, or make changes while planning the next step. Thus, writers can demonstrate pausing, deletion, insertion, and movement behaviors (Spelman Miller et al., 2008). The Flower and Hayes model considers task environment, cognitive processes involved in writing, and the writer’s long-term memory. The task environment includes external factors that influence writing tasks such as time constraints. The cognitive processes in writing involve planning, translating, and revision. The long-term memory stores knowledge of the genre, of the topic, and of the audience.

Kellogg’s model also involves three processes, which he called formulating, executing, and monitoring. These labels suggest the interactive relationship between cognitive processes and linguistic encoding processes. Execution involves motoric skills such as handwriting and typing. Monitoring is done to check if the intended meaning has been delivered well. Formulation deals with planning ideas and translating them into linguistic expressions. Translating ideas into linguistic expressions includes subprocesses such as selecting lexical units and encoding syntactic structures. Kellogg suggested that the three processes are active simultaneously, and that the extent to which the three processes are achievable depends on learners’ working memory. More specifically, the central executive in working memory is responsible for the processes of formulating and monitoring, but not executing.
This writing model also predicts advantages for both text quality and fluency when writing tasks place fewer demands on working memory (e.g., by including extra planning/outlining time), because the quality and fluency of writing depend on formulation and monitoring processes (Kellogg, 1990).

These writing models do not explicitly relate task types to writing processes and production, but it is likely that L2 writing behaviors and fluency can be influenced by different genres or tasks (e.g., Hayes, 1996). When genres are not familiar or tasks are cognitively complex, it is possible that L2 learners may have difficulties due to limited working memory (Kellogg, 1990, 1996; Révész, Kourtali, & Mazgutova, 2017). L2 learners may also feel pressured by limited writing time, which could force them to generate ideas from long-term memory. Such pressures can affect underlying cognitive processes such as translating and planning, resulting in slower processing. And slow processing in turn can lead to more pauses and revisions.

With respect to the relationship between these writing processes and writing behaviors such as pausing and revising, some previous studies have explored alternative research methods for delving into L2 fluency (e.g., Lindgren & Sullivan, 2006; Stevenson, Schoonen, & de Glopper, 2006). For instance, Thorson (2000) utilized keystroke logging to compare participants’ revision behaviors when writing in their L1 and in their L2, as well as when responding to two different genres. Recently, Révész, Kourtali, and Mazgutova (2017) adopted a process-oriented perspective on fluency to look for task effects. The study used pausing behaviors, total writing time divided by total number of words/characters excluding pauses (minutes per word and characters per word), the number of words/characters occurring between pauses (words per P-burst and characters per P-burst), and revision behaviors.
They did not find task effects in terms of overall fluency but did find a task effect on pausing between sentences as well as on revision behaviors. They suggested that a more complex task (i.e., a task in which content was not provided) led to more extensive pausing at higher level discourse units such as sentences, and to more revisions below the word level. In addition to quantitative data, they collected qualitative data through stimulated-recall sessions to attempt to explain L2 learners’ cognitive writing processes. By including both traditional fluency measures and process-based measures, they were able to shed light on task effects that might not be captured with traditional measures of fluency alone.

L2 learners’ linguistic encoding processes also differ depending on their proficiency or development (e.g., Chenoweth & Hayes, 2001; Housen & Kuiken, 2009; Housen et al., 2012; Roca de Larios, Manchón, Murphy, & Marín, 2008; Wolfe-Quintero et al., 1998). In considering how and why they differ, researchers generally assume that L2 learners can write more fluently as they learn more of the target language and, therefore, that more proficient learners are more fluent in their writing than less proficient learners. Nevertheless, it is also possible that more proficient writers with a reflective writing style make longer pauses and look back more than less proficient writers while producing high-quality writing (Bereiter & Scardamalia, 2009). However, many writing tasks take place under time pressure or in a testing environment. This matters because, generally, proficiency affects the speed with which learners can retrieve language; therefore, some learners can write more in the same amount of time than other learners. An improved automatized process of retrieving language is one aspect of improved proficiency.

2.1.1.2. Fluency and writing quality

Previous studies have provided evidence of the relationship between writing quality and writing behaviors including fluency (e.g., Barkaoui & Knouzi, 2018; Ganem-Gutierrez & Gilmore, 2018; Porte, 1996; Révész, Kourtali, & Mazgutova, 2017; Spelman Miller et al., 2008; Stevenson et al., 2006). For instance, Stevenson et al. (2006) explored how Dutch high school students’ writing behaviors were related to the quality of the texts they produced. The students wrote four argumentative essays (two in their L1 and two in their L2 English) on computers while thinking aloud. Four raters rated the essays on only two criteria: content and language use. The findings showed some relationship between text length and text quality, but no relationship between writing quality and revision types, although the authors hypothesized that a type of low-level revision (i.e., at the word and clause level) may be related to writing quality. Although their study was important in showing the relationship between writing behaviors and writing quality, their use of scores on only content and language use may have affected their findings. In addition, Bowles (2010) suggested that thinking aloud during writing activities may hinder learners’ writing process, although Godfroid and Spino’s (2015) L2 reading research showed that thinking aloud may not be as problematic as Bowles indicated it would be.

Spelman Miller et al.’s (2008) study examined a variety of factors in Swedish high school learners’ L2 writing quality. As mentioned above, Spelman Miller et al. showed that two fluency measures (bursts and fluency during bursts) strongly predicted text quality. However, they found no relationship between revision or pausing behaviors and text quality. Although their longitudinal study was insightful regarding L2 writing fluency, more research on this topic is worthwhile to gain a clearer understanding of what fluency measures are related to text quality.

2.1.2. Complexity

As one of the CAF measures, fluency is related to complexity (Norris & Ortega, 2009). Oh (2006), for example, offered relevant empirical evidence for the relationship between complexity and fluency. She found that two fluency measures (the number of T-units and the number of clauses) were positively correlated with two complexity measures (the number of words per T-unit and the number of words per clause, respectively; see also Qin & Uccelli, 2016). In other words, development of the L2 learners’ complexity leads to development in their fluency and vice versa. Given the mutual impacts of complexity and fluency on changes in each other, L2 learners’ writing fluency should be explored together with complexity and its effects.

According to Housen and Kuiken (2009), complexity usually refers to both task complexity and L2 complexity. L2 complexity can be divided into linguistic complexity and cognitive complexity (see Figure 1). Cognitive complexity may contribute to L2 learners’ attention or perception of difficulty; it is the subjective difficulty of processing language when L2 learners perform language tasks. Assessments of linguistic complexity tend to try to tap into L2 learners’ interlanguage system, which is commonly measured by the length, sophistication, and diversity of the language the learners produce. Researchers examine learners’ L2 complexity to try to understand how it is influenced by tasks or how it develops over time.

Figure 1. Complexity (Housen & Kuiken, 2009)

Previous studies have considered linguistic complexity in terms of syntactic complexity and lexical complexity (e.g., De Clercq & Housen, 2017; Housen, De Clercq, Kuiken, & Vedder, 2019; Norris & Ortega, 2009; Ortega, 2003). According to Norris and Ortega (2009), syntactic complexity measures are often based on length, and calculated by dividing words by a chosen production unit such as the sentence.
They suggested that syntactic complexity should be measured multidimensionally because L2 development cannot be explained by any single measure, and the construct of syntactic complexity is composed of several subconstructs. In other words, one syntactic complexity measure may not be enough to assess L2 learners’ development. For instance, Lu (2010, 2011) used 14 syntactic complexity measures to find genre and proficiency differences in his automated text analysis; the different measures shed light on the various characteristics of genre and proficiency (see Table 2). In a recent study, Kyle and Crossley (2017) compared students’ syntactic complexity and their verb argument construction to the quality of their essays. They found that both types of index were significant predictors of writing quality, although verb argument construction indices can explain a larger portion of variance in writing quality than can syntactic complexity indices. Lexical complexity can often be understood as lexical diversity, although there are many other constructs (Norris & Ortega, 2009; Pallotti, 2015). A written text containing more different vocabulary items can be deemed more complex than one with fewer. Several lexical complexity measures exist, and there is some debate over which are best (McCarthy & Jarvis, 2010). For example, the vocd-D index, a lexical diversity measure, has been considered a useful measure that is not affected by text length as it is based on a mathematically probabilistic model (Malvern, Richards, Chipere, & Durán, 2004), but McCarthy and Jarvis (2010) found that, in fact, it is swayed by text length. Because of such uncertainty, analyses should include several different measures of lexical complexity. Using a range of syntactic complexity and lexical complexity measures makes it possible to investigate writing multidimensionally and may provide a clearer analysis. 
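Length-based and ratio-based syntactic complexity measures of the kind Lu (2010) used are simple quotients of unit counts, which a minimal sketch can make explicit. The counts here are hypothetical and are assumed to have been obtained already (by hand or by a parser); the class and function names are illustrative and are not part of Lu's analyzer.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class UnitCounts:
    """Hypothetical unit counts for one essay, assumed to be already available."""
    words: int
    sentences: int
    t_units: int
    clauses: int
    dependent_clauses: int


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def complexity_ratios(c: UnitCounts) -> Dict[str, float]:
    """Compute a handful of length and subordination ratios from unit counts."""
    return {
        "MLS": ratio(c.words, c.sentences),             # mean length of sentence
        "MLT": ratio(c.words, c.t_units),               # mean length of T-unit
        "MLC": ratio(c.words, c.clauses),               # mean length of clause
        "C/T": ratio(c.clauses, c.t_units),             # clauses per T-unit
        "DC/C": ratio(c.dependent_clauses, c.clauses),  # dependent clause ratio
    }


# Hypothetical essay: 120 words, 6 sentences, 8 T-units, 12 clauses, 4 dependent clauses.
measures = complexity_ratios(
    UnitCounts(words=120, sentences=6, t_units=8, clauses=12, dependent_clauses=4)
)
```

Because each measure divides by a different unit, two essays of the same length can differ on one ratio while matching on another, which is one reason a single measure is unlikely to capture syntactic complexity on its own.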
Table 2
Syntactic Complexity Measures (Lu, 2010)

Measure: Definition

Type 1: Length of production unit
Mean length of clause (MLC): Number of words / Number of clauses
Mean length of sentence (MLS): Number of words / Number of sentences
Mean length of T-unit (MLT): Number of words / Number of T-units

Type 2: Sentence complexity
Clauses per sentence (C/S): Number of clauses / Number of sentences

Type 3: Subordination
T-unit complexity ratio (C/T): Number of clauses / Number of T-units
Complex T-unit ratio (CT/T): Number of complex T-units / Number of T-units
Dependent clause ratio (DC/C): Number of dependent clauses / Number of clauses
Dependent clauses per T-unit (DC/T): Number of dependent clauses / Number of T-units

Type 4: Coordination
Coordinate phrases per clause (CP/C): Number of coordinate phrases / Number of clauses
Coordinate phrases per T-unit (CP/T): Number of coordinate phrases / Number of T-units
Sentence coordination ratio (T/S): Number of T-units / Number of sentences

Type 5: Particular structures
Complex nominals per clause (CN/C): Number of complex nominals / Number of clauses
Complex nominals per T-unit (CN/T): Number of complex nominals / Number of T-units
Verb phrases per T-unit (VP/T): Number of verb phrases / Number of T-units

The syntactic complexity and lexical complexity measures have been used to find task or genre differences because they show how well L2 writers deal with complex grammatical structures (e.g., Ellis & Yuan, 2004; Qin & Uccelli, 2016; Révész, Kourtali, & Mazgutova, 2017; Yoon & Polio, 2017). Yet there are mixed findings on the relationship between task complexity and linguistic complexity measures. Some researchers have found a positive relationship between them, but others have not.
For example, Ellis and Yuan (2004) had three planning task conditions (pre-task planning, online planning, and no planning) in their experiment and found that the L2 writers in the no planning condition (the most complex) produced less complex, accurate, and fluent writing than those in the other two conditions. They suggested that more complex tasks could elicit less complex language from L2 learners because the learners in the no planning condition needed to formulate, execute, and monitor their language under time pressure. Tavakoli (2014), on the other hand, found that storyline complexity did not affect written syntactic complexity. However, generally, in terms of genre, argumentative essays can elicit more complex language than narrative or descriptive essays (Biber & Conrad, 2009). The reason is that the communicative goals of argumentative essays require more complex structures and language than other genres of writing. Yoon and Polio (2017) found a strong genre difference in linguistic complexity, and suggested that argumentative essays can induce L2 learners as well as native speakers to produce more complex language than narrative essays. Yoon and Polio’s comparisons between L2 learners’ and native speakers’ writing led them to suggest that the more complex language in argumentative essays can be attributed to the communicative functions of the genre rather than the possible reasoning demands of the genre. As these studies show, task and genre differences may be detected by measuring linguistic complexity, including syntactic and lexical complexity.

Previous studies have also used complexity measures to assess learners’ development (e.g., Alexopoulou, Michel, Murakami, & Meurers, 2017; Beers & Nagy, 2011). For instance, Lu (2011) compared syntactic complexity in writing across four grade levels within the same institutions.
He found that the first two adjacent levels (levels 1 and 2) and two or three pairs of nonadjacent levels could be distinguished by three length of production measures: mean length of clause, mean length of sentence, and mean length of T-unit. Unfortunately, he considered school level as equivalent to proficiency; his analysis could have been clearer if he had administered a proficiency test such as the Test of English as a Foreign Language internet-based test (TOEFL iBT; www.ets.org). In addition, a longitudinal study is needed to capture learners’ L2 development in terms of linguistic complexity.

2.2. Factors responsible for differences in writing fluency and linguistic outcomes: Time constraints, genre, and proficiency

Previous research shows that writing fluency is influenced by many variables such as proficiency, types of task, and writing conditions (e.g., Révész, Kourtali, & Mazgutova, 2017). For this reason, fluency can only be assessed fully by considering a variety of factors such as writing topics, genres, and writing time. Investigating the relationships among the variables is also essential in order to provide empirical evidence of how genres, time constraints, and proficiency play a role in writing fluency. In this section, the relationships between fluency and time constraints, genre, and proficiency will be discussed.

2.2.1. Time constraints

According to Kellogg (1996), the time pressure to write rapidly can limit the central executive in working memory. Thus, increased time pressure inhibits smooth and responsive writing behavior; consequently, the writer may end up prioritizing formulation (i.e., planning and translating) over execution and monitoring. In other words, the amount of allowed time for a task can make a difference in the extent to which L2 learners stay at the formulation stage. This, in turn, may result in different lengths of pauses during writing, and consequently, different writing fluency behaviors.
In this regard, L2 learners’ writing fluency should be investigated while taking into consideration time constraints.

In developing writing assignments or writing tests for L2 learners, time allocation is an issue in terms of outcome and process, including fluency (Caudery, 1990; Cho, 2003; Elder, Knoch, & Zhang, 2009; Knoch & Elder, 2010; Kroll, 1990; Lu, 2011; Polio & Glew, 1996; Powers & Fowles, 1996; Weigle, 2002). In her review article, Weigle (2002, p. 63) divided the dimension of time allowance into three sets (less than 30 minutes, 30–59 minutes, and 60–120 minutes). Wu and Erlam (2016) operationalized their study’s timed condition by allowing 70% of the time the learner used in the untimed condition to examine the effect of time constraints on complexity, accuracy, fluency, and quality. Their findings showed that the learners produced more words in the untimed than in the timed condition. However, Elder, Knoch, and Zhang (2009) compared 30-minute (short-timed) and 55-minute (long-timed) writing tasks and did not find significant differences in terms of fluency ratings between them. In short, because time constraints have been operationalized differently, previous studies have reported inconsistent findings across different time conditions in terms of L2 learners’ performance.

In addition to comparing fluency, examining different linguistic features in L2 writing taps into other aspects of time-constraint effects. Some learners may benefit more from one or the other condition than other learners do. Younkin (1986), for example, compared native and nonnative English speakers’ essays written in three different time conditions (no extra time, 10 minutes extra, and 20 minutes extra). He found that both the native and nonnative English groups benefited from the two extra time conditions. However, the essay test was part of a larger test, and thus it was hard to know how much time individual learners used for the essays.
Ädel (2008) compared timed and untimed essays in corpora and argued that time can influence the proportion of certain linguistic features such as first person singular pronouns. In a testing setting, Hale (1992) compared a test of written English in 30-minute and 45-minute conditions, and suggested that time allocation did not change performance on various test constructs although scores were higher in the longer condition. Powers and Fowles (1996) also compared graduate students’ GRE writing in 40-minute and 60-minute conditions. Although the graduate students preferred and received a better score on untimed essays, the scores were not related to time allocation because scores under both conditions correlated similarly to nontest indicators of writing ability such as the students’ reported success in various writing activities in college classes. Using corpus data, Lu (2011) compared timed and untimed argumentative essays in terms of seven syntactic complexity measures. The untimed essays elicited more syntactically complex language than the timed essays; however, Lu did not report how the corpus data he used operationalized timing conditions. Knoch and Elder (2010) found that test takers’ scores were similar in two time conditions (55 minutes and 30 minutes), but they suggested that high proficiency learners benefited more from the extended time condition than did low proficiency learners, though they did not find significant differences in terms of quality. In short, the operationalization of time constraint conditions differs across the previous studies, and the effects of time constraint conditions remain inconclusive.

Only a few studies have delved into how different time constraints affect writers’ production in relation to genres or tasks. For instance, Caudery (1990) compared two topics in time-restricted (timed, 40 minutes) and no-time-restricted (untimed, 1 hour) conditions.
He did not find significant score differences between timed and untimed conditions. However, he compared only 12 students and did not report their proficiency levels. In addition, he claimed that eight of the students had written more slowly in the untimed condition, but he did not measure their writing fluency accurately. More research is crucial to better understand whether and how the interplay of genres and time constraints affects L2 learners’ writing, particularly their writing fluency or writing process.

2.2.2. Genre

Fluency is an important construct for understanding the effects of different genres on L2 learners’ writing (e.g., Yang, 2014; Yang, Lu, & Weigle, 2015). Several studies have demonstrated that L2 learners’ writing production can vary depending on genre (e.g., Jeong, 2017; Lu, 2011; Qin & Uccelli, 2016). With regard to writing processes, as the skill to deal with a certain genre increases, the effort needed to collect, plan, translate, and review decreases (Kellogg, 1994, p. 64). During writing, planning ideas, linguistically translating ideas or generating sentences, and reviewing ideas and text are all effortful; however, the pattern of differences among these processes varies with the task (Kellogg, 2001). Based on the assumptions of the writing models discussed in Section 2.1.1.1 (Flower & Hayes, 1980; Kellogg, 1996), previous researchers have found genre effects on L1 language processing and production, as shown by measures such as pause length (Beauvais, Olive, & Passerault, 2011; Medimorec & Risko, 2017; Van Hell et al., 2008) and text length (Beers & Nagy, 2011). In second language pedagogy and research, genre is also important for L2 writing theory and assessment regarding whether different genres elicit different processes and productions, such as more or less fluency from L2 learners.
Although many researchers have explored the effects of topic on L2 writing, only a few have specifically investigated the effects of genre on L2 learners’ fluency (e.g., Ruiz-Funes, 2014, 2015; Thorson, 2000; Yang, 2014; Yoon & Polio, 2017). For example, Yoon and Polio (2017) examined fluency in narratives and argumentative essays as measured by total number of words produced in 30 minutes, but did not find a significant difference between the two genres, although the ESL learners in their study produced more words in narratives than in argumentative essays. Way, Joiner, and Seaman (2000) compared L2 French learners’ 30-minute writing in three different genres. They found that the learners’ narrative essays and expository essays were shorter than their descriptive essays.

In addition to its effects on fluency, genre plays an important role in aspects of L2 learners’ written production (e.g., Lu, 2011; Qin & Uccelli, 2016; Way et al., 2000; Yoon, 2017). Examining different measures in addition to fluency is necessary to explain the multiple aspects of genre effects. As briefly mentioned above, Yoon and Polio (2017) found increased linguistic complexity (length of unit, coordination, particular structures, and lexical complexity) in argumentative essays, when compared to narratives; however, they did not find a significant genre effect on fluency. On the other hand, Qin and Uccelli (2016) found more complexity and fluency in Chinese EFL learners’ argumentative essays than in narratives in terms of number of words, lexical complexity, and number of words per clause. They also examined whether linguistic complexity features and fluency in argumentative essays and narratives were related to writing quality. The authors found that lexical complexity, syntactic complexity, and fluency were correlated with the quality of the argumentative essays and narratives. For both genres, text length was found to be a strong predictor of quality.
Although their use of holistic scores for writing quality allowed them to offer only a limited explanation of the relationship between the fluency measure and quality, their findings suggested that the L2 learners seemed to use different linguistic and discourse features to meet each genre’s communicative purposes. Based on this empirical evidence of L2 learners’ use of complex and fluent language and the relationship between the fluency measure and writing quality in the two genres, these researchers suggested that L2 learners use linguistic resources differently to fulfill different communicative purposes and the functions of different genres (Biber & Conrad, 2009; Biber, Gray, & Poonpon, 2011).

2.2.3. Proficiency

Fluency is used as an indicator of L2 proficiency as well as L2 development (e.g., Chenoweth & Hayes, 2001; Lambert & Kormos, 2014; Larsen-Freeman, 2006). Foreign language fluency is connected to general proficiency and metalinguistic knowledge (Kowal, 2014; Wolfe-Quintero et al., 1998). As learners’ proficiency develops, they gain greater ability to monitor their language and pay attention to form while writing. In other words, L2 learners’ writing is affected by the amount of attention they have available for higher level processing such as planning, generating ideas, or organizing content (Chenoweth & Hayes, 2001; DeKeyser, 2005).

Researchers have explored the relationship between proficiency and fluency. For instance, Sasaki (2004) examined 11 participants’ proficiency over three and a half years (including study abroad experience) and found improvement in their fluency as measured by mean total number of words and mean number of words per minute in their production. Taking a case study approach, Thorson (2000) compared L1 and L2 essays and two different genres of writing (articles and letters). The participants revised proportionally more when they wrote in L2 German than when they wrote in their L1 English.
However, no clear genre effects were found in their revision behaviors. Way et al. (2000) compared French level 1 and level 2 students’ writing and found that level 2 learners wrote more fluently than level 1 learners when fluency was measured by the number of words produced. In language testing, Barkaoui (2016) compared low proficiency and high proficiency learners’ revision behaviors related to fluency. The study found that low proficiency learners made significantly more revisions than high proficiency learners because the high proficiency learners did not need to revise as often even though they wrote more than the low proficiency learners. Van Waes and Leijten (2015) examined participants’ L1 (Dutch) and L2 (English, French, Spanish, or German) expository essay writing in terms of product-based and process-based fluency. They found that writing fluency significantly differed between the L1 and the L2. For example, the participants needed less pausing time between words when they wrote in the L1 than when they wrote in the L2. Although the participants’ L2s were different, and the study does not attempt to explain the differences in pausing behavior, the study demonstrated that L1 writing fluency and L2 writing fluency differ in terms of different fluency measures.

Previous studies suggest that proficiency and genre together play an important role in L2 learners’ linguistic outcomes, including fluency (e.g., Jeong, 2017; Qin & Uccelli, 2016; Ruiz-Funes, 2014, 2015). For instance, Ruiz-Funes (2015) examined intermediate and advanced learners’ essays and found an interaction effect between genre and proficiency. Advanced learners wrote argumentative essays and contrast-compare essays while intermediate learners wrote expository essays and narratives. Ruiz-Funes suggested that argumentative and expository essays were more difficult than the other genres for each proficiency group. She found different patterns between the two groups.
The advanced students were able to produce writing of similar complexity, accuracy, and fluency in both genres; however, the intermediate students showed less complex and accurate language in expository genres than in narratives. If a genre is too difficult for a certain proficiency group, their use of language can be limited (e.g., less sophisticated vocabulary or less complex structures) because the difficulty they experience may overburden their working memory and increase the time they spend on revising or reviewing (Hayes, 2012; Kellogg, 1996). In Ruiz-Funes’s study, possibly, high proficiency allowed the advanced learners to easily access the genre knowledge in their long-term memory, without overloading their working memory. Jeong (2017) also investigated genre effects and their interaction with proficiency (novice, intermediate, and advanced) in writing performance. She did not find significant differences between the two genres she tested, but she did find a significant interaction between genre and proficiency: Novice learners received higher scores on narratives than on expository essays, whereas advanced learners obtained higher scores on expository essays than on narratives. These studies’ findings indicate the necessity of including proficiency in attempts to explain genre effects on the multifaceted aspects of writing.

2.3. Research questions

As discussed above, moving beyond a singular focus on assessing writing outcomes (e.g., the length of writing), this study investigates L2 learners’ writing fluency-related behaviors and the cognitive processes behind them by exploring the effects of time constraints, genre, and proficiency.
Drawing on Kellogg’s (1996) model of writing, this study adopts a mixed-methods design and uses (a) keystroke logging to capture writing behaviors such as fluency, pausing, and revision, and (b) stimulated recall to reveal cognitive processes used by L2 learners (e.g., Révész, Kourtali, & Mazgutova, 2017; Van Waes & Leijten, 2013).

Most previous research on writing fluency behaviors and linguistic outcomes has not yet addressed the differences resulting from different time constraints. Rather, it has been conducted in short-timed settings, possibly for reasons of practicality and convenience, and time constraints in timed writing have been operationalized inconsistently. However, given that writing with extended time is often preferable in certain circumstances such as classroom settings, extended timed writing should be investigated to increase ecological validity and reflect L2 writing in reality. Although this study cannot give the participants unlimited time for logistical reasons, it employs two timed conditions, the longer of which is intended to reduce the limitations of a time constraint and to simulate an untimed condition. Based on Weigle’s (2002) dimensions of time allowance, one condition gave 30 minutes, which has been widely used as a short time constraint for 300-word writing. The other condition doubled the time in which participants could complete the task.

Previous studies have indicated the impact of different genres and proficiency on L2 writing. In addition to the outcomes themselves, however, L2 learners’ writing fluency behaviors underlying the writing process can provide further understanding of L2 writing. In particular, the extent to which genre and proficiency affect L2 writing may differ between linguistic outcomes and writing fluency behaviors.
Nevertheless, whether the observable traces of a person’s cognitive activities such as pausing are due to differences in the demands of genres (e.g., Kellogg, 1990; Thorson, 2000), and the extent to which L2 learners show different writing processes depending on their L2 proficiency, have rarely been tested. To address this gap in research, this study examines the writing fluency behaviors and linguistic outcomes of L2 learners at different proficiency levels, in different genres and under different time constraints. The study investigates the interwoven impact of time constraints, genre, and proficiency on L2 learners’ writing fluency behavior to improve our understanding of L2 writing fluency. Given the correlation between fluency and linguistic complexity and the impact of linguistic complexity on fluency, this study also explores linguistic complexity and writing quality, which ultimately provide relevant evidence for understanding L2 writing fluency. Moreover, by using keystroke logging software to explore these variables, the study employs a relatively innovative approach to assessing fluency. The study addresses five specific research questions:

1. To what extent do proficiency and time constraints affect writing fluency behaviors and linguistic outcomes of L2 writers’ writing in two genres?
2. As evidenced by the stimulated recall data, to what extent do proficiency and time constraints affect L2 writers’ writing process in the two genres?
3. How do L2 proficiency and time constraints affect writing quality in two essay genres?
4. Which fluency measures are related to writing quality and linguistic complexity, and to what extent?
5. How do L2 writers perceive the effects of time constraints and genre on their writing?

CHAPTER 3. METHOD

3.1. Participants

The participants of the study were 128 EFL students (Age: M = 22.75, SD = 2.31; 38 males and 90 females) studying at a private university in Seoul, Republic of Korea, who all spoke Korean as a first language.
The participants were selected according to three main criteria. First, they must have learned English as a second language. While they may have visited or resided in English-dominant countries, for instance in study-abroad programs, they must have learned English in instructional settings. Eighty-one had never resided in an English-dominant country, while 47 had (M = 6 months, SD = 15.13 months). Second, they must have completed a required English for Academic Purposes class at their university. Third, they must have achieved high-intermediate or advanced proficiency according to standardized tests such as TOEFL or IELTS taken two years or less before the time of data collection. They received $25 for their participation, and the five who wrote the best essays, based on the essay scores, received additional compensation.

According to their standardized test scores, the participants were divided into high-intermediate (62 participants) and advanced (66 participants) groups.¹ However, because five participants’ keystroke logging files had corruption errors, only 123 participants’ data were included in the analysis, 60 in the high-intermediate group and 63 in the advanced group (see Table 3).

¹ The high-intermediate participants had TOEFL scores of 72–94, TOEIC scores of 785–940, or IELTS scores of 5.5–6.5, each of which is equivalent, according to an ETS equivalency table, to Level B2 in the Common European Framework of Reference (CEFR) levels. The advanced participants had TOEFL scores of 95 or above, TOEIC scores of 945 or above, or IELTS scores of 7 or above, which are equivalent to Level C1 (Papageorgiou, Tannenbaum, Bridgeman, & Cho, 2015).

Table 3. Demographic Information of High Intermediate and Advanced Proficiency Students

                                         High intermediate (N = 60)    Advanced (N = 63)
Age, Mean (SD)                           23.03 (SD = 2.05)             22.54 (SD = 2.54)
Gender
  Male                                   16                            20
  Female                                 44                            43
Length of residence in English
speaking countries, Mean (SD)            3.07 months (SD = 6.73)       8.86 months (SD = 20)

3.2. Materials

A narrative writing prompt and an argumentative writing prompt were used to investigate genre effects on the participants’ writing. In order to minimize potential topic effects, the topics were controlled by using prompts on the same theme, learning a foreign language. They came from Yoon (2017; see Appendix A).

In order to ensure intergroup comparability in terms of proficiency levels, a cloze test was administered to measure the L2 learners’ English proficiency at the time of data collection (Appendix B). The cloze test was used because it is considered to be a valid measure of global proficiency when the focus of research is related to literacy skills (Wu & Ortega, 2013). The test was composed of 50 items, and the L2 learners were asked to finish it within 25 minutes. The cloze test was scored by the acceptable answer scoring method, which considers all contextually acceptable answers as correct answers and, consequently, increases test reliability. Correct answers received one point; thus, the scores could range from 0 to 50. The results of the cloze test were found to be reliable (Cronbach’s α = .79), suggesting its consistency in distinguishing the participants.

A timed keyboarding skill test (Appendix C) was used to ensure the comparability of the groups in terms of typing speed, which might affect their fluency in writing (Barkaoui, 2016). The participants were asked to copy a sentence as many times as they could in two minutes. By calculating their typing speed, measured by the number of total characters typed, the study was able to control for typing speed when assessing writing fluency. The typing speed in each group was compared to ensure intergroup comparability.

Two questionnaires were used.
First, the Language Experience and Proficiency Questionnaire (Appendix D) developed by Marian, Blumenfeld, and Kaushanskaya (2007) was used to collect the participants’ biographic information including age, sex, length of residence in English-dominant countries, and standardized English test scores. In addition, an exit questionnaire (Appendix E) adapted from one employed by Yoon (2017) was used to elicit the participants’ perceptions of time constraints and genres. The questionnaire was composed of two open-ended questions and eight items to be rated on a nine-point Likert scale.

In order to measure the quality of the participants’ argumentative essays, the analytic rubric provided in Connor-Linton and Polio (2014) was used (Appendix G). Because this analytic rubric can provide detailed information on various aspects of L2 writers’ performance, it is preferable to a holistic rubric (Weigle, 2002). The rubric is an adapted version of the widely used ESL composition profile (Jacobs, Zinkgraf, Wormuth, Hartfiel, & Hughey, 1981), and the full score is 90 points. It consists of five subscales (content, organization, vocabulary, language use, and mechanics); the full score of each of the first four subscales is 20 points, and the full score of the mechanics subscale is 10 points.

The rubric was designed for assessing argumentative essays, but the current study also required an analytic rubric for narrative essays. I therefore revised the rubric to make it applicable to narratives (Appendix H). Following Polio and Lim (under review), two expert raters were given three narratives on the same topic and told to rank the essays in terms of quality while talking about their rankings. Both raters were doctoral students in second language studies who had taught ESL and EFL students and rated essays when working at an English language center. One rater was an experienced English teacher and the other an IELTS certified examiner.
I asked the raters to rank the essays only by quality, and I audiorecorded their descriptions. Both raters ranked the three essays identically and described the quality of the narratives. After they rated and discussed the quality of the narratives, I gave them the analytic rubric for argumentative essays from Connor-Linton and Polio (2014) and asked them to rate the narratives based on the rubric. They discussed some difficulties of rating narratives with the argumentative essay rubric and suggested possible ways to adapt it for narratives. Based on their discussion, I revised the rubric. The validity of the revised rubric was then confirmed by an L2 writing expert, a professor who has conducted research on L2 writing for over 25 years at a university in the United States.

3.3. Procedures

The experimental design was mixed, with one within-subject and two between-subject factors. The independent variables were genre (within-subject), timing conditions (between-subject), and proficiency (between-subject). The dependent variables were syntactic complexity, fluency behaviors, and writing quality.

I met with each participant individually in a conference room at their school on two separate days. Each day, the participants were asked to write one 300-word essay on a computer. The participants were not allowed to use reference materials or other resources to complete the essays. Their writing was recorded by Inputlog 7.0 (Leijten & Van Waes, 2013), a keystroke logging program. Half of the participants were assigned randomly to the shorter time group and half to the longer time group. They were given 30 minutes in the short-timed condition and 60 minutes (double the time, to mimic an untimed condition) in the long-timed condition. Giving the students unlimited time was impossible for logistical reasons; doubling the time was an attempt to remove the limitations of a time constraint.
Each participant was randomly assigned either the narrative or the argumentative essay prompt on the first day and the other on the second day, in order to counterbalance the order of the genres. To minimize testing effects from a repeated design, the participants were asked to schedule the second day of the experiment at least a week after the first day. On the first day, they completed the cloze test and the background questionnaire right after they finished their writing (either narrative or argumentative). On the second day, they completed the timed keyboarding skill test and the exit questionnaire after finishing their writing (either narrative or argumentative).

A total of 16 participants (eight each day, with one from each proficiency group, in each time condition, and after writing in each genre type; see Table 4) were randomly selected for stimulated-recall sessions in order to triangulate the data. Stimulated recall is useful for understanding the participants’ thoughts on their writing process. Previous research has used stimulated-recall protocols to better understand the process of writing in terms of what participants pay attention to, the difficulties they encounter, and the online behaviors they show (Barkaoui, 2015; Lindgren, 2005). This study’s stimulated-recall protocols followed those suggested by Gass and Mackey (2017) and Barkaoui (2015). The stimulated-recall session took approximately an hour, and the selected participants completed the session right after they finished their writing of the day, before completing the other tests. The participant and the researcher watched the screen recording generated by Camtasia together; the participant was told to pause at any time to comment. The researcher also stopped the recording whenever it showed that the participant had paused or revised. If the participants could not recall their writing behaviors, further questions were not asked (Appendix F).
To elicit rich data, the stimulated-recall sessions were conducted in their L1, Korean.

Table 4
Participants

English proficiency           Timing constraints       Genres (two different days)
High-intermediate (N = 60)    Short-timed (N = 30)     Narrative
                              Long-timed (N = 30)      Argumentative
Advanced (N = 63)             Short-timed (N = 33)     Narrative
                              Long-timed (N = 30)      Argumentative

Stimulated recalls: Two participants at two different proficiency levels conducted the sessions after they finished 30 min/60 min narratives and argumentative essays.

Table 5
Cloze Test Scores

                     Short-timed (30 minutes)         Long-timed (60 minutes)
                     M (SD)          95% CI           M (SD)          95% CI
High-intermediate    30.03 (4.73)    28.27, 31.80     28.07 (5.50)    26.01, 30.12
Advanced             37.00 (4.76)    35.31, 38.69     36.77 (5.36)    34.76, 38.77

Note. Total score is 50.

Table 6
Keyboarding Skill Test Scores (Number of Total Characters Typed within 2 Minutes)

                     Short-timed (30 minutes)             Long-timed (60 minutes)
                     M (SD)             95% CI            M (SD)             95% CI
High-intermediate    509.17 (111.02)    467.71, 550.62    493.43 (130.90)    444.56, 542.31
Advanced             621.48 (78.92)     593.50, 649.47    579.57 (93.14)     544.79, 614.35

3.4. Scoring

Table 5 presents the descriptive statistics of the groups’ cloze test scores. To ensure group comparability, independent samples t-tests were performed. A statistical difference between high-intermediate and advanced proficiency levels was found (t(121) = –8.52, p < .001, 95% CI = [–9.66, –6.01]). For the short-timed and long-timed group comparisons, no statistical differences were found within the high-intermediate proficiency group (t(58) = 1.49, p = .14, 95% CI = [–.69, 4.62]) or the advanced proficiency group (t(61) = .18, p = .90, 95% CI = [–2.32, 2.78]). For the keyboarding test, independent samples t-tests were performed to check the comparability of the groups (see Table 6).
The results showed a significant difference between proficiency levels (t(107) = –5.24, p < .001, 95% CI = [–138.08, –62.36]), but no significant difference between time constraint conditions (t(121) = 1.51, p = .13, 95% CI = [–9.76, 72.55]). For the short-timed and long-timed group comparisons, no significant differences were found within the high-intermediate group (t(58) = .502, p = .62, 95% CI = [–46.99, 78.46]) or the advanced group (t(61) = 1.93, p = .06, 95% CI = [–1.45, 85.29]). Therefore, the keyboarding skills that may affect writing fluency behaviors differed between the two proficiency levels but were similar in the two time constraint groups. Two native English speakers, who were expert raters and had taught ESL and EFL students, rated the essays based on the rubrics. Both raters were instructors at an English language center at a university and were studying towards their master’s degrees in TESOL. The raters were trained in a two-hour norming session where they rated sample narratives and argumentative essays that were not part of this study and discussed their scoring. If a discrepancy in any subscale was greater than two points, the raters resolved the discrepancy through discussion. After the norming session, the raters independently rated all of the essays, and the average scores obtained from the two raters were used for the analysis. If some essays received discrepant scores (subscale scores differing by three or more), a third rater rated the essays, and the two closer scores were utilized to find average scores. Because the prompts and rubrics were different for the two genres, interrater reliability was calculated by genre. The interrater reliability of the total scores for the narratives was r = .81 (content: r = .74, organization: r = .71, vocabulary: r = .70, language use: r = .74, and mechanics: r = .77). 
The interrater reliability of the total scores for the argumentative essays was r = .85 (content: r = .75, organization: r = .76, vocabulary: r = .75, language use: r = .78, and mechanics: r = .82). According to Brown, Glasswell, and Harland (2004), reliability of 0.70 is a benchmark for structured rubrics, and thus the interrater reliability for both the narrative and argumentative essays is within an acceptable range.

3.5. Analysis

To analyze the syntactic complexity of the participants’ written texts, the 14 syntactic complexity measures in Lu’s (2010) syntactic complexity analyzer were used. Based on previous studies (Lu, 2011; Yoon & Polio, 2017), some measures that are inaccurate indicators of development and genre effects, such as clauses per sentence (C/S), complex T-unit ratio (CT/T), and sentence coordination ratio (T/S), were excluded. For lexical complexity, the D index and the lexical sophistication measure (the logarithm of word frequency for all words and average length of word) were calculated by using Coh-Metrix (McNamara, Graesser, McCarthy, & Cai, 2014). From among the measures of the frequency of all words, the logarithm of word frequency for all words (WF) and average word length (WL) were selected in order to prevent rare words from creating a limiting factor. In interpreting WF, a lower value means less frequent words and a higher value means more frequent words. Following Yu (2010), spelling mistakes were corrected before running the syntactic complexity analyzer and Coh-Metrix.

Table 7
Fluency Measures (Adapted from Van Waes & Leijten, 2015)

Measures                        Definitions
Process                         Number of words produced, including deleted words
Product                         Number of words produced in the final text
Ratio of process and product    Proportion between process and product measures
P-burst                         A string of actions delimited by an initial pause and end pause exceeding the defined pause threshold (2000 ms)
R-burst                         Language bursts that were bounded by a revision

Figure 2. Inputlog 7.0: Screen capture

To analyze fluency, the data recorded by Inputlog 7.0 were used (see Figure 2). Following Van Waes and Leijten (2015), several measures were calculated: in the writing product, words per minute; in the writing process, words per minute, number of P-bursts, mean typed characters in P-bursts (P-burst length), number of pauses within words, number of pauses between words/sentences/paragraphs, number of R-bursts, mean typed characters in R-bursts (R-burst length), and the ratio of process and product (proportion between product and process measures). The number of characters per minute in the writing process includes the number of characters that the learners deleted in the writing process whereas the number of characters per minute in the writing product only considers the number of characters in the final product. The number of pauses within words is related to the efficiency of typing, word finding and spelling behaviors (Torrance & Galbraith, 2006). The number of pauses between words is usually caused by lexical retrieval and editing processes whereas the number of pauses between clauses can include planning processes (Wengelin, 2006). The number of pauses between sentences or paragraphs is likely to be associated with planning processes (Wengelin, 2006).

This fluency analysis identified bursts, which are sequences of keystrokes without long pauses. Thus, a burst is a chunk of words that is bounded by breaks in written production. Bursts are therefore a useful measurement to show efficiency in writing (Chenoweth & Hayes, 2003). According to Chenoweth and Hayes (2003), P-bursts are defined as the bursts bounded by pausing followed by continued written production. R-bursts are defined as the bursts bounded by revision of the language produced during the burst. More fluent writers can take fewer pauses than less fluent writers. Consequently, more fluent writers may show a lower number of P-bursts than less fluent writers.
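The P-burst definition above can be illustrated with a minimal sketch. It assumes a simplified keystroke log in which each keystroke carries only a timestamp and a character; the `Keystroke` class, its field names, and both functions are hypothetical names introduced for illustration and do not reproduce Inputlog's file format or its internal algorithm. The only value taken from the study is the 2000 ms pause threshold.

```python
from dataclasses import dataclass

PAUSE_THRESHOLD_MS = 2000  # pause threshold reported in the study


@dataclass
class Keystroke:
    # Hypothetical, simplified record; a real keystroke log holds more fields.
    time_ms: int  # time of the key press, in milliseconds
    char: str     # character produced by the key press


def p_bursts(keystrokes, threshold_ms=PAUSE_THRESHOLD_MS):
    """Split a chronological keystroke list into P-bursts.

    A new burst starts whenever the gap since the previous keystroke
    exceeds the threshold (assumption: gaps are measured press to press).
    """
    bursts = []
    current = []
    previous_time = None
    for key in keystrokes:
        if previous_time is not None and key.time_ms - previous_time > threshold_ms:
            bursts.append(current)
            current = []
        current.append(key)
        previous_time = key.time_ms
    if current:
        bursts.append(current)
    return bursts


def mean_burst_length(bursts):
    """Mean number of typed characters per burst (P-burst length)."""
    if not bursts:
        return 0.0
    return sum(len(burst) for burst in bursts) / len(bursts)
```

Under this sketch, a writer who pauses less often yields fewer and longer bursts for the same amount of text, which is the pattern the fluency measures are meant to capture.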
In addition, more fluent writers can write more words between pauses and show longer lengths of P-bursts than less fluent writers. Following the previous studies, the threshold for pauses was set to 2000 milliseconds (Spelman Miller et al., 2008; Van Waes & Leijten, 2015); in other words, only pauses over 2000 milliseconds were counted. Table 7 provides explanations of the fluency measures.

Table 8
Coding Categories (Adapted from Révész et al., 2017)

Process/Subprocess, followed by example comments (English translation)

Planning
  Content: I was thinking about two things. The first one was the process of learning English when I lived in the States. The other one was the process of learning Chinese in high school, when I did not acquire much because I was too old to learn quickly. I was thinking what to say here.
  Organization: I was thinking about the whole structure of this writing. How can I connect this paragraph to the next one? How can I connect this sentence and paragraph to the whole writing? I was thinking these things.
Translation
  Lexical retrieval: Because the next sentence is a fact that is hard to generalize, I was thinking about using different and more sophisticated words instead of saying “act positively.”
  Syntactic encoding: I was writing this part, “translation services currently provided by.” I stopped and asked myself if the verb provide should take an object. I just wrote “currently being provided by” and ended the sentence. But I felt that I was wrong. The verb, provide needs a provider. I was thinking about whether provide needs an object.
  Cohesion: As I was looking at this word, therefore does not fit in here. It might be better to use because in order to change this sentence.
  Unspecified: I found this part awkward. I wanted to make this sentence more natural.
Monitoring: I was looking at this sentence. From now on, I was skimming from the beginning and correcting some mistakes.

3.5.1. Qualitative analysis

With respect to the stimulated-recall data, following Kellogg’s (1996) model and Révész et al. (2017), the participants’ comments were transcribed verbatim and coded in MAXQDA into three categories (planning, translation, and monitoring), as shown in Table 8. With respect to pausing and revision comments, following Stevenson et al. (2006), the comments were counted by pause location and type of revision, as determined by watching the video generated by Camtasia. Three participants’ comments (about 18 percent of the data) were double-coded to check intercoder agreement (95%), and any discrepancy was resolved through discussion.

3.5.2. Statistical analysis

In order to address the research questions, the complexity indices, fluency indices, and writing quality of the participants’ essays were analyzed. SPSS 25 was used to determine whether there were statistical differences between the genres, the timing conditions, and the proficiency levels in terms of complexity, fluency, and writing quality. With the explore function in SPSS, descriptive statistics and 95% confidence intervals were obtained. Multicollinearity between measures was checked (r > .90). If two measures were multicollinear, only one of them was included in the analysis. Because of multicollinearity, some syntactic complexity measures (mean length of T-unit, clauses per T-unit, dependent clauses per T-unit, coordinate clauses per T-unit, complex nominals per clause) were excluded. Table 9 summarizes the measures included in the analysis.
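The multicollinearity screening described above (keep only one of any two measures correlated above .90) can be sketched as follows. This is an illustration rather than the SPSS procedure used in the study: the function names are hypothetical, the sketch applies the cutoff to the absolute value of r, and its rule for choosing which member of a pair to drop (the later one in the input order) is a placeholder for what is in practice an analyst's decision.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def drop_collinear(measures, cutoff=0.90):
    """Return the names of the measures kept after screening.

    `measures` maps a measure name to its list of per-essay values.
    For each pair with |r| above the cutoff, the later measure is dropped
    (placeholder rule; the choice is the analyst's in a real study).
    """
    dropped = set()
    for first, second in combinations(measures, 2):
        if first in dropped or second in dropped:
            continue
        if abs(pearson_r(measures[first], measures[second])) > cutoff:
            dropped.add(second)
    return [name for name in measures if name not in dropped]
```

The function takes one list of values per measure, in the same essay order, and returns the measure names that would enter the analysis.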
Table 9
Linguistic Measures as Dependent Variables

Complexity and fluency measures
  Length of production      Mean length of sentence (MLS)
                            Mean length of clause (MLC)
  Subordination             Dependent clause ratio (DC/C)
  Coordination              Coordinate phrases per clause (CP/C)
  Particular structure      Complex nominals per T-unit (CN/T)
                            Verb phrases per T-unit (VP/T)
  Lexical sophistication    The logarithm of word frequency for all words (WF)
                            Mean length of word (WL)
  Lexical diversity         D
  Fluency                   Process: Words per minute
                            Product: Words per minute
  Pausing                   The number of P-bursts
                            The mean typed characters per P-burst (P-burst length)
                            The number of pauses within words
                            The number of pauses between words
                            The number of pauses between sentences
                            The number of pauses between paragraphs
  Revision                  The ratio of process and product
                            The number of R-bursts
                            The mean typed characters per R-burst (R-burst length)

The independent variables were genre (within-subject), timing conditions (between-subject), and proficiency (between-subject). The dependent variables were linguistic complexity, fluency behaviors, and writing quality. Regarding the first research question, in order to examine the effect of genres, timing conditions, and proficiency levels on the dependent variables (complexity and fluency), a repeated-measures multivariate analysis of variance (MANOVA) was conducted. Every student wrote essays in both genres under one of the two time constraint conditions, and each student’s proficiency was either high-intermediate or advanced. Because the prompts and rubrics differ between the two genres, text quality (writing scores) for the genres was not included in the MANOVA. Evaluation of the homogeneity of variance-covariance matrices (Box’s M), error variances (Levene’s test), linearity, non-multicollinearity, and normality assumptions underlying MANOVA did not reveal any substantial anomalies.
Given the number of comparisons, the a priori alpha level was set at p < .0025 with Bonferroni adjustment (.05/20). For the second research question about the effect of time constraints and proficiency on text quality, a two-way analysis of variance (ANOVA) was conducted for each genre. Given the multiple comparisons, the a priori alpha level was set at p < .0083 with Bonferroni adjustment (.05/6). With regard to the third research question, to explore the relationship between writing fluency measures and writing quality and to determine which fluency measures predict writing quality, correlation and multiple regression analyses were performed. For the fourth research question about the L2 learners’ perceptions, the results of the questionnaire on genre and time constraints were analyzed with one-way analyses of variance (ANOVA) and a post-hoc Bonferroni test to look for differences in the learners’ perceptions of their writing tasks. The a priori alpha level was set at p < .0062 with Bonferroni adjustment (.05/8).

Along with exact p-values, effect sizes for inferential statistics (Cohen’s d) are reported. Cohen’s d is considered to be the most appropriate effect size estimate. The effect size indicates the magnitude of quantitative findings and of observed differences between two conditions in standard deviation units (Norris & Ortega, 2000; Plonsky & Oswald, 2014). According to Plonsky and Oswald (2014), small, medium, and large effect sizes of Cohen’s d correspond to values of .40, .70, and 1.00, respectively.

CHAPTER 4. RESULTS

4.1. Quantitative analysis

The descriptive statistics for writing fluency behaviors and linguistic outcomes by time constraints, proficiency, and genres are presented in Table 10. The learners in each group wrote the narrative essays and the argumentative essays on two different days. Although the 95% confidence intervals for the four groups overlap, there seem to be differences between the groups.
Within groups, the two genres differed in terms of syntactic complexity, fluency, and writing fluency behaviors (pausing and revision). The L2 learners tended to produce more complex language, such as higher syntactic complexity and lexical complexity and less fluent writing behaviors, such as shorter P-burst lengths in argumentative essays than narratives. The short- timed groups showed longer P-burst lengths than the long-timed groups. The advanced students tended to show higher syntactic complexity (i.e., MLS, MLC, CN/T and VP/T) and fluency (i.e., process: words per minutes and product: words per minute) than the high-intermediate students. 46 Table 10 Descriptive Statistics: Writing Fluency Behaviors and Linguistic Outcomes by Time Constraints, Proficiency, and Genres Meas ures High-intermediate High-intermediate short-timed (N = 30) long-timed (N = 30) Advanced short-timed (N = 33) Advanced long-timed (N = 30) Nar Arg Nar Arg Nar Arg Nar Arg M (SD) 95% CI M (SD) 95% CI MLS 17.96 (5.31) 15.98, 19.94 18.71 (6.01) 16.47, 20.96 MLC 8.83 (1.45) 8.29, 9.37 9.37 (1.31) 8.89, 9.86 DC/C CP/C CN/T VP/T WL WF D .39 (.10) .18 (.09) 1.69 (.57) 2.46 (.45) 1.47 (.08) 3.09 (.09) 87.77 (17.29) .35, .43 .14, .20 1.49, 1.89 2.29, 2.62 1.44, 1.50 3.05, 3.12 81.31, 94.22 .39 (.12) .18 (.10) 2.25 (.65) 2.50 (.69) 1.60 (.11) 3.03 (.10) 83.86 (18.93) .35, .43 .14, .21 2.01, 2.49 2.23, 2.75 1.56, 1.64 2.99, 3.06 76.78, 90.93 M (SD) 16.43 (3.77) 8.35 (1.02) .40 (.08) .20 (.07) 1.56 (.46) 2.31 (.41) 1.46 (.06) 3.09 (.07) 87.66 (15.07) 95% CI M (SD) 95% CI M (SD) 95% CI M (SD) 95% CI M (SD) 95% CI M (SD) 95% CI 15.02, 17.84 17.60 (3.90) 16.14, 19.05 20.83 (4.57) 19.20, 22.45 21.04 (4.59) 19.41, 22.66 19.77 (4.14) 18.22, 21.32 20.75 (3.40) 19.48, 22.02 7.97, 8.74 9.35 (1.36) 8.84, 9.86 9.33 (1.27) .37, .42 .18, .24 1.39, 1.73 2.16, 2.49 1.44, 1.48 3.07, 3.12 82.04, 93.29 .37 (.05) .23 (.11) 1.93 (.48) 2.35 (.37) 1.59 (.07) 3.05 (.07) 84.98 (16.40) .35, .39 .19, .27 1.75, 2.11 2.21, 2.49 
1.57, 1.62 3.02, 3.07 78.85, 91.10 .44 (.10) .24 (.12) 1.98 (.69) 2.76 (.68) 1.49 (.06) 3.07 (.07) 93.93 (17.45) 8.88, 9.78 .40, .47 .20, .28 1.74, 2.22 2.52, 3.00 1.47, 1.51 3.04, 3.09 87.75, 100.12 10.60 (1.62) 10.03, 11.18 9.07 (1.53) 8.50, 9.64 10.27 (1.24) 9.81, 10.73 .41 (.10) .27 (.15) 2.50 (.65) 2.70 (.68) 1.65 (.08) 3.01 (.09) 88.32 (19.09) .37, .44 .22, .32 2.27, 2.73 2.46, 2.94 1.62, 1.68 2.98, 3.04 81.55, 95.09 .42 (.08) .21 (.10) 1.89 (.52) 2.65 (.43) 1.47 (.09) 3.10 (0.07) 88.98 (13.05) .39, .45 .18, .25 1.69, 2.08 2.49, 2.81 1.44, 1.50 3.07, 3.12 84.11, 93.86 .43 (.09) .26 (.10) 2.52 (.63) 2.70 (.44) 1.62 (.07) 2.96 (.23) 88.22 (13.65) .40, .46 .22, .30 2.28, 2.75 2.54, 2.86 1.60, 1.66 2.88, 3.05 83.13, 93.32 47 Table 10 (cont’d) Measures High-intermediate High-intermediate short-timed (N = 30) long-timed (N = 30) Advanced short-timed (N = 33) Advanced long-timed (N = 30) Nar Arg Nar Arg Nar Arg Nar Arg M (SD) 17.62 (4.97) 95% CI 15.77, 19.48 M (SD) 15.58 (4.20) 95% CI 14.01, 17.15 M (SD) 11.83 (2.86) 95% CI 10.77, 12.90 M (SD) 10.83 (3.05) 95% CI 9.70, 11.97 M (SD) 20.73 (4.69) 95% CI 19.07, 22.40 M (SD) 18.12 (4.19) 95% CI 16.63, 19.60 M (SD) 17.15 (4.75) 95% CI 15.38, 18.92 M (SD) 15.64 (4.60) 95% CI 13.92, 17.36 12.88 (3.94) 11.41, 14.35 11.16 (3.16) 9.97, 12.33 8.00 (2.31) 7.13, 8.86 7.05 (1.97) 6.30, 7.78 14.57 (4.55) 12.96, 16.18 12.40 (3.10) 11.30, 13.49 11.56 (3.86) 10.12, 13.00 10.38 (3.76) 8.98, 11.79 3.26 (1.06) 2.86, 3.66 3.48 (.93) 3.13, 3.82 3.72 (.86) 3.40, 4.04 3.78 (.72) 3.51, 4.05 3.27 (.81) 2.98, 3.55 3.50 (.79) 3.22, 3.78 3.39 (.71) 3.13, 3.65 3.44 (.61) 3.21, 3.67 41.39 (27.48) 31.13, 51.65 33.05 (16.04) 27.06, 39.04 .29 (.15) .23, .34 .46 (.42) .30, .62 22.07 (9.65) .34 (.21) 18.4, 25.67 .26, .42 20.22 (7.95) .27 (.54) 17.25, 23.20 43.62 (17.82) 37.30, 49.94 37.51 (15.05) 32.18, 42.85 34.33 (17.34) 27.85, 40.80 31.92 (12.50) 27.25, 36.59 .07, .48 .37 (.35) .25, .50 .37 (.28) .27, .47 .31 (.24) .22, .39 .32 (.32) .20, .44 1.69 
(.72) 1.42, 1.96 1.73 (.83) 1.42, 2.04 1.68 (.60) 1.45, 1.90 1.13 (.71) .87, 1.40 1.52 (.59) 1.31, 1.73 2.24 (.95) 1.90, 2.58 1.57 (.54) 1.37, 1.77 1.20 (.77) .91, 1.48 Process: Words per minute Product: Words per minute Number of P- bursts P-burst length Pause within words Pause between words 48 Table 10 (cont’d) Measures High-intermediate High-intermediate short-timed (N = 30) long-timed (N = 30) Advanced short-timed (N = 33) Advanced long-timed (N = 30) Pause between sentences Pause between paragraph s Ratio of process and product Number of R- bursts R-burst length Nar Arg Nar Arg Nar Arg Nar Arg M (SD) .19 (.15) 95% CI .14, .25 M (SD) .17 (.13) 95% CI .12, .22 M (SD) .15 (.11) 95% CI .11, .19 M (SD) .10 (.08) 95% CI .07, .13 M (SD) .17 (.13) 95% CI .12, .22 M (SD) .19 (.10) 95% CI .15, .22 M (SD) .19 (.17) 95% CI .13, .26 M (SD) .15 (.13) 95% CI .10, .20 .05 (.05) .02, .06 .05 (.06) .02, .07 .03 (.03) .02, .05 .03 (.03) .02, .04 .05 (.05) .02, .06 .06 (.06) .04, .08 .03 (.03) .02, .04 .04 (.04) .02, .05 .68 (.10) .64, .71 .69 (.09) .65, .72 .63 (.13) .59, .68 .61 (.13) .56, .66 .66 (.13) .62, .71 .65 (.10) .62, .69 .65 (.09) .61, .68 .63 (.10) .59, .67 6.32 (7.15) 3.65, 8.99 5.50 (4.72) 3.74, 7.26 3.67 (2.73) 2.65, 4.69 3.30 (2.78) 2.26, 4.34 7.38 (5.43) 5.46, 9.31 5.60 (5.12) 3.79, 7.42 4.67 (3.93) 3.20, 6.14 2.85 (3.06) 1.71, 3.99 11.71 (5.82) 9.53, 13.88 11.59 (5.88) 9.39, 13.78 10.83 (5.99) 8.59, 13.07 10.27 (4.11) 8.73, 11.80 12.49 (5.02) 10.70, 14.27 10.90 (4.26) 9.40, 12.42 12.69 (4.45) 11.03, 14.35 12.78 (4.66) 11.04, 14.52 49 A repeated measures MANOVA was performed, using 20 dependent measures to analyze within genres (within-subject variable). The independent variables were proficiency and time constraints. The MANOVA indicated statistically significant genre differences of the combined dependent variables according to Wilks’ Lambda (.169; F(20, 100) = 24.508, p = < .001, d = .90). 
As shown in Table 11, follow-up univariate ANOVAs found statistically significant differences between the two genres in MLC (p < .001), CN/T (p < .001), WL (p < .001), WF (p < .001), D (p < .001), process: words per minute (p < .001), product: words per minute (p < .001), P-burst length (p < .001), and the number of R-bursts (p = .001). A comparison of effect sizes suggested that genre differences had the greatest, though still moderate, effect on complexity, fluency, and writing fluency behaviors. In addition, the interaction between genre and time (.738, F(20, 100) = 1.771, p = .034, d = .24) was found to be statistically significant, indicating that the effect of genre on the linguistic measures was not the same in the two time constraint conditions. This result suggests that the learners wrote differently in the two genres depending on the given time. In contrast, according to Wilks’ Lambda, there was no interaction between genre and proficiency (.804, F(20, 100) = 1.220, p = .254, d = .19), suggesting that the high-intermediate learners and advanced learners constructed their writing in similar ways regardless of genre. Univariate testing showed the interaction between genre and time to be significant in the number of pauses between words (F(1, 119) = 18.764, p < .001, d = .78). Figure 3 shows that the participants in the short-timed group made fewer pauses between words in narratives than in argumentative essays; however, the participants in the long-timed group made fewer pauses between words in argumentative essays than in narratives.

Figure 3. Means of pauses between words in the two genres.

A close examination of the results of the follow-up univariate ANOVAs and the descriptive statistics shows that the pattern of differences varied by measure. The argumentative genre led the participants to produce higher MLC, CN/T, WL, and WF than did the narrative genre.
As Figures 4, 5, 6, and 7 demonstrate, the argumentative genre elicited more complex language than the narrative genre across the groups. The narrative genre showed higher P-burst lengths, product: words per minutes and number of R-bursts than the argumentative genre. As Figures 8, 9, and 10 show, when the participants wrote narratives, they showed more fluent writing behaviors than when they wrote argumentative genres. In sum, although the MANOVA detected differences in the genres, the follow-up analyses showed that the patterns of genre differences varied for each measure. 51 Table 11 Repeated Measures MANOVA: Effects of Time Constraints and Proficiency on Writing Fluency Behaviors and Linguistic Outcomes within Genres Measures Genre Genre * Proficiency Genre * Time F P d F 7.219 .008 .48 .402 70.817 <.001* 1.51 3.820 2.364 5.456 .127 .021 .27 .42 .078 1.608 119.154 <.001* 1.96 1.345 .128 .722 .06 .222 352.196 <.001* 3.38 2.678 35.823 <.001* 1.07 2.848 4.227 .042 .37 43.234 <.001* 1.18 .001 .994 p .527 .053 .781 .207 .248 .638 .104 .094 .972 .321 d .11 .35 .05 .23 .20 .09 .30 .30 0 .18 F 1.046 .621 .008 .795 .155 .392 .000 1.926 .924 3.890 p .309 .432 .931 .374 .695 .533 .985 .168 .338 .051 33.893 <.001* 1.05 .416 .520 .11 2.951 .088 6.398 .013 .45 .006 .940 .01 2.564 .112 14.094 <.001* .422 .517 .67 .11 .113 .271 .737 .604 .06 .09 4.193 1.562 .043 .214 d .18 .14 .02 .16 .07 .13 0 .25 .17 .35 .31 .29 .37 .23 MLS MLC DC/C CP/C CN/T VP/T WL WF D Process: Words per minute Product: Words per minute Number of P- bursts P-burst length Pause within words Pause between words Pause between sentences Pause between paragraphs Ratio of process and product Number of R- bursts R-burst length Wilk’s Lambda .185 .668 .08 4.773 .031 .39 18.764 <.001* .78 2.092 .151 .26 .660 .418 .15 1.877 .173 .086 .770 .05 1.198 .276 .06 .602 .439 1.829 .179 .05 .123 .726 .06 1.932 .167 12.662 .001* .64 3.200 .076 1.61 .206 25.182 <.001* .22 .90 .228 1.220 .634 .254 .32 .09 .19 .093 .761 .517 
.474 1.968 .015* .24 .14 .25 .05 .13 .25
* p < .0025 (Bonferroni adjustment for dependent variables)

Figure 4. Genre differences in MLC
Figure 5. Genre differences in CN/T
Figure 6. Genre differences in WL
Figure 7. Genre differences in WF.
Figure 8. Genre differences in Product: Words per minute
Figure 9. Genre differences in P-burst length
Figure 10. Genre differences in the number of R-bursts.

In order to find the effects of time constraints and proficiency on linguistic features in both genres, tests of between-subjects effects were conducted next. The MANOVA indicated statistically significant proficiency differences on the combined dependent variables according to Wilks’ Lambda (.626; F(20, 100) = 2.993, p < .001, d = .31). As shown in Table 12, follow-up univariate ANOVAs indicated statistically significant advantages for the advanced groups in four syntactic complexity measures and two fluency measures: MLS (p < .001), MLC (p < .001), CN/T (p < .001), VP/T (p = .001), process: words per minute (p < .001), and product: words per minute (p < .001), but not in writing fluency behaviors (see Figures 16, 17, 18, 19, 20, and 21). Comparisons of effect sizes suggest that proficiency had a medium effect on syntactic complexity and fluency. In addition, the MANOVA indicated statistically significant time constraint differences on the combined dependent variables according to Wilks’ Lambda (.576; F(20, 100) = 3.674, p < .001, d = .35). The follow-up univariate ANOVAs indicated statistically significant time constraint effects on process: words per minute (p < .001), product: words per minute (p < .001), P-burst length (p < .001), the number of pauses between words (p < .001), and the number of R-bursts (p < .001). The participants in the short-timed groups showed higher fluency than those in the long-timed groups (see Figures 11 and 12).
For the writing fluency behaviors, the participants in the short-timed groups paused more between words and revised more than those in the long-timed groups (see Figures 13, 14, and 15). However, the interaction between proficiency and time constraints was not statistically significant according to Wilks’ Lambda (.584; F(20, 100) = 1.209, p = < .001, d = .34). 57 Table 12 MANOVA: Effects of Time Constraints and Proficiency on Linguistic Features Measures Proficiency Time Proficiency * Time MLS MLC DC/C CP/C CN/T VP/T WL WF D Process: Words per minute Product: Words per minute Number of P- bursts P-burst length Pause within words Pause between words Pause between sentences Pause between paragraphs Ratio of process and product Number of R- bursts R-burst length Wilk’s Lambda F p 14.514 <.001* 15.123 <.001* 7.367 .008 8.619 .004 14.729 <.001* 12.385 .001* 5.362 .022 3.657 .058 2.241 .137 D .69 .70 .49 .53 .69 .64 .42 .35 .27 F p 1.691 .196 1.598 .209 .035 .608 .851 .437 1.934 .167 1.414 .237 1.715 .193 .088 .158 .768 .692 d .23 .23 .03 .14 .25 .21 .24 .05 .07 F .182 .013 .093 p .670 .910 .761 3.309 .071 .949 .286 .275 .480 .356 .332 .593 .601 .490 .552 30.450 <.001* 1.00 33.724 <.001* 1.05 2.459 .120 18.936 <.001* .79 38.475 <.001* 1.12 3.097 .081 1.357 .246 .21 2.243 .137 .27 1.583 .211 8.062 .005 .51 18.971 <.001* .000 .997 0 2.050 .155 .79 .25 2.556 .113 .017 .897 .730 .395 .15 20.390 <.001* .81 1.157 .284 1.961 .164 .25 3.349 .070 .33 1.864 .175 .827 .365 .16 5.966 .016 .44 .380 .539 .024 .877 .03 5.173 .025 .41 1.501 .223 .311 .578 .10 11.433 .001* .61 .041 .839 1.912 .169 2.589 .001* .14 .29 .001 .972 0 1.751 .188 3.679 <.001* .35 1.216 .257 * p < .0025 (Bonferroni adjustment for dependent variables) d .07 .02 .05 .33 .18 .10 .09 .12 .11 .28 .32 .23 .29 .02 .19 .24 .11 .22 .04 .24 .20 58 Figure 11. Effects of time constraints on process: words per minute. Figure 12. Effects of time constraints on product: words per minute 59 Figure 13. 
Effects of time constraints on P-burst length
Figure 14. Effects of time constraints on pause between words
Figure 15. Effects of time constraints on the number of R-bursts.
Figure 16. Effects of proficiency on MLS
Figure 17. Effects of proficiency on MLC
Figure 18. Effects of proficiency on CN/T
Figure 19. Effects of proficiency on VP/T
Figure 20. Effects of proficiency on product: words per minute
Figure 21. Effects of proficiency on process: words per minute

Table 13 presents the descriptive statistics for writing quality by time constraints, proficiency, and genres. At a glance, the advanced students received higher scores on all subscales than the high-intermediate students. The 95% confidence intervals for the total scores from the high-intermediate and advanced groups do not overlap with each other, and thus there seem to be differences between the groups (see Figure 22). For the comparison between the two genres and time constraints, the means and 95% confidence intervals did overlap across the groups.

Table 14 presents the results of the two-way ANOVA that was conducted to find effects of time constraints and proficiency on the writing quality of narratives. A main effect of proficiency was shown based on the total scores and subscale scores, with medium to large effect sizes, whereas the effect of time and the interaction of proficiency and time were not statistically significant. For the proficiency effect on narratives, advanced learners gained higher scores overall (F(1, 119) = 35.610, p < .001, d = 1.08), and on content (F(1, 119) = 27.712, p < .001, d = .95), organization (F(1, 119) = 35.073, p < .001, d = 1.07), vocabulary (F(1, 119) = 28.836, p < .001, d = .97), language use (F(1, 119) = 37.302, p < .001, d = 1.10), and mechanics (F(1, 119) = 15.155, p < .001, d = .70). The total scores and all the subscale scores except for mechanics were found to have large effect sizes; mechanics had a medium effect size.
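The relationship between the F statistics and the Cohen’s d values reported for the narrative essays can be made concrete with a short sketch. The source does not state how d was computed. The code below assumes one standard conversion for a two-group main effect with one numerator degree of freedom, d = sqrt(F) * sqrt(1/n1 + 1/n2), with group sizes taken from the reported design (60 high-intermediate and 63 advanced participants). The function and variable names are hypothetical.

```python
from math import sqrt


def d_from_f(f_value, n1, n2):
    """Convert a two-group F (1 numerator df) to Cohen's d.

    Assumed conversion, not confirmed by the source:
    d = sqrt(F) * sqrt(1/n1 + 1/n2)
    """
    return sqrt(f_value) * sqrt(1.0 / n1 + 1.0 / n2)


# Group sizes reported in the study design.
N_HIGH_INTERMEDIATE = 60
N_ADVANCED = 63

# F values for the proficiency effect on narrative essays (Table 14).
narrative_f = {
    "total": 35.610,
    "content": 27.712,
    "organization": 35.073,
    "vocabulary": 28.836,
    "language use": 37.302,
    "mechanics": 15.155,
}

narrative_d = {
    scale: round(d_from_f(f, N_HIGH_INTERMEDIATE, N_ADVANCED), 2)
    for scale, f in narrative_f.items()
}

for scale, d in narrative_d.items():
    print(f"{scale}: d = {d:.2f}")
```

Under this assumption the computed values agree with the reported d values to two decimal places (for example, 1.08 for the total score and .70 for mechanics). That agreement is consistent with, but does not confirm, this being the conversion used in the study.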
65 Table 13 Descriptive Statistics: Writing Quality by Time Constraints, Proficiency, and Genres Measures High-intermediate High-intermediate short-timed (N = 30) long-timed (N = 30) Advanced short-timed (N = 33) Advanced long-timed (N = 30) Nar Arg Nar Arg Nar Arg Nar Arg M 95% M 95% M 95% M 95% M 95% M 95% M 95% M 95% (SD) CI (SD) CI (SD) CI (SD) CI (SD) CI (SD) CI (SD) CI (SD) CI Total 69.70 67.28, 70.13 68.14, 69.63 67.48, 70.69 68.83, 75.30 72.92, 75.31 73.03, 77.46 75.26, 78.09 75.10, (6.49) 68.62 (5.32) 72.11 (5.74) 71.77 (4.98) 72.55 (6.69) 77.67 (6.43) 77.59 (5.90) 79.67 (8.00) 81.08 Content 15.98 15.38, 15.80 15.28, 15.66 15.15, 15.68 15.26, 16.97 16.42, 16.84 16.27, 17.52 16.99, 17.41 16.69, (1.61) 16.59 (1.39) 16.32 (1.37) 16.18 (1.13) 16.10 (1.55) 17.52 (1.60) 17.41 (1.41) 18.05 (1.92) 18.12 Organization 15.37 14.78, 15.65 15.16, 15.53 15.02, 15.85 15.38, 16.67 16.10, 16.74 16.17, 17.43 16.88, 17.44 16.77, (1.52) 15.93 (1.30) 16.14 (1.36) 16.04 (1.24) 16.31 (1.59) 17.23 (1.61) 17.31 (1.49) 17.99 (1.79) 18.11 Vocabulary 15.35 14.84, 15.57 15.18, 15.57 15.11, 15.62 15.24, 16.50 15.98, 16.55 16.03, 17.05 16.54, 17.28 16.62, (1.37) 15.86 (1.03) 15.95 (1.22) 16.02 (1.00) 15.99 (1.46) 17.02 (1.45) 17.06 (1.36) 17.56 (1.78) 17.95 Language 14.92 14.39, 14.98 14.51, 14.83 14.28, 15.28 14.80, 16.44 15.86, 16.45 15.94, 16.67 16.10, 16.97 16.28, use (1.40) 15.44 (1.26) 15.45 (1.49) 15.39 (1.30) 15.77 (1.63) 17.02 (1.44) 16.96 (1.54) 17.24 (1.86) 17.66 Mechanics 8.09 7.60, 8.13 7.75, 8.03 7.68, 8.26 7.94, 8.72 8.40, 8.72 8.45, 8.80 8.53, 8.99 8.64, (1.32) 8.58 (1.01) 8.51 (.95) 8.39 (.86) 8.58 (.92) 9.05 (.79) 9.00 (.72) 9.07 (.94) 9.34 66 Figure 22. Total writing quality scores in the two time constraints and proficiency levels across the groups. 
Table 14
Two-Way ANOVA: Effects of Time Constraints and Proficiency on Writing Quality in Narrative Essays

Measures        Proficiency                Time                     Proficiency * Time
                F        p        d        F       p       d        F       p       d
Total           35.610   <.001*   1.08     .862    .355    .17      .990    .332    .17
Content         27.712   <.001*   .95      .183    .670    .07      2.569   .112    .29
Organization    35.073   <.001*   1.07     2.984   .087    .31      1.233   .269    .20
Vocabulary      28.836   <.001*   .97      2.444   .121    .28      .462    .498    .12
Language use    37.302   <.001*   1.10     .069    .794    .05      .320    .573    .10
Mechanics       15.155   <.001*   .70      .002    .968    0        .132    .717    .07
* p < .0083 (Bonferroni adjustment for dependent variables)

Table 15
Two-Way ANOVA: Effects of Time Constraints and Proficiency on Writing Quality in Argumentative Essays

Measures        Proficiency                Time                     Proficiency * Time
                F        p        d        F       p       d        F       p       d
Total           30.591   <.001*   .99      2.157   .145    .26      .955    .330    .18
Content         24.881   <.001*   .90      .635    .427    .14      1.480   .226    .22
Organization    24.330   <.001*   .89      2.731   .101    .30      .842    .361    .17
Vocabulary      29.153   <.001*   .97      2.586   .110    .29      1.971   .163    .25
Language use    34.667   <.001*   1.06     2.352   .128    .28      .169    .682    .07
Mechanics       16.542   <.001*   .73      1.424   .235    .22      .182    .670    .08
* p < .0083 (Bonferroni adjustment for dependent variables)

Table 15 presents the results of a two-way ANOVA on the effects of time constraints and proficiency on the writing quality of the argumentative essays. The results are similar to those for the narratives. A main effect of proficiency was found for the total scores and the subscale scores, with medium to large effect sizes, whereas the effect of time and the interaction of proficiency and time were not statistically significant.
With regard to the proficiency effect on the argumentative essays, the advanced learners gained higher total scores (F(1, 119) = 30.591, p < .001, d = .99), as well as higher scores on content (F(1, 119) = 24.881, p < .001, d = .90), organization (F(1, 119) = 24.330, p < .001, d = .89), vocabulary (F(1, 119) = 29.153, p < .001, d = .97), language use (F(1, 119) = 34.667, p < .001, d = 1.06), and mechanics (F(1, 119) = 16.542, p < .001, d = .73) than the high-intermediate learners. The total scores and the subscale scores, except for mechanics, were found to have large effect sizes, while mechanics had a medium effect size.

Table 16 presents the correlations between fluency measures and writing quality and between fluency measures and linguistic complexity in the narratives. For writing quality and fluency measures, the results of a Pearson correlation indicated significant positive associations between process: words per minute and total quality (r(123) = .413, p < .001), product: words per minute and total quality (r(123) = .441, p < .001), the ratio of process and product and total quality (r(123) = .270, p = .003), P-burst length and total quality (r(123) = .217, p = .016), and R-burst length and total quality (r(123) = .290, p = .001). The two fluency measures (process and product: words per minute) showed moderate correlations with writing quality, and the writing fluency behavior measures (the ratio of process and product, P-burst length, and R-burst length) demonstrated weak correlations. Based on Plonsky and Oswald’s (2014) benchmarks for the effect size of correlation coefficients (.25: small; .40: medium; .60: large), the association between total scores for writing quality in the narratives and writing fluency behaviors had a medium or small effect.
With regard to the correlation between fluency measures and linguistic complexity, there were significant associations between fluency measures (process: words per minute and product: words per minute) and syntactic complexity. In addition, writing fluency behaviors such as pausing and revision tended to be associated with lexical complexity more than with syntactic complexity. Overall, the correlation coefficients show that the association between writing fluency measures and linguistic complexity measures had a small effect size.

Table 16
Correlations: Fluency Measures with Total Writing Quality and Linguistic Complexity Measures in Narrative Essays (N = 123)

Measures                        Total quality  MLS      MLC      DC/C    CP/C     CN/T     VP/T     WL      WF       D
Process: Words per minute       .413***        .279**   .252**   .098    .191*    .190*    .249**   .158    -.120    .153
Product: Words per minute       .441***        .228*    .227*    .095    .139     .152     .205*    .089    -.107    .115
Ratio of process and product    .270**         .042     .038     .111    -.055    .051     .049     -.058   -.057    -.083
Number of P-bursts              -.063          .001     -.070    .085    -.127    .023     -.029    -.048   .060     -.159
P-burst length                  .217*          -.118    .112     .040    .187*    -.062    -.048    .039    .154     .019
Pause within words              .068           .058     .110     -.032   .209*    -.205*   .159     .111    .031     -.011
Pause between words             .002           .006     -.032    .084    -.123    .020     -.008    -.086   -.001    -.218*
Pause between sentences         .019           -.188*   -.171    -.072   -.212*   -.113    -.132    -.131   -.054    -.277*
Pause between paragraphs        .046           .043     .105     .077    -.066    .178*    .135     .084    -.171    .100
Number of R-bursts              -.045          .074     .003     .036    .016     .040     .025     .141    -.180*   .169
R-burst length                  .290**         -.036    .106     -.043   .141     -.042    .036     .058    -.118    .046
*** p < .001, ** p < .01, * p < .05

To further explore the predictive relationship between writing quality and fluency measures in narratives, a multiple regression was performed to find which fluency measures most strongly predicted the narratives’ overall writing quality.
A stepwise multiple regression (probability of F to enter = .05), beginning with all eleven fluency measures, identified two statistically significant models for predicting the overall writing quality of the narratives. As shown in Table 17, product: words per minute alone predicted a relatively large proportion of the variance in writing quality (R² = .195, F(1, 121) = 29.251, p < .001). The addition of R-burst length increased the predictive power slightly (R² = .228, F(1, 120) = 5.147, p = .025). Unstandardized coefficients (Table 18) indicated that, in the second model, a one-unit increase in product: words per minute was associated with an increase of .62 score points in the writing quality of the narrative essays, and a one-unit increase in R-burst length with an increase of .25 score points. The remaining nine variables did not contribute additional unique statistically significant variance once the two main predictor variables were in the model.

Table 17
Model Summary: Total Quality as Criterion Variable in Narrative Essays

                                      Std. error of    Change statistics
Model   R       R²     Adjusted R²    the estimate     R² change   F change   df1   df2   Sig. F change
1       .441a   .195   .188           6.351            .195        29.251     1     121   .000
2       .477b   .228   .215           6.245            .033        5.147      1     120   .025
a Predictors: (constant), product: words per minute
b Predictors: (constant), product: words per minute, R-burst length

Table 18
Coefficientsa: Total Quality as Criterion Variable in Narrative Essays

                                    Unstandardized       Standardized                      95% CI for B        Correlations                    Collinearity
Model                               B        Std. error  Beta          t        Sig.      Lower     Upper     Zero-order   Partial   Part    Tolerance   VIF
1   (Constant)                      64.834   1.629                     39.794   <.001     61.609    68.060
    Product: Words per minute       .698     .129        .441          5.408    <.001     .442      .953      .441         .441      .441    1.000       1.000
2   (Constant)                      62.772   1.842                     34.080   <.001     59.125    66.419
    Product: Words per minute       .621     .131        .392          4.726    <.001     .361      .881      .441         .396      .379    .933        1.072
    R-burst length                  .249     .110        .188          2.269    .025      .032      .466      .290         .203      .182    .993        1.072
a Dependent variable: total quality

Table 19 presents the correlations between fluency measures and writing quality and between fluency measures and linguistic complexity in argumentative essays. Unlike the correlations between fluency measures and total quality in narratives, the results of a Pearson correlation indicated only three significant positive associations: process: words per minute and total quality (r(123) = .457, p < .001), product: words per minute and total quality (r(123) = .410, p < .001), and P-burst length and total quality (r(123) = .361, p < .001). The fluency measures (process: words per minute and product: words per minute) and the writing fluency behavior measure (P-burst length) showed moderate correlations with the writing quality of the argumentative essays. With regard to the effect sizes, the association between total scores on writing quality in the argumentative essays and writing fluency had a medium effect. For the correlations between writing fluency measures and linguistic complexity, there were several significant associations: process: words per minute and MLS, product: words per minute and MLS, number of P-bursts and D, pauses within words and CP/C, and pauses between paragraphs and CN/T. The correlation coefficients showed that these associations between writing fluency measures and linguistic complexity measures had a small effect size.
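The significance markers in the correlation tables depend only on the size of r and the sample size, which a brief sketch can make explicit. The code below is an illustration rather than part of the reported analysis: it uses the standard relation t = r * sqrt(df / (1 - r^2)) with df = N - 2, solved for r, and it assumes a two-tailed critical t of about 1.98 for df = 121 (taken from a standard t table, not from the source). The names are hypothetical.

```python
from math import sqrt


def critical_r(t_crit, n):
    """Smallest |r| that reaches significance for a sample of size n.

    Derived from t = r * sqrt(df / (1 - r**2)) with df = n - 2,
    which rearranges to r = t / sqrt(t**2 + df).
    """
    df = n - 2
    return t_crit / sqrt(t_crit ** 2 + df)


N_PARTICIPANTS = 123   # sample size reported for Tables 16 and 19
T_CRIT_05 = 1.98       # assumed two-tailed .05 critical t for df = 121

r_crit = critical_r(T_CRIT_05, N_PARTICIPANTS)
print(f"critical |r| at alpha = .05: {r_crit:.3f}")


def is_flagged(r, threshold=r_crit):
    """True if |r| would carry at least one asterisk under this threshold."""
    return abs(r) >= threshold
```

With N = 123 the threshold falls just above .17, which is in line with the tables: a coefficient of .178 carries an asterisk in Table 16, whereas a coefficient of -.171 does not.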
73 Table 19 Correlations: Fluency Measures with Total Writing Quality and Linguistic Complexity Measures in Argumentative Essays (N = 123) Measures MLS MLC DC/C CP/C CN/T VP/T WL WF D Total quality .457*** .047 -.143 .029 .057 -.100 .048 -.111 .159 .027 .147 .172 -.010 .063 .033 .105 -.058 -.039 .062 -.166 -.099 -.090 .106 .069 .062 .176 -.045 -.015 -.210* .061 .020 .132 .050 .113 .019 .171 .015 -.198* -.055 -.044 -.123 .146 -.138 -.008 -.055 .388 .474 .095 .830 .614 .089 -.165 .010 .011 .045 .004 .056 .164 -.123 .197* .121 -.051 .067 .009 -.084 .010 -.008 -.011 .033 .060 -.003 .210* .105 .068 .185* -.043 -.065 -.064 .125 .141 .010 .042 -.011 .049 -.107 -.115 .045 .308** -.022 .410*** .283** .194* .361*** Process: Words per minute Product: Words per minute Ratio of process and product Number of P-bursts P-burst length Pause within words Pause between words Pause between sentences Pause between paragraphs Number of R-bursts R-burst length *** p < 0.001, ** p < 0.01, * p < .05 -.026 -.115 -.030 -.058 -.055 .069 -.095 .124 .154 -.062 .068 .038 .088 -.064 -.014 .111 .169 In order to investigate the predictive relationship between writing quality and fluency measures for argumentative essays, a multiple regression was performed to find which fluency measures most strongly predicted the argumentative essays’ overall writing quality. A stepwise multiple regression (probability of F to enter = .05), beginning with all eleven fluency measures, 74 identified two statistically significant models for predicting the overall writing quality of the argumentative essays. As shown in Table 20, process: words per minute alone predicted a relatively large proportion of the variance in writing quality of the argumentative essays (R2 = .209, F(1, 121) = 31.991, p = < .001). The addition of the number of R-bursts (R2 = .260, F(1, 120) = 8.240, p = .005) increased the predictive power slightly. 
Unstandardized coefficients (Table 21) indicated that, in the second model, a one-unit increase in process: words per minute was associated with an increase of .790 score points in the writing quality of the argumentative essays, and each additional R-burst with a decrease of .400 score points. The remaining nine variables did not contribute additional unique statistically significant variance once the two main predictor variables were in the model.

Table 20
Model Summary: Total Quality as Criterion Variable in Argumentative Essays

                                      Std. error of    Change statistics
Model   R       R²     Adjusted R²    the estimate     R² change   F change   df1   df2   Sig. F change
1       .457a   .209   .203           6.282            .209        31.991     1     121   .000
2       .510b   .260   .248           6.102            .051        8.240      1     120   .005
a Predictors: (constant), process: words per minute
b Predictors: (constant), process: words per minute, the number of R-bursts

Table 21
Coefficientsa: Total Quality as Criterion Variable in Argumentative Essays

                                    Unstandardized       Standardized                      95% CI for B        Correlations                    Collinearity
Model                               B        Std. error  Beta          t        Sig.      Lower     Upper     Zero-order   Partial   Part    Tolerance   VIF
1   (Constant)                      63.485   1.876                     33.844   <.001     59.772    67.199
    Process: Words per minute       .669     .118        .457          5.656    <.001     .435      .903      .457         .457      .457    1.000       1.000
2   (Constant)                      63.400   1.822                     34.790   <.001     59.791    67.008
    Process: Words per minute       .790     .122        .540          6.455    <.001     .547      1.032     .457         .508      .507    .882        1.134
    The number of R-bursts          -.400    .139        -.240         -2.871   .005      -.676     -.124     -.005        -.253     -.225   .882        1.134
a Dependent variable: total quality

4.2. Qualitative analysis

The stimulated recall data (N = 16) were used to triangulate the quantitative results regarding the second research question. Based on Kellogg’s (1996) model of writing, the participants’ comments on pausing and revision were categorized as pertaining to planning, translation, or monitoring processes.
Within the planning processes (planning and organization), the majority of recall comments were about content, and within the translation processes (lexical retrieval, syntactic encoding, and cohesion), more than half of the comments were about lexical retrieval.

Table 22 presents examples from Participant #7’s argumentative essay, classified by types of writing process, along with the participant’s stimulated recall comments regarding pausing while writing these examples. The first example is from the first paragraph; the participant said he had decided to argue against the prompt, and he paused between words to plan what content and supporting ideas he would use to disagree with the prompt’s statement regarding the necessity of foreign language abilities. The next example shows a translation process; here, he paused to search for synonyms for “foreign language,” because he did not want to use the same words repeatedly; however, he did not find an appropriate synonym. The third example illustrates pausing for the purpose of monitoring; as the participant’s stimulated recall shows, he paused because he noted an error; he thought the term “world trade sector” was not appropriate. He later changed “world trade sector” to “trade sector.”

Table 22
Pausing: Writing Processes, Text Examples, and Stimulated Recall Comments (Participant #7)

Writing process: Planning
Text: I could bet that a lot of students would agree that foreign language abilities are necessary in this globalized area. However, is it that much? Many academies in…(pause)
Stimulated recall comments: I decided a position to write. I came up with contents and supporting ideas. I wanted to oppose the prompt.

Writing process: Translation
Text: However, is it that much? This essay would talk about it is not that necessary… (pause) Many English academies in Korea wants to
Stimulated recall comments: I tried to think of vocabulary that substitutes “foreign language”, but it was hard to find one. I wanted to write a different word that means foreign language, but it was hard to find one. I would write the same vocabulary, “to use foreign language”.

Writing process: Monitoring
Text: World trade sector in world economy structure becomes larger…(pause)
Stimulated recall comments: As I read, I found an error and wanted to go back and fix it.

Table 23 presents examples of the writing processes related to revision, and the stimulated recall comments regarding these specific revisions in the argumentative essay of Participant #4. In the first example, in a process of planning while writing, the participant wrote a sentence beginning with the connective “for instance”; she then decided she should emphasize a general point before writing about specific advantages. She therefore deleted the connective and inserted a new sentence between the two sentences she had just written. Next, the participant engaged in a translation process to retrieve lexical items as she wrote. She decided to revise “person” (a singular noun) to “a group of…people” (a collective noun), because, she explained, “group” was more appropriate in the context.

Because the number of recall comments differed across the participant groups (e.g., the 30-minute and 60-minute groups), the data were converted to percentages. Figures 23 and 24 show the percentage of stimulated recall comments that tap into the writers’ cognitive processes underlying pausing and revision behaviors in the writing of narratives and argumentative essays (the actual numbers of comments are included in Appendix I). Figure 23 summarizes the distribution of the comments about pausing that the four groups of participants made. These stimulated recall data demonstrate that there are differences in the processes underlying pausing behaviors when participants at different proficiencies write in different genres under different time constraints.
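The conversion of recall-comment counts to percentages described above can be illustrated with a short Python sketch. The counts used here are hypothetical placeholders, not the study’s data (the actual counts are reported in the appendix), and the function name is an assumption introduced for illustration.

```python
# Minimal sketch: convert per-group comment counts to percentages so that
# groups with different total numbers of comments can be compared.
# The counts below are HYPOTHETICAL and do not come from the study.


def to_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Return each category's share of the group's total, in percent."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("a group must contain at least one comment")
    return {category: 100.0 * n / total for category, n in counts.items()}


# Hypothetical comment counts for two groups of unequal size.
short_timed = {"planning": 21, "translation": 14, "monitoring": 5}
long_timed = {"planning": 30, "translation": 36, "monitoring": 24}

if __name__ == "__main__":
    for label, counts in (("short-timed", short_timed), ("long-timed", long_timed)):
        shares = to_percentages(counts)
        summary = ", ".join(f"{cat}: {pct:.1f}%" for cat, pct in shares.items())
        print(f"{label} -> {summary}")
```

In this made-up example the long-timed group produces more planning comments in absolute terms, yet planning accounts for a smaller share of its comments, which is the kind of comparison that raw counts would obscure.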
The proficiency comparisons show that the advanced learners’ comments about pausing are more associated with translation than the high-intermediate learners’, and this is the case for both narrative and argumentative essays (narrative: 37% for advanced short-timed and 49% for advanced long-timed; argumentative: 35% for advanced short-timed and 40% for advanced long-timed). In other words, the advanced learners recalled pausing for lexical retrieval, syntactic encoding, and cohesion much more often than did the high-intermediate learners.

Table 23
Revision: Writing Processes, Text Examples, and Stimulated Recall Comments (Participant #4)

Writing process: Planning
Text: Especially in the current globalized era, being able to speak another language can bring much more benefits such as speaking different people around the world, visiting to other countries, and learning more about another country’s culture. [deleted: for instance] [inserted: In various ways, being able to speak a foreign language fluently can lead to a lot of benefits that another abilities can fulfill.] The more language one can speak and understand, the more people that person can communicate with and learn about another language.
Stimulated recall comments: I wanted to put an emphasis on the need to learn a foreign language before presenting advantages of learning a foreign language. I added one sentence here.

Writing process: Translation
Text: For example, I realized the necessity of a foreign language when I met [deleted: a Chinese person] a group of Chinese people in the streets.
Stimulated recall comments: I deleted “person” and changed it to group because group was more appropriate in this context.

With regard to time constraints, when writing an argumentative essay, the short-timed groups’ comments were more associated with planning (53% for the high-intermediate short-timed group and 52% for the advanced short-timed group) than the long-timed groups’.
On the other hand, the long-timed group students made more comments associated with monitoring, compared to the short-timed group students. With regard to genre differences, the distribution of translation-related pausing is similar across the groups; however, the distribution of planning-related pausing is different across the groups. During pauses, the short-timed groups showed more planning in argumentative than in narrative writing, while the long-timed groups showed more planning in narrative than in argumentative writing. In addition, unlike the short-timed groups, the long-timed groups showed more monitoring-related pausing in argumentative essays than in narrative essays. Figure 23. Comments about pausing from stimulated-recall sessions. 81 Figure 24 shows the distribution of comments about revision from the stimulated recall sessions. These comments suggest similarities and differences in the processes underlying revision behaviors when participants at different proficiency levels were writing in different genres under different time constraints. Figure 24. Comments about revision from stimulated-recall sessions. Overall, in contrast to the comments about pausing, a higher percentage of comments regarding revision referred to translation than to planning across all groups. Compared to the number of revision comments on planning processes, the participants made more comments about translation processes (lexical retrieval, syntactic encoding, and cohesion). In particular, it is worth focusing on proficiency differences in planning and translation: The advanced students tended to make more comments on translation than the high-intermediate students in both genres. Regarding time constraint differences, the long-timed groups and the short-timed groups did not show much difference in translation processes. For planning, there are some differences 82 depending on proficiency. 
The intermediate long-timed and short-timed groups showed similar amounts of translation-related revision comments; however, in writing narratives, the long-timed learner groups showed more planning than the short-timed learner groups, but the long-timed groups showed less planning in argumentative writing. On the other hand, the advanced long- timed group made more comments on planning than the advanced short-timed group during the writing of argumentative essays. However, in contrast to the advanced short-timed group students, the advanced long-timed group students made fewer comments about planning and more comments about translation in relation to revision when writing their narratives. With regard to genre differences in recall comments about revision, the difference is not large in numbers of comments related to the translation process; however, there is some difference in planning processes. The learners in the high-intermediate long-timed group commented more on planning in narratives (37%) than in argumentative essays (22%). However, those in the advanced long-timed group made more comments related to planning for the argumentative essays (28%) than for the narratives (13%). The participants’ comments about pausing and revision showed patterns according to the locations of pauses and revision behaviors (Figures 25, 26, 27, and 28). Kellogg’s (1996) model of writing describes cognitive processes during writing as lower or higher processes. In the stimulated recall data, higher level textual units such as sentences are associated with the comments regarding higher levels of writing processes such as planning rather than translation, regardless of time constraints, genres, and proficiencies. Regarding their pausing, most of the participants’ comments about translation and about planning occurred in different textual locations. 
In both genres, most of the participants’ comments related to translation processes (lexical retrieval, syntactic encoding, and cohesion) referred to pauses they made between words. Few translation-related comments referred to pauses between sentences. Many of the comments associated with planning were made to explain pauses between words and between clauses, although some referred to pauses between sentences.

[Figure 25: bar chart, 0% to 40%. Pause locations (within words, between words, between clauses, between sentences) grouped by writing process (planning, translation, monitoring). Series: intermediate-short, intermediate-long, advanced-short, advanced-long.]
Figure 25. Comments about pausing in narratives.

[Figure 26: bar chart, 0% to 40%. Pause locations (within words, between words, between clauses, between sentences) grouped by writing process (planning, translation, monitoring). Series: intermediate-short, intermediate-long, advanced-short, advanced-long.]
Figure 26. Comments about pausing in argumentative essays.

With regard to how comments about revision aligned with textual locations, the participants showed different patterns in the narratives and the argumentative essays, though all groups spent more time on translation than on planning processes. In narratives, the high-intermediate long-timed group participants made more comments about planning at the word and sentence levels than those in the other three groups.
The long-timed groups’ comments showed more translation below the word level than the comments of the short-timed groups. Comparing the proficiency levels, the advanced groups made more comments on translation at the word level and below the sentence level than did the high-intermediate groups. Advanced long-timed group participants commented more about revision at the clause and sentence levels compared to the other three groups. The advanced long-timed group made more comments about planning below the clause level than the other groups. In the argumentative essays, the two long-timed groups made more comments about translation at the word and below the clause level than the two short-timed groups.

[Figure 27: bar chart, 0% to 50%. Revision locations (below the word, at the word, below the clause, at the clause, at the sentence) grouped by writing process (planning, translation). Series: intermediate-short, intermediate-long, advanced-short, advanced-long.]
Figure 27. Comments about revision in narratives.

[Figure 28: bar chart, 0% to 40%. Revision locations (below the word, at the word, below the clause, at the clause, at the sentence) grouped by writing process (planning, translation). Series: intermediate-short, intermediate-long, advanced-short, advanced-long.]
Figure 28. Comments about revision in argumentative essays.

Because the MANOVA showed a significant interaction between time constraints and genre, the qualitative data were used to learn more about how the participants used the time during pauses between words (see Figure 29). Because the number of recalls was different in the two time constraint groups, data were converted to percentages. At a glance, all learners showed similar patterns for planning, translation, and monitoring across the two genres.
All groups spent more time on translation for both genres and spent less time on planning and monitoring for both genres. Although each group included only two students, the advanced learners showed more similar patterns during pauses between words than the high-intermediate learners when they were writing in the two different genres.

Figure 29. Writing processes during pauses between words.

4.3. Exit questionnaire results: L2 writers’ perceptions of the time constraints and genres

The exit questionnaire collected data on the participants’ perceptions of the genres and time constraints. Tables 24 and 25 reflect the results from the two open-ended questions, and Table 26 shows the descriptive statistics of the responses to the eight Likert-scale items. To analyze the Likert-scale data, one-way analyses of variance (ANOVA) and Bonferroni post-hoc tests were used to look for differences in the learners’ perceptions.

Table 24
Questionnaire Responses by Group: “How did you feel about writing narrative and argumentative essays? Is one type of essay writing more difficult than the other?”

Group                                   Narrative is more difficult  Argumentative is more difficult  Both are similarly difficult
High-intermediate short-timed (N = 30)  20%                          77%                              3%
High-intermediate long-timed (N = 30)   27%                          60%                              13%
Advanced short-timed (N = 33)           40%                          57%                              3%
Advanced long-timed (N = 30)            40%                          53%                              7%

With regard to the perceived difficulty of writing in the two genres, more than half of the participants considered the argumentative genre more difficult than the narrative genre. Although this perception seems to have varied depending on the participants’ English proficiency, many high-intermediate and advanced students tended to feel that argumentative essays were more difficult to write, as illustrated in Excerpts 1 and 2.

Excerpt 1.
High-intermediate student in short-timed group, #119
“In writing the narrative, it was possible to write naturally as my brainstorming process connected to my writing process smoothly. But it was difficult to write and revise the argumentative essay when I brainstormed ideas and thought about the logical flow of writing.”

Excerpt 2. Advanced student in short-timed group, #16
“It was difficult for me to write the argumentative essay. In writing the argumentative essay, I needed to think about language expressions appropriate for academic writing. However, in writing a narrative, I was able to use colloquial expressions as I talk to my friends. Writing the narrative was easier than writing the argumentative essay.”

As shown in Table 25, more than half of the participants in the long-timed group considered the time allotment enough for both genres. However, the participants in the short-timed group felt that the time allowed was only enough for the narrative essay. In other words, they wanted more time for writing the argumentative essay. In the next two excerpts, an intermediate learner (Excerpt 3) and an advanced learner (Excerpt 4) explain why they wanted more time for writing their argumentative essays. The participants’ responses show that the learners in the short-timed group were aware of genre differences, as they wanted time for different kinds of writing processes, such as selecting vocabulary, to meet demands specific to the argumentative genre.

Table 25
Questionnaire Responses by Group: “Do you think the time allotted was enough to write the essays (both genres)?”
Group Enough for both genres High-intermediate short-timed (N = 30) High-intermediate long-timed (N = 30) Advanced short-timed (N = 33) Advanced long-timed (N = 30) Enough for the narrative essay only 27% Enough for the argumentative essay only 10% Not enough for both genres 30% 33% 50% 33% 10% 7% 36% 30% 12% 21% 73% 17% 7% 3%

Excerpt 3.
Intermediate student in long-timed group, #7
“After writing the narrative, I had some spare time to revise. But the time was not enough for writing an argumentative essay. I was able to extend the writing with random vocabulary in narrative. However, in writing an argumentative essay, I felt that I needed to use more sophisticated vocabulary, and thinking about vocabulary consumed a lot of time.”

Excerpt 4. Advanced student in long-timed group, #47
“In writing the argumentative essay, I was not able to take enough time to brainstorm ideas. It was okay to write 300 words in one hour, but I wanted to have 20 more minutes to write out my argument, supporting ideas, and examples. On the other hand, in writing the narrative, the given time was enough because the topic was about myself. So I did not take a lot of time for brainstorming. And the word choice in narrative writing is more free than that in argumentative essays, so I can write faster in a narrative.”

Table 26
Descriptive Statistics: Writing Difficulty Ratings in the Four Conditions

      HS (N = 30)               HL (N = 30)               AS (N = 33)               AL (N = 30)
Item  M (SD)       95% CI       M (SD)       95% CI       M (SD)       95% CI       M (SD)       95% CI
Q1    4.53 (2.01)  3.77, 5.28   4.33 (1.98)  3.59, 5.08   3.90 (1.75)  3.29, 4.53   4.00 (1.46)  3.45, 4.54
Q2    5.83 (1.46)  5.28, 6.38   5.50 (1.81)  4.82, 6.18   4.84 (2.00)  4.14, 5.56   4.27 (1.86)  3.57, 4.95
Q3    4.88 (1.81)  4.20, 5.55   4.46 (1.70)  3.83, 5.10   4.27 (1.55)  3.72, 4.82   4.83 (1.53)  4.26, 5.40
Q4    5.90 (1.67)  5.28, 6.52   5.03 (1.56)  4.45, 5.61   5.00 (1.92)  4.32, 5.68   4.47 (1.87)  3.77, 5.16
Q5    3.58 (1.79)  2.92, 4.25   4.07 (1.95)  3.34, 4.79   3.51 (2.15)  2.75, 4.28   4.13 (1.93)  3.41, 4.85
Q6    4.07 (1.85)  3.38, 4.76   4.27 (1.57)  3.67, 4.85   4.24 (2.06)  3.51, 4.97   4.66 (2.00)  3.94, 5.40
Q7    5.73 (2.20)  5.03, 6.59   4.30 (2.42)  3.40, 5.20   4.58 (2.25)  3.77, 5.37   2.80 (1.91)  2.08, 3.51
Q8    5.85 (2.45)  4.93, 6.76   4.20 (2.10)  3.41, 4.99   5.39 (2.45)  4.53, 6.26   3.36 (2.30)  2.51, 4.23

Note. HS: high-intermediate short-timed, HL: high-intermediate long-timed, AS: advanced short-timed, AL: advanced long-timed. Ratings are on a 9-point scale: 1 = strongly agree/not difficult/not interesting/not anxious/not at all, and 9 = strongly disagree/very difficult/very interesting/very anxious/a lot.

Table 27
Task Difficulty Ratings in the Four Task Conditions (One-Way ANOVA)
Columns: Item, F, p, d, Comparison
Q1. How difficult was the narrative essay to write?
Q2. How difficult was the argumentative essay to write?
Q3. I did well writing the narrative essay.
Q4. I did well writing the argumentative essay.
Q5. How interesting was it to write the narrative essay?
Q6. How interesting was it to write the argumentative essay?
Q7. How anxious were you about the time pressure when writing the essays?
Q8. How much did the time limit (30 minutes/60 minutes) affect your writing?
4.521 .005* .505 .784 .25 .401 .020 .988 3.412 HS > AL HS > AL .22 .47 8.976 <.001* 7.113 <.001* .16 .17 .82 .64 .822 .484 .533 .660 .54
*p < .0062 (Bonferroni adjustment)
Note. HS: high-intermediate short-timed, HL: high-intermediate long-timed, AS: advanced short-timed, AL: advanced long-timed

Table 27 presents the results of a one-way ANOVA on the difficulty ratings in the four conditions. Statistically significant differences across the four conditions (groups) were found for three out of eight items, related to the difficulty of writing argumentative essays (Q2), anxiety about time pressure (Q7), and the perception of time constraints (Q8). The responses to Q2, regarding the difficulty of argumentative essays, showed a small effect size. For this question, the post hoc comparisons using the Bonferroni test did not show significant mean differences across the groups. However, compared to the advanced long-timed group students, the high-intermediate students in both short-timed and long-timed groups felt more difficulty in writing argumentative essays. Q7 is related to how anxious the participants felt about the time pressure during the writing tasks.
Statistically significant differences across the four conditions (groups) were found with a medium effect size. A Bonferroni test showed that the high-intermediate short-timed group was significantly more anxious about the time pressure during the writing than the advanced long-timed group.

Q8 concerns the participants’ perceptions of the effect of time constraints on their writing. It significantly distinguished the groups, with a medium effect size. Within the same proficiency groups, the short-timed group perceived a significantly larger effect of time constraints than the long-timed group. In addition, a significant mean difference was detected between the high-intermediate short-timed group and the advanced long-timed group. The students in the high-intermediate short-timed group believed that the allotted time had a greater effect on their writing than did those in the advanced long-timed group.

CHAPTER 5. DISCUSSION

5.1. Overview of research questions and results

As previously described, there is a growing interest in exploring writing fluency behaviors such as pausing and revising because of concerns regarding the validity of assessments of writing fluency. Instead of focusing solely on length-based measures (i.e., product-based measures) that do not consider how the writing is produced, this study explored L2 learners’ writing fluency behaviors and the cognitive processes behind them. Contributing to and extending existing research on cognitive processes associated with pausing and revision behaviors, the study examined how different aspects of writing tasks, such as genre and time constraints, affect the writing fluency behaviors and linguistic outcomes of L2 learners at different proficiency levels, in hopes of better understanding L2 writing.
To address the first research question, the study compares overall writing fluency behaviors and linguistic outcomes in two different genres when L2 learners of different English proficiency levels write under two different time constraints. The second research question is addressed by an analysis of the participants’ stimulated recall comments regarding the effects of the time constraints and the genres on their writing processes. For the third research question, the study examines how proficiency and time constraints affect writing quality (writing scores) in the two genres. The fourth research question guides the study’s exploration of which writing fluency measures are related to text quality and linguistic complexity, and to what extent. The fifth research question inquires into how the L2 learners perceived the effects of the time constraints and genres after finishing the two writing tasks. Table 28 summarizes the findings of the study. In this chapter, the five research questions and then the overall contribution of the results of the dissertation research are discussed.

Table 28
Summary of Findings

Independent variable: Time constraint (short-timed vs. long-timed)
Dependent variables:
  Writing fluency behaviors: The short-timed writing showed higher writing fluency
  Linguistic complexity: No difference
  Writing quality: No difference

Independent variable: Genre (narrative vs. argumentative)
Dependent variables:
  Writing fluency behaviors: Narrative writing showed higher fluency
  Linguistic complexity: Argumentative writing showed higher complexity
  Writing quality: N/A

Independent variable: Proficiency (advanced vs. intermediate)
Dependent variables:
  Writing fluency behaviors: Advanced learners showed higher writing fluency
  Linguistic complexity: Advanced learners showed higher writing complexity
  Writing quality: Advanced learners showed higher writing quality

5.2. Research question 1: To what extent do proficiency and time constraints affect writing fluency behaviors and linguistic outcomes of L2 writers’ writing in two genres?

The effect of time constraints in the current study appeared only in writing fluency behavior measures.
Specifically, the short-timed groups showed more fluency and less pausing than the long-timed groups. These findings are different from those of Elder et al. (2009), who did not find an effect from their two time constraint conditions (30 minutes and 55 minutes). However, that study used fluency ratings by two raters instead of fluency measures, which might explain the different findings. The results of the current study also differ from those of previous studies that did find an effect of time constraints on fluency (Knoch & Elder, 2010; Wu & Erlam, 2016). Wu and Erlam (2016) compared a long-timed condition and a short-timed condition (70% of the time the learners used in the untimed condition) and reported that the learners produced more words in their short-timed essays. Knoch and Elder (2010) also measured fluency as the number of words and found that a long-timed group (55 minutes) showed better performance than a short-timed group (30 minutes). The discrepancy between these two studies’ findings and the current study’s results may be due to the difference in the operationalization of fluency.

There was no effect of time constraints on linguistic outcomes. Previous research has reported mixed results for an effect of time constraints on linguistic complexity (Knoch & Elder, 2010; Wu & Erlam, 2016). Similar to the current study, Wu and Erlam (2016) found no difference between essays produced under two time constraints in terms of complexity and accuracy. However, Knoch and Elder (2010), who compared essays written in two time constraint conditions in terms of both grammatical complexity and lexical complexity, found that the short-timed condition (30 minutes) led to higher grammatical complexity than the long-timed condition (55 minutes) but did not find a difference in lexical complexity. Because they did not report their participants’ L2 proficiency, however, it is difficult to compare their findings and those of the current study.
In addition, they used only one grammatical complexity measure (clauses per t-unit), which may not show a full picture of linguistic complexity, considering that syntactic complexity is a multidimensional construct (Norris & Ortega, 2009). The L2 learners’ writing fluency behaviors and the linguistic complexity of their essays differed depending on the genre in which they were writing. These findings corroborate previous studies’ findings of genre effects on L2 learners’ writing, and more specifically on linguistic complexity (Biber & Conrad, 2009; Lu, 2011) and fluency (Beauvais et al., 2011; Medimorec & Risko, 2017; Qin & Uccelli, 2016; Van Hell et al., 2008). In the current study, a genre effect was 96 found in one length-based measure (MLC), one particular structure measure (CN/T), two lexical sophistication measures (WL and WF), and three fluency measures (product: words per minute, P-burst length, and number of R-bursts). These measures indicated that the learners in this study showed greater linguistic complexity but less fluency in argumentative essays than in narrative essays. These findings are similar to those of some previous studies that also found greater linguistic complexity but less fluency in argumentative essays than in narratives (Beers & Nagy, 2009; Qin & Uccelli, 2016). One explanation for this pattern is that dealing with the more demanding task (i.e., argumentative essay writing) inhibits revision behavior due to the limited availability of cognitive resources (Leijten et al., 2010; Schilperoord, 2002; Van Waes et al., 2010). Taken together, these studies’ results might reflect that learners pause more when writing argumentative essays in order to engage in deeper lexical selection (i.e., searching for more sophisticated and less frequent vocabulary) as well as more complex ideas and produce more complex syntactic structures. 
In other words, producing language appropriate to the argumentative genre may require greater cognitive effort than producing language appropriate to the narrative genre. Hence, learners may slow their production down as they utilize more time for planning or translation, meeting the genre requirements at the expense of fluency (Beauvais et al., 2011; Kellogg, 2001). However, the findings differ partially from those of previous studies that found a genre effect on complexity but not fluency (e.g., Yoon & Polio, 2017). The difference may be due to the measurements of fluency or the L2 proficiency of the participants. Although the current study used different writing fluency behavior measures to assess fluency and reported the L2 learners’ standardized test scores and cloze-test scores, the previous studies used a traditional measure (i.e., the number of words produced in a given time) and did not use standardized test scores for 97 measuring L2 English proficiency. A contrasting result was reported by Yang (2014), who compared four genres (narrative, expository, expo-argumentative, and argumentative) with regard to complexity, accuracy, and fluency; she found higher complexity and fluency in argumentative essays than in narratives. However, she operationalized fluency as the total number of words per essay, which is a traditional length measure, whereas the present study included writing fluency behavior measures. In addition, Yang used the same cloze-test that the current study used, but her participants’ mean scores on the cloze test (argumentative group: M = 26.65; narrative group: M = 28.02) were lower than those in the current study. An interaction between time constraint and genre was found in terms of pausing between words. The L2 learners’ pausing patterns differed in the two genres and in the two time- constraint conditions. 
In the short-timed condition, the learners paused more between words when writing argumentative essays than when writing narrative essays, whereas in the long- timed condition, they paused between words more often when writing narratives. Drawing on Kellogg’s (1996) model, it was predicted that the L2 learners’ fluency behaviors would differ because the amount of allowed time for a task and the requirements of a task can influence how long L2 learners stay at the translation stage and how they allocate processing time and cognitive effort for planning, translating, and monitoring. In a short-timed condition, increased time pressure may prevent smooth and responsive writing behavior, particularly for argumentative writing; however, in a long-timed condition, L2 learners may pause more while producing narratives to search for elaborate lexical items, to plan the narrative’s storyline or to review their narratives as they extend the discourse with the help of extra time. Previous research found proficiency effects for both linguistic complexity (Lu, 2011; Ortega, 2003; Wolfe-Quintero et al., 1998) and fluency (Sasaki, 2004; Van Waes & Leijten, 98 2015; Way et al., 2000; Yang, 2014). In the current research, a proficiency effect was found in linguistic complexity and in two of the fluency measures (product: words per minute and process: words per minute). Advanced learners produced more words per minute, reflecting their more highly developed language skills. The quantitative results did not show a proficiency effect in revision behaviors, however (e.g., number of R-bursts). This result is dissimilar to Barkaoui’s (2016) finding that low proficiency learners revised more often than high proficiency learners. This difference may be due to participant factors. 
In the current study, the participants were all post-secondary students at the same university, who differed only in their English proficiency; in contrast, in Barkaoui’s study the participants were first- or second-year graduate or undergraduate students (the high group) and pre-admission students enrolled in pre-academic ESL courses (the low group). The different writing experiences of these two groups of participants may have led to their use of different revision strategies. No interaction between genre and proficiency was found in this study, although genre and proficiency individually affected writing fluency behaviors and linguistic outcomes. This differs from Jeong’s (2017) study, which found a genre bias in proficiency. Jeong reported that novice learners performed better in the narrative genre than the expository genre, while advanced proficiency learners demonstrated better performance in the expository genre than in the narrative genre, in terms of essay scores. However, the findings of the current study indicate that both high-intermediate and advanced proficiency groups appeared to have genre awareness and understand the need to write differently in different genres (e.g., Biber & Conrad, 2009; Biber et al., 2011). In addition, because comparing narrative and argumentative essay scores is akin to comparing apples and oranges, the present study instead compared fluency behaviors and linguistic outcomes in the two genres, which were assessed by two different rubrics; thus, the findings did not show an interaction between genre and proficiency in terms of writing quality.

5.3. Research Question 2: As evidenced by the stimulated recall data, to what extent do proficiency and time constraints affect L2 writers’ writing process in the two genres?

The stimulated recall data demonstrate that there were differences in the processes underlying pausing behaviors in the two time-constraint conditions.
The learners in the short-timed group spent more time in planning, which was a driving force in enhancing fluency (Sasaki, 2000). In addition, even though both time constraint groups focused more on formulation (planning and translation) than monitoring, the long-timed groups tended to spend more time on monitoring than the short-timed groups. This behavior may have resulted in the long-timed groups’ lower fluency. This finding supports Kellogg’s (1996) model, in the sense that it suggests that the time pressure on the short-timed group limited central executive functions, leading the learners to prioritize formulation over monitoring. The stimulated recall data also show that the genres caused some differences in pausing behaviors; this finding is partly consistent with Kellogg’s model (1996). Although the distribution of stimulated recall comments about translation processes is similar in the two time conditions, the distribution of comments about planning and monitoring during pauses differs in the two genres. Overall, the L2 learners spent more time on planning and monitoring in argumentative essays than narratives. This study’s stimulated recall data show that a higher percentage of pausing comments referred to planning than to translation and monitoring across all groups when they were writing in the argumentative genre. This finding is in line with previous research claims that the argumentative genre is more cognitively demanding and requires more planning than the narrative genre (Beauvais et al., 2011; Kellogg, 2001; Van Hell et al., 2008). The stimulated recall data further show that more time was spent pausing between words by the short-timed groups for the argumentative essays, and by the long-timed groups for the narratives. One possible explanation for these patterns is that the combination of time pressure and the greater cognitive demand of argumentative essays required more pauses (Kellogg, 2001).
In contrast, for the long-timed group, a lack of time pressure when writing narratives might have tempted the learners to do more brainstorming to extend their writing. As for the kinds of processing the learners were doing during the pauses between words, the stimulated recall data indicate that the percentages of the various writing processes (planning, translation, and monitoring) were similar between the two genres across time constraint conditions. With regard to how their comments about revision aligned with textual locations, the learners also showed similar patterns in both genres. With regard to the overall writing processes underlying pausing and revision behaviors, Kellogg’s (1996) model suggests that writing requires lower and higher cognitive processes. Many of the participants explained the pauses they made between words and between clauses, and sometimes between sentences, with comments associated with planning. These findings are similar to those of previous research in suggesting that pausing at higher text units, such as sentences, is more likely to be related to higher-level writing processes, such as planning (Révész, Kourtali, & Mazgutova, 2017; Schilperoord, 1996). Most of the learners’ comments related to translation were at the word and below the word level. With regard to how their comments about revision aligned with textual locations, the learners also showed similar patterns in both genres. Most of the learners’ comments related to translation were at the word and below the word level. These patterns may be similar to their pausing behaviors in that they suggest the writers focused on retrieving lexical items or syntactic structures at this smaller discourse unit level. Similar to previous research (e.g., Stevenson et al., 2006), the current study’s stimulated recall data on writing behaviors, such as pausing and revision, also found proficiency differences.
In writing the narratives, the advanced learners made more translation-related comments at the word and below the clause level than did the high-intermediate learners. In the argumentative essays, the advanced learners again made more translation-related comments at the word level than did the high-intermediate learners. Compared to the high-intermediate learners, the advanced learners also showed more translation-related revision behaviors at the word and below the clause levels. Considering that the quantitative results showed that the advanced learners produced more syntactic complexity with greater fluency than the high-intermediate learners, the advanced learners may have focused on refining syntactic structures during revision processes at the word or clause level while writing (Stevenson et al., 2006). As for the writing processes underlying the participants’ revision behaviors, only proficiency had a notable effect on them; time constraints and genres did not seem to affect the writing processes underlying the revision behaviors. These results are possibly due to the specific aspect of revision in question, in that the learners tended to focus mainly on a refining process that may be affected more by proficiency than by other factors. According to their comments, the advanced learners tended to spend more time on translation processes (about 60–70%) than did the high-intermediate learners (about 40–50%), as a larger number of pausing and revision comments about lexical retrieval, syntactic encoding, and cohesion were made by the advanced learners than by the high-intermediate learners. Possibly, the amount of engagement in translation processes underlying revision behaviors might have contributed to the differences in production at the two proficiency levels. This finding is dissimilar to the findings of previous L2 research that utilized keystroke logging (e.g., Barkaoui, 2016; Stevenson et al., 2006).
Barkaoui (2016) found that low proficiency learners made significantly more revisions than high proficiency learners, and Stevenson et al. (2006) did not find differences between their two proficiency groups; however, they divided the two groups by relative proficiency instead of using standardized scores. These findings in the current study are similar to those of Sasaki (2000), who observed that expert writers spend more time on rhetorical refining than novices do. In the present study, the advanced learners devoted more time to translation processes including retrieving words, syntactic encoding, and cohesion than did the high-intermediate learners.

5.4. Research Question 3: How do L2 proficiency and time constraints affect writing quality in two essay genres?

Only proficiency had a significant effect on writing quality; this was true for both the narrative and argumentative essays. Previous studies have also found proficiency effects on quality (e.g., Jeong, 2017; Xu & Ding, 2014). Possibly, more advanced learners’ greater ability to manage the various necessary writing processes allows them to produce higher-quality writing according to all five scales, that is, content, organization, vocabulary, language use, and mechanics (Chenoweth & Hayes, 2001). In the current study, the stimulated recall data show that, compared to the high-intermediate learners, the advanced learners engaged relatively more in translation processes than in planning processes. This finding suggests that greater English proficiency may enable learners to pay more attention to form during writing. In addition, for the high-intermediate learners, who can be assumed to have had less L2 experience than the advanced learners, retrieving lexical items likely required more effort.
Thus, in addition to the fact that advanced learners know more language than high-intermediate learners, the extent to which learners engage in different writing processes may affect writing quality (Kellogg, 1990). In contrast to L2 proficiency, the time constraints did not affect writing quality. The findings of this study are similar to those of previous studies that did not find significant effects of time constraints on writing quality (Caudery, 1990; Elder et al., 2009; Knoch & Elder, 2010; Powers & Fowles, 1996). Based on the current study’s stimulated recall data, the short-timed and long-timed groups did not employ noticeably different writing processes in the two different time conditions, except in planning and monitoring processes, and these differences were not reflected in writing quality. However, these results are different from those of prior studies that have found longer-timed groups to produce higher quality writing (Hale, 1992; Wu & Erlam, 2016). Hale (1992) suggested that the addition of 15 minutes increased mean scores by one-third of the standard deviation; however, it is unclear whether Hale’s results demonstrate that adding 15 minutes actually contributed to increases in mean scores. Wu and Erlam (2016) compared rated scores on task achievement, coherence and cohesion, lexical variation, grammatical range and accuracy, and overall quality between two time conditions. They found a slightly significant difference (p = .04) only in task achievement (content), which implies that time constraints did not affect writing quality much in their study.

5.5. Research Question 4: Which fluency measures are related to text quality and linguistic complexity, and to what extent?

Writing fluency measures were found to be related to writing quality in both genres.
These results are similar to previous research results that have shown a relationship between writing fluency and quality (e.g., Barkaoui & Knouzi, 2018; Beauvais et al., 2011; Spelman Miller et al., 2008; Stevenson et al., 2006). There are, however, some key differences. Namely, unlike the current study, Barkaoui and Knouzi (2018) operationalized fluency as the number of words, and Spelman Miller et al. (2008) found a relationship between text length and quality. In the current study, writing fluency measures including pausing behaviors underlying different writing processes were positively associated with writing quality. However, no relationship between revision behaviors and writing quality was found. One possible implication of these results is that the extent to which learners have automatized their writing processes, such as how rapidly they can retrieve vocabulary from long-term memory, may affect writing quality (Kellogg, 1990). Nevertheless, based on the relationship between writing fluency and quality, it suffices to say that writing fluency measures could be good indicators of L2 learners’ writing quality. Writing quality was best predicted by different fluency measures depending on genre. The findings are similar to those of Qin and Uccelli (2016), who used length and lexical, syntactic, and discourse features to see which measures predicted writing quality in narrative and argumentative essays, and found length to be the most predictive of quality. In this study, in the narrative essays, writing quality was best predicted by an increase in product: words per minute and R-burst length; however, in the argumentative essays, writing quality was best predicted by an increase in process: words per minute and a decrease in one revision measure (i.e., the number of R-bursts). Therefore, both this study and Qin and Uccelli’s indicate that fluency measures predict writing quality differently in the two genres.
As was expected based on previous research that showed the relationship between complexity and fluency (e.g., Foster & Skehan, 1996; Oh, 2006), the fluency measures were related to the complexity measures in the present study. However, the relationship between fluency measures and complexity measures differed depending on genre. Again, this is similar to Qin and Uccelli’s findings (2016). The results confirmed that the complexity and fluency constructs can measure different dimensions of L2 performance in different writing tasks (Housen & Kuiken, 2009; Housen et al., 2012). In addition, the correlations found in this study confirm the assumptions that writing fluency behaviors are related to linguistic complexity, and indicate that these constructs may share an underlying dimension (Medimorec & Risko, 2017).

5.6. Research Question 5: How do L2 writers perceive the effects of time constraints and genre on their writing?

As for the learners’ perceptions of the effect of the time constraints, more than half of the learners in the long-timed groups considered the time sufficient for both genres, which echoes the findings of the previous research (e.g., Knoch & Elder, 2010; Powers & Fowles, 1996). And while they also largely perceived the short-timed conditions as insufficient, differences between the two time-constraint groups were detected only in writing fluency. No difference arising from the time constraints was found in linguistic complexity or writing quality. In other words, the shorter time seemed to elicit more fluent language without negatively affecting linguistic outcomes and writing quality. The students in the high-intermediate short-timed group believed more strongly than those in the advanced long-timed group that the allotted time affected their writing. For high-intermediate learners, time pressure may increase anxiety (Weigle, 2002), which in turn could affect their writing performance.
However, as the comparison between high-intermediate short-timed and long-timed groups showed, differences in linguistic complexity and quality arising from the time constraints were minimal. With regard to the perceived difficulty of writing essays in the two genres, more than half of the learners in both proficiency groups perceived the argumentative essay to be more difficult to write than the narrative, as previous studies have suggested (Ruiz-Funes, 2014, 2015). This is presumably due to the greater cognitive demands of the argumentative essay and the different functional demands of the two genres (Biber et al., 2011; Leijten et al., 2010; Van Waes et al., 2010). For this study, it is difficult to tease apart possible effects of the functional demands versus the cognitive demands of genres on the learners’ perceptions of difficulty (e.g., Yoon & Polio, 2017). The survey results suggested that the learners in both proficiency groups tried to use more sophisticated vocabulary and structures in argumentative essays than in narratives. The findings of higher linguistic complexity in the argumentative essays than in the narratives and of lower fluency in the argumentative essays than in the narrative essays align with these survey results.

5.7. Contributions of this dissertation

5.7.1. Understanding time constraints

In this study, time constraints had minimal effects on L2 writing products in terms of linguistic complexity and writing quality. Providing L2 learners with extra time to plan and edit their language was expected to make a difference, but the additional 30 minutes of the long-timed condition did not contribute to increased complexity or quality. Although this study could not provide the learners with unlimited time for logistical reasons, the long-timed condition was twice as long as the short-timed condition in order to mimic the untimed conditions in which academic writing is typically done.
Among the previous studies (e.g., Knoch & Elder, 2010) that, like the current study, did not find significant differences between two time-constraint conditions, Caudery (1990) provided a range of possible explanations for his null findings; some of these explanations may help explain the findings of the current study. Among the possible explanations he suggested, the participants’ level of training in writing skills may have contributed to the finding of no difference in the linguistic complexity and quality of their writing. Because all of the L2 learners in the current study had received sufficient training to have gotten high standardized test scores for writing, they clearly had had much practice in writing timed essays; in short, it is possible that the 30 extra minutes did not lead to differences in linguistic complexity and writing quality because the learners had practiced short-timed writing and knew how to use various strategies for writing under time constraints. Although the two distinct time-constraint groups showed no differences in linguistic complexity and quality, effects of time constraints on the writing process were seen in the fluency measures and stimulated recall. Different time constraints led to differences in how the L2 learners planned and edited their writing, which were reflected in the writing fluency behavior measures. The L2 learners in the short-timed group showed more fluent writing behavior than those in the long-timed group. The stimulated-recall data revealed differences in the processes underlying pausing behaviors in the two time-constraint conditions.
The learners in the short-timed conditions more often used their pauses to plan their writing than the learners in the long-timed conditions did; this additional planning may have allowed the learners in the short-timed condition to use less cognitive effort for transcription, which in turn may have led to the higher fluency in the short-timed condition than in the long-timed condition. In other words, the difference in writing processes seemed to contribute to the difference in writing fluency behaviors.

5.7.2. Understanding fluency: Genre effects and fluency’s relationship with complexity and writing quality

The genre effect was evident in linguistic complexity; genre effects showed up in both writing processes and writing products as detected through fluency measures and stimulated recall. In addition to the effects of genre on linguistic complexity, which previous research also has found, the L2 learners’ writing fluency behaviors differed in the two genres in the current study. The differences arising from complexity and fluency were confirmed by the stimulated recall data. When the learners wrote argumentative essays, their planning-related pauses were more frequent than when they wrote narratives. During such pauses, the learners planned the content and organization of their writing, which are especially important in argumentative essays. Thus, specific pausing behaviors and the processes in which learners engage during the pauses may contribute to differences in complexity and fluency in the two genres. The current study found that writing fluency was related to writing quality in both genres. Specifically, process: words per minute, product: words per minute, and P-burst length were significantly related to writing quality in the two genres.
Many other empirical studies (e.g., Beers & Nagy, 2009; Qin & Uccelli, 2016) have also investigated the relationship between linguistic features and writing quality in different genres, but they considered writing fluency as length of writing. However, considering fluency to be a multidimensional construct, the current study tried to better elucidate the relationship of writing fluency and quality by adding process-based measures instead of looking at only one length-based measure. Based on the results of the different fluency measures, the study suggests that writing fluency features can be indicative of writing quality. A relationship between fluency measures and complexity measures was also found. As one of the CAF measures, fluency is believed to relate to complexity and accuracy. The current study provides empirical evidence for the relationship between the two constructs of complexity and fluency in both of the two genres. Oh (2006) empirically tested the relationship between the two constructs, but she operationalized fluency as the number of T-units and the number of clauses and examined only argumentative essays in testing settings. This study, unlike such previous research, looked at the fluency construct multidimensionally by employing both process-based and product-based measures to examine the relationship between the fluency and complexity constructs in the two genres. L2 learners’ perceptions of genres can also play a role in their writing processes and products. More than half of the learners in both proficiency groups in this study perceived the argumentative essay to be more difficult to write than the narrative essay. These differences in perceived difficulty are mainly due to the structure and the language of argumentative writing.
The learners had learned genre differences from English academic writing classes, but they still struggled with the specific requirements of argumentative writing, such as the need to provide clear arguments, supporting ideas, and appropriate examples. In addition, the survey suggested that the learners tried to use more sophisticated vocabulary and structures in argumentative essays than in narrative essays. Their different perceptions of the difficulty of the two genres seemed to be reflected in their writing processes and products: higher linguistic complexity and lower writing fluency behaviors in argumentative essays than in narratives.

CHAPTER 6. CONCLUSION

6.1. Summary

This dissertation sought to advance the L2 writing research on writing fluency behaviors and to shed light on the interplay of time constraints, genre, and proficiency in L2 writing fluency behaviors and linguistic outcomes. The study reached several conclusions. First, L2 learners produced more complex language and showed less fluent writing behaviors in argumentative than in narrative essays. A significant interaction between genre and time condition was found in the number of pauses between words. A proficiency effect was found in linguistic complexity and fluency, while a time constraint effect was detected only in fluency and writing fluency behaviors. Second, the stimulated recall data indicate that the learners tended to spend more time on planning and monitoring in the argumentative genre than the narrative genre. For the time constraint comparisons, the short-timed groups did more planning than the long-timed groups, whereas the long-timed groups tended to spend more time on monitoring than the short-timed groups. The advanced learners’ comments about pausing and revision were more associated with translation than the high-intermediate learners’ for both narrative and argumentative essays.
As for writing processes according to locations, higher textual units are associated with higher level processes. Third, L2 proficiency affected writing quality in both genres; however, the difference in time conditions (30 minutes vs. 60 minutes) did not affect writing quality. Fourth, writing fluency measures were correlated with linguistic complexity and writing quality; however, these correlations differed by genre. Writing fluency and revision behaviors can predict writing quality in both genres. Fifth, more than half of the learners in each group perceived argumentative essays as more difficult to write than narratives. The learners perceived the time constraints differently; more than half of the learners in the long-timed group considered the time allotment sufficient for both genres. Compared to learners in the advanced long-timed group, learners in the high-intermediate short-timed group felt more anxiety and believed more strongly that writing time affected their writing quality.

6.2. Theoretical, methodological, and pedagogical implications

From a theoretical perspective, the results shed light on how different genres and time constraints affect the writing fluency behaviors and linguistic outcomes of learners at different proficiency levels, in terms of process and production. Based on Kellogg’s (1996) writing model, this study provided empirical evidence as to how L2 learners of different proficiencies may show different cognitive processes underlying writing behaviors, writing fluency behaviors, language complexity, and text quality when writing in different genres under varying time constraints. This study was intended to help explain cognitive processes associated with writing behaviors in different writing genres and time constraints. In addition, previous research has tended to focus on production differences associated with genres, time constraints, and proficiency.
However, this study delved into what leads to these differences by investigating the writing fluency behaviors underlying writing processes in addition to investigating production. With regard to methodological implications, this dissertation research included keystroke logging to unobtrusively capture L2 writing behaviors such as pausing and revising. Along with keystroke logging and automatic textual analysis, the study also employed stimulated recall protocols to enable cognitive-linguistic analysis of writing processes. As it examined writing processes and products multidimensionally, the study used a combination of research methods in order to achieve more valid and accurate interpretations of the writing processes of L2 learners at different proficiency levels when they responded to different genre prompts under different time constraints. This study holds pedagogical and assessment implications. With respect to L2 writing instruction, teachers tend to present different genres to students and set different time constraints for assignments. In this study, the differences that arise from genres and time constraints were explained in terms of L2 learners’ writing processes and production. For instance, in writing argumentative essays, the L2 learners in this study made more planning-related comments than translation- and monitoring-related comments when they were explaining why they paused. These findings indicate that learners may benefit from learning about different planning strategies for writing argumentative essays. In particular, the findings of this study are crucial for test developers and teachers for designing writing tests and assignments. The findings showed that the effects of time constraints (30 minutes vs. 60 minutes) on the written product, in terms of quality and language, were not significant; however, the students felt more anxiety in writing short-timed essays than long-timed essays.
Moreover, keystroke logging and replaying keystroke logging can provide teachers with insights when diagnosing students’ difficulties in writing. For instance, information obtained from keystroke logging and surveys together can help teachers understand which time constraints or tasks are appropriate for their students at different levels. In addition, from the learners’ perspective, as Ranalli et al. (2018) demonstrated, keystroke logging can also give students information about their writing process. As they read their own keystroke logging information, learners can become more aware of the cognitive processes underlying their writing fluency behaviors.

6.3. Limitations and future research

The limitations of this study should be acknowledged. As described in the method section, the advanced learners showed better keyboarding skills than the high-intermediate learners. It is clear that proficiency affected their writing processes and products based on the results of the study such as those regarding writing quality; however, keyboarding skill differences in the two English proficiency groups might also have contributed to writing fluency behavior differences (e.g., Barkaoui & Knouzi, 2018). In addition, this study manipulated time constraints as 30-minute and 60-minute conditions. For logistical reasons, the longer-timed condition was used to mimic an untimed condition; however, this manipulation may lack authenticity. Based on the survey, some of the learners in the long-timed groups still felt the allotted time was not enough for writing in either genre. Following previous research (e.g., Révész, Kourtali, & Mazgutova, 2017; Spelman Miller et al., 2008), this study used a threshold of two seconds for determining pauses.
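To make the role of the pause threshold concrete, the following is a minimal illustrative sketch, not the procedure implemented in Inputlog or used in this study. It assumes a simplified, hypothetical log format (a list of millisecond timestamps paired with typed characters) and treats any inter-keystroke interval at or above the threshold as a pause that closes a P-burst. The function names `split_into_p_bursts` and `mean_burst_length` and the sample log are assumptions introduced only for illustration.

```python
from typing import List, Tuple

# Hypothetical, simplified keystroke record: (timestamp in milliseconds, typed character).
# Real keystroke-logging output is richer; this format is an assumption for illustration.
Keystroke = Tuple[int, str]


def split_into_p_bursts(log: List[Keystroke], threshold_ms: int = 2000) -> List[str]:
    """Split a keystroke log into P-bursts.

    A new burst starts whenever the interval between two consecutive
    keystrokes is at least `threshold_ms` (assumed here to count as a pause).
    """
    bursts: List[str] = []
    current: List[str] = []
    previous_time = None
    for time_ms, char in log:
        if previous_time is not None and time_ms - previous_time >= threshold_ms:
            bursts.append("".join(current))
            current = []
        current.append(char)
        previous_time = time_ms
    if current:
        bursts.append("".join(current))
    return bursts


def mean_burst_length(bursts: List[str]) -> float:
    """Mean P-burst length in characters (0.0 for an empty list)."""
    if not bursts:
        return 0.0
    return sum(len(b) for b in bursts) / len(bursts)


# Invented sample log: intervals of 150, 750, 100, 2500, and 100 ms.
sample_log = [(0, "a"), (150, "b"), (900, "c"), (1000, "d"), (3500, "e"), (3600, "f")]

two_second_bursts = split_into_p_bursts(sample_log, threshold_ms=2000)
half_second_bursts = split_into_p_bursts(sample_log, threshold_ms=500)

print(two_second_bursts, mean_burst_length(two_second_bursts))
print(half_second_bursts, mean_burst_length(half_second_bursts))
```

With the invented log above, lowering the threshold from 2,000 ms to 500 ms splits the same keystrokes into more and shorter bursts, which is why the choice of threshold can change what a burst-based fluency measure captures.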
However, some researchers have suggested that different thresholds for pauses such as 200 milliseconds or 500 milliseconds might capture different dimensions of writing fluency behaviors such as lower levels of writing processes (Van Waes & Leijten, 2015; Wengelin, 2006). Future research should examine changes in writing fluency behaviors and linguistic outcomes longitudinally (e.g., Spelman Miller et al., 2008). This study used a within-groups design for genre and a between-groups design for exploring proficiency and time constraint effects. The study found time constraint, genre, and proficiency effects on writing fluency behaviors and linguistic outcomes. However, if a study investigated how learners write in the two genres under different time constraints over time, the findings might be different from those of this study, because learners can be expected to develop their L2 over time. In addition, although the current study showed the relationship between complexity and fluency, it is open to question whether accuracy, one of the CAF measures, is related to complexity and fluency. The current study did not include accuracy measures in the analysis because the measure may not be particularly useful for assessing L2 development or differentiating learners by proficiency (e.g., Lambert & Kormos, 2014). However, for the purpose of theory building, it may be useful to add accuracy measures to shed light on the relationship between accuracy and fluency in writing. Some recent studies have employed eye-tracking technology as well as stimulated recall and keystroke logging (e.g., Ranalli et al., 2018; Révész et al., in press). Eye-tracking methods might uncover other cognitive processes underlying writing behaviors. However, the low frequency eye-trackers such as Tobii 60x, which usually do not hamper natural writing behaviors, are less accurate than high frequency eye-trackers such as Eyelink 1000, and the data from the eye-trackers are messy.
Although eye-trackers such as Eyelink 1000 are very accurate in assessing learners’ saccades during writing, it is almost impossible to get participants to act naturally because they need to keep their heads still on a chin-rest to ensure high tracking accuracy. Nevertheless, when highly accurate eye-tracking technology that does not intervene in the natural writing process becomes available, it will be helpful for future investigations of learners’ writing processes.

APPENDICES

APPENDIX A: Prompts for the Narrative and the Argumentative Essays (Yoon, 2017)

Narrative prompt: Your friend has plans to learn a foreign language but is afraid it might be useless to spend the time learning a language. You have successfully learned a foreign language and use it often. You want to show your friend that language learning and use can be interesting by telling him/her about your positive experience. Tell a story about one of your positive experiences related to foreign language use. Be sure to fully develop your story by including specific details.

Argumentative prompt: You attended a seminar and the main theme was that using a foreign language fluently has become necessary in this globalized era. Write an essay about whether you agree or disagree with the statement about the necessity of foreign language abilities. Support your position with reasons. Be sure to fully develop your essay by including clear explanations and logical supporting ideas.

APPENDIX B: Cloze Test and Answer Key (Yang, 2014)

DIRECTIONS
1. Read the passage quickly to get the general meaning.
2. Write only one word in each blank next to the item number. Contractions are considered to be one word.
3. Check your answers.

You have 25 minutes to complete the cloze test.

EXAMPLE: The boy walked up the street. He stepped on a piece of ice. He fell (1) down but he didn’t hurt himself.

MAN AND HIS PROGRESS

Man is the only living creature that can make and use tools.
He is the most teachable of living beings, earning the name of Homo sapiens. (1) ____ ever restless brain has used the (2) ____ and the wisdom of his ancestors (3) ____ improve his way of life. Since (4) ____ is able to walk and run (5) ____ his feet, his hands have always (6) ____ free to carry and to use (7) ____. Man’s hands have served him well (8) ____ his life on earth. His development, (9) ____ can be divided into three major (10) ____, is marked by several different ways (11) ____ life. Up to 10,000 years ago, (12) ____ human beings lived by hunting and (13) ____. They also picked berries and fruits, (14) ____ dug for various edible roots. Most (15) ____, the men were the hunters, and (16) ____ women acted as food gatherers. Since (17) ____ women were busy with the children, (18) ____ men handled the tools. In a (19) ____ hand, a dead branch became a (20) ____ to knock down fruit or (21) ____ for tasty roots. Sometimes, an animal (22) ____ served as a club, and a (23) ____ piece of stone, fitting comfortably into (24) ____ hand, could be used to break (25) ____ or to throw at an animal. (26) ____ stone was chipped against another until (27) ____ had a sharp edge. The primitive (28) ____ who first thought of putting a (29) ____ stone at the end of a (30) ____ made a brilliant discovery: he (31) ____ joined two things to make a (32) ____ useful tool, the spear. Flint, found (33) ____ many rocks, became a common cutting (34) ____ in the Paleolithic period of man’s (35) ____. Since no wood or bone tools (36) ____ survived, we know of this man (37) ____ his stone implements, with which he (38) ____ kill animals, cut up the meat, (39) ____ scrape the skins, as well as (40) ____ pictures on the walls of the (41) ____ where he lived during the winter. (42) ____ the warmer seasons, man wandered on (43) ____ steppes of Europe without a fixed (44) ____, always foraging for food. Perhaps the (45) ____ carried nuts and berries in shells (46) ____ skins or even in light, woven (47) ____. Wherever they camped, the primitive people (48) ____ fires by striking flint for sparks (49) ____ using dried seeds, moss, and rotten (50) ____ for tinder.
With fires that he kindled himself, man could keep wild animals away and could cook those that he killed, as well as provide warmth and light for himself.

Answer keys: "Man and his progress"

Item. Exact answer: Acceptable answer scoring would also include these possibilities

1. His: man's, our, the
2. knowledge: accomplishments, culture, cunning, examples, experience(s), hands, ideas, information, ingenuity, instinct, intelligence, mistakes, nature, power, skill(s), talent, teaching, technique, thought, will, wit, words, work
3. to
4. man: he
5. on: upon, using, with
6. been: felt, hung, remained
7. tools: adequately, carefully, conventionally, creatively, diligently, efficiently, freely, implements, objects, productively, readily, them, things, weapons
8. during: all, for, improving, in, through, throughout, with
9. which: also, basically, conveniently, easily, historically, however, often, since, that, thus
10. periods: areas, categories, divisions, eras, facets, groups, parts, phases, sections, stages, steps, topics, trends
11. of: for, in, through, towards
12. all: early, hungry, many, most, only, primitive, the, these
13. fishing: farming, foraging, gathering, killing, scavenging, scrounging, sleeping, trapping
14. and: often, ravenously, some, they
15. often: always, emphatically, important, nights, normally, of, times, trips
16. the: all, house, many, most, older, their, younger
17. the: all, many, married, most, often, older, primate, these
18. the: all, constructive, many, most, older, primate, tough, younger
19. man's: able, big, closed, coordinated, creative, deft, empty, free, human('s), hunter's, learned, needed, needy, person's, right, single, skilled, skillful, small, strong, trained
20. tool: club, device, instrument, pole, rod, spear, stick, weapon
21. dig: burrow, excavate, probe, search, test
22. bone: arm, easily, foot, head, hide, horn, leg, skull, tail, tusk
23. sharp: big, chipped, fashioned, flat, hard, heavy, large, rough, round, shaped, sizeable, small, smooth, soft, solid, strong, thin
24. the: a, his, man's, one('s)
25. nuts: apart, bark, bones, branches, coconuts, down, firewood, food, heads, ice, items, meat, objects, open, rocks, shells, sticks, stone, things, tinder, trees, wood
26. one: a, each, flat, flint, glass, hard, obsidian, shale, softer, some, the, then, this
27. it: each, one, they
28. man: being, creature, human, hunter, men, owner, people, person
29. sharp: glass, hard, jagged, large, lime, pointed, sharpened, small
30. stick: bone, branch, club, log, pole, rod, shaft
31. had: accidentally, cleverly, clumsily, conveniently, creatively, dexterously, double, easily, first, ingeniously, securely, simply, soon, suddenly, tastefully, then, tightly, would
32. very: bad, extremely, good, hunter's, incredibly, intelligent, long, modern, most, necessarily, new, portentously, quite, tremendously, useful
33. in: all, among, amongst, by, inside, on, that, using, within
34. tool: device, edge, implement, instrument, item, material, method, object, piece, practice, stone, utensil
35. development: age, ancestry, discoveries, era, evolution, existence, exploration, history, life, time
36. have: actually, apparently, ever
37. by: and, for, from, had, made, through, used, using
38. could: did, would
39. and: carefully, help, or, skillfully, then, would
40. draw: carve, create, drawing, engrave, hang, paint, painting, place, sketch, some, the
41. cave(s): animals, place(s), room
42. in: and, during, with
43. the: across, aimless, all, barren, dry, flat, high, in, long, many, plain, stone, through, to, toward, unknown, various
44. home: appetite, camp, course, destination, destiny, diet, direction, domain, foundation, habitat, income, knowledge, location, lunch, map, meal, path, pattern, place, plan, route, supplement, supply, time, weapons
45. women: children, families, group, human, hunter, man, men, people, primitives, voyager, wanderers, woman
46. or: and, animal, animal's, covered, in, like, of, on, their, using, with
47. baskets: bags, blankets, chests, cloth(s), clothes, fabric, garments, hides, material, nets, pouches, sacks
48. made: began, built, lighted, lit, produced, started, used
49. and: also, by, occasionally, or, then, together, while
50. wood: bark, branches, dung, forage, grass, leaves, lumber, roots, skin, timber, tree(s)

APPENDIX C: Timed Keyboarding Skill Test

Write the sentence below as many times as you can for two minutes.

I voluntarily agree to participate in this writing research.

APPENDIX D: Language Experience and Proficiency Questionnaire (Marian et al., 2007)

Name:
Age:
Date:
Gender:

Please list all the languages you know in order of dominance:

English is my ____ language. (insert ordinal number: 1st, 2nd, and so on)

All questions below refer to your knowledge of English.

Please list the number of years and months you spent in each language environment:
An English-speaking country: Years ____ Months ____

Please provide the following information about your TOEFL/IELTS/TOEIC:
Test:
Date taken:
Total score:

APPENDIX E: Exit Questionnaire (Adapted from Yoon, 2017)

1. How did you feel about writing narrative and argumentative essays? Is one type of essay writing more difficult than the other (in terms of brainstorming/planning, writing, and revising)? Why? Please explain.
2. How difficult was the narrative essay to write?
(Not difficult at all) 1-2-3-4-5-6-7-8-9 (Very difficult)
3. How difficult was the argumentative essay to write?
(Not difficult at all) 1-2-3-4-5-6-7-8-9 (Very difficult)
4. I did well writing the narrative essay.
(Strongly Agree) 1-2-3-4-5-6-7-8-9 (Strongly disagree)
5. I did well writing the argumentative essay.
(Strongly Agree) 1-2-3-4-5-6-7-8-9 (Strongly disagree)
6. How interesting was it to write the narrative essay?
(Very interesting) 1-2-3-4-5-6-7-8-9 (Not interesting)
7. How interesting was it to write the argumentative essay?
(Very interesting) 1-2-3-4-5-6-7-8-9 (Not interesting)
8. How anxious were you about the time pressure when writing the essays?
(Not anxious at all) 1-2-3-4-5-6-7-8-9 (Very anxious)
9. How much did the time (30 minutes/1 hour) affect your writing?
(Not at all) 1-2-3-4-5-6-7-8-9 (A lot)
10. Do you think the time allotted was enough to write essays (both genres)? Please explain.

APPENDIX F: Stimulated Recall Protocol (Barkaoui, 2015; Gass & Mackey, 2017)

As we watch the video, I’ll be asking you questions about what you were doing. At times I’ll even stop the video so we can examine a word choice, a revision and so forth. As you watch your writing unfold, try to recall what you were thinking at the time; try to put your mind back into the task. Anytime you remember something, say it. Interrupt me, stop the video if you want. I am interested in finding out what you were thinking when you were writing, and it doesn’t matter at all to me if those thoughts were silly or profound. Again, I would like you to tell me what you were thinking when you were completing the task, NOT what you are thinking now. I will audio-record our conversation so I don’t have to divide my attention by taking notes.

Open-ended questions will be used:
• What were you thinking at this point?
• Is there anything else that comes to your mind?
• I see you stopped writing. What were you thinking then?
• I see you changed the text. Can you tell me what you were thinking then?
• Can you tell me your thoughts when you paused (or made a change)?
APPENDIX G: Argumentative Essay Rubric (Connor-Linton & Polio, 2014)

Content
20–16: Thorough and logical development of thesis; Substantive and detailed; No irrelevant information; Interesting; A substantial number of words for amount of time given
15–11: Good and logical development of thesis; Fairly substantive and detailed; Almost no irrelevant information; Somewhat interesting; An adequate number of words for the amount of time given
10–6: Some development of thesis; Not much substance or detail; Some irrelevant information; Somewhat uninteresting; Limited number of words for the amount of time given
5–0: No development of thesis; No substance or details; Substantial amount of irrelevant information; Completely uninteresting; Very few words for the amount of time given

Organization
20–16: Excellent overall organization; Clear thesis statement; Substantive introduction and conclusion; Excellent use of transition words; Excellent connections between paragraphs; Unity within every paragraph
15–11: Good overall organization; Clear thesis statement; Good introduction and conclusion; Good use of transition words; Good connections between paragraphs; Unity within most paragraphs
10–6: Some general coherent organization; Minimal thesis statement or main idea; Minimal introduction and conclusion; Occasional use of transition words; Some disjointed connections between paragraphs; Some paragraphs may lack unity
5–0: No coherent organization; No thesis statement or main idea; No introduction and conclusion; No use of transition words; Disjointed connections between paragraphs; Paragraphs lack unity

Language Use
20–16: No major errors in word order or complex structures; No errors that interfere with comprehension; Only occasional errors in morphology; Frequent use of complex sentences; Excellent sentence variety
15–11: Occasional errors in awkward order or complex structures; Almost no errors that interfere with comprehension; Attempts, even if not completely successful, at a variety of complex structures; Some errors in morphology; Frequent use of complex sentences; Good sentence variety
10–6: Errors in word order or complex structures; Some errors that interfere with comprehension; Frequent errors in morphology; Minimal use of complex sentences; Little sentence variety
5–0: Serious errors in word order or complex structures; Frequent errors that interfere with comprehension; Many errors in morphology; Almost no attempt at complex sentences; No sentence variety

Mechanics (Score/2)
20–16: Appropriate layout with indented paragraphs; No spelling errors; No punctuation errors
15–11: Appropriate layout with indented paragraphs; No more than a few spelling errors in less frequent vocabulary; No more than a few punctuation errors
10–6: Appropriate layout with most paragraphs indented; Some spelling errors in less frequent and more frequent vocabulary; Several punctuation errors
5–0: No attempt to arrange essay into paragraphs; Several spelling errors even in frequent vocabulary; Many punctuation errors

Vocabulary
20–16: Very sophisticated vocabulary; Excellent choice of words with no errors; Excellent range of vocabulary; Idiomatic and near native-like vocabulary
15–11: Somewhat sophisticated vocabulary; Attempts, even if not completely successful, at sophisticated vocabulary; Good choice of words with some errors that don’t obscure meaning; Adequate range of vocabulary but some repetition; Approaching academic register
10–6: Unsophisticated vocabulary; Limited word choice with some errors obscuring meaning; Repetitive choice of words; No resemblance to academic register
5–0: Very simple vocabulary; Severe errors in word choice that often obscure meaning; No variety in word choice; No resemblance to academic register

APPENDIX H: Narrative Rubric (Adapted from Connor-Linton & Polio, 2014)

Content
20–16: Thorough and logical development of storyline; Vivid and detailed; No irrelevant information; Interesting; A substantial number of words for amount of time given
15–11: Good and logical development of storyline; Fairly vivid and detailed; Almost no irrelevant information; Somewhat interesting; An adequate number of words for the amount of time given
10–6: Some development of storyline; Not much vividness or detail; Some irrelevant information; Somewhat uninteresting; Limited number of words for the amount of time given
5–0: No development of storyline; No vividness or details; Substantial amount of irrelevant information; Completely uninteresting; Very few words for the amount of time given

Organization
20–16: Unity within every paragraph; Excellent overall organization; Clear sequence of events and topic; Clear sense of beginning and end; Excellent use of transition words
15–11: Unity within most paragraphs; Good overall organization; Good sequence of events and topic; Good sense of beginning and end; Good use of transition words
10–6: Some paragraphs may lack unity; Some general coherent organization; Limited sequence of events or topic; Limited sense of beginning and end; Occasional use of transition words
5–0: Paragraphs lack unity; No coherent organization; No sequence of events or topic; No sense of beginning and end; No use of transition words

Vocabulary
20–16: Very sophisticated vocabulary; Excellent choice of words with no errors; Excellent range of vocabulary; Idiomatic and near native-like vocabulary
15–11: Somewhat sophisticated vocabulary; Attempts, even if not completely successful, at sophisticated vocabulary; Good choice of words with some errors that don’t obscure meaning; Adequate range of vocabulary but some repetition
10–6: Unsophisticated vocabulary; Limited word choice with some errors obscuring meaning; Repetitive choice of words
5–0: Very simple vocabulary; Severe errors in word choice that often obscure meaning; No variety in word choice

Language Use
20–16: No major errors in word order or complex structures; No errors that interfere with comprehension; Only occasional errors in morphology; Frequent use of complex sentences; Excellent sentence variety
15–11: Occasional errors in awkward order or complex structures; Almost no errors that interfere with comprehension; Attempts, even if not completely successful, at a variety of complex structures; Some errors in morphology; Frequent use of complex sentences; Good sentence variety
10–6: Errors in word order or complex structures; Some errors that interfere with comprehension; Frequent errors in morphology; Minimal use of complex sentences; Little sentence variety
5–0: Serious errors in word order or complex structures; Frequent errors that interfere with comprehension; Many errors in morphology; Almost no attempt at complex sentences; No sentence variety

Mechanics (Score/2)
20–16: No spelling errors; No punctuation errors
15–11: No more than a few spelling errors in less frequent vocabulary; No more than a few punctuation errors
10–6: Some spelling errors in less frequent and more frequent vocabulary; Several punctuation errors
5–0: Several spelling errors even in frequent vocabulary; Many punctuation errors

APPENDIX I: Reasons for Pausing and Revision: Summary of Stimulated Comments

Table I-1.
Number of comments for pausing in stimulated recalls (high intermediate short timed group) Translation Monitoring No Planning Content Organization Total Cohesion Total Unspecified Total 0 (0 %) 0 (0 %) 26 (25 %) 1 (0 %) 5 (5 %) 9 (8 %) 1 (0 %) 4 (4 %) 32 (31 %) 14 (13 %) 1 (1 %) 0 (0 %) 21 (24 %) 1 (1 %) 3 (3 %) 2 (2 %) 0 (0 %) 1 (1 %) 25 (28 %) 4 (4 %) recall 1 (0 %) 6 (6 %) 0 (0 %) 0 (0 %) 7 (7 %) 0 (0 %) 1 (1 %) 4 (4 %) 0 (0 %) 5 (6 %) 0 4 1 0 5 0 8 0 0 8 1 (0 %) 60 (58 %) 34 (33 %) 9 (9 %) 104 (100 %) 1 (1 %) 48 (54 %) 27 (30 %) 13 (15 %) 89 (100 %) Narrative (N =2) Within words Between words Between clauses Between sentences Total Argumentative (N =2) Within words Between words Between clauses Between sentences Total 0 21 16 2 39 0 15 14 9 38 0 2 3 2 7 0 2 4 3 9 0 0 1 0 1 0 2 0 0 2 Lexical retrieval Syntactic encoding 0 (0 %) 0 23 (22 %) 19 (18 %) 22 3 4 (3 %) 0 46 (44 %) 25 0 (0 %) 1 17 (19 %) 18 (20 %) 12 (13 %) 47 (53 %) 15 2 0 18 0 4 1 1 6 0 4 1 0 5 131 Table I-2. Number of comments for revision in stimulated recalls (high intermediate short timed group) Translation Planning Content Organization Total Cohesion Total Unspecified Total No recall 0 2 2 1 0 5 0 1 0 0 0 1 1 (2 %) 0 1 (2 %) 20 (34 %) 9 (15 %) 1 (2 %) 6 0 (0 %) 3 1 (2 %) 0 (0 %) 0 2 (3 %) 24 (40 %) 21 (36 %) 10 (17 %) 0 (0 %) 0 (0 %) 0 2 (3 %) 2 (3 %) 9 31 (53 %) 2 (4 %) 15 (29 %) 5 (10 %) 1 (2 %) 2 0 (0 %) 0 1 (2 %) 3 3 (6 %) 1 (2 %) 1 59 (100 %) 3 (6 %) 21 (40 %) 13 (25 %) 10 (19 %) 0 (0 %) 0 (0 %) 2 5 (10 %) 25 (48 %) 3 (6 %) 8 52 (100 %) Narrative (N =2) Below the word At the word level Below the clause level At the clause level or above At the sentence level or above Total Argumentative (N =2) Below the word At the word level Below the clause level At the clause level or above At the sentence level or above Total 0 1 5 7 0 13 1 2 5 5 0 13 0 0 0 2 2 4 0 0 0 0 3 3 Lexical retrieval 1 8 5 0 (0 %) 1 (2 %) 5 (8 %) 9 (15 %) 0 2 (3 %) 0 17 (29 %) 1 (2 %) 14 2 2 (4 %) 10 5 (10 %) 4 5 (10 %) 2 3 (6 
%) 0 16 (31 %) 18 Syntactic encoding 0 10 2 0 0 12 0 4 1 1 0 6 132 Table I-3. Number of comments for pausing in stimulated recalls (high intermediate long timed group) Translation Monitoring No Planning Content Organization Total Cohesion Total Unspecified Total 0 3 3 0 6 0 9 2 0 3 (3 %) 47 (44 %) 32 (30 %) 25 (23 %) 107 (100 %) 3 (2 %) 98 (61 %) 38 (24 %) 22 (14 %) 11 161 (100 %) 3 (3 %) 0 (0 %) 20 (19 %) 2 (2 %) 5 (5 %) 8 (7 %) 2 (2 %) 4 (4 %) 30 (28 %) 14 (13 %) 3 (2 %) 0 (0 %) 36 (22 %) 14 (9 %) 2 (1 %) 20 (12 %) 2 (1 %) 12 (7 %) 43 (27 %) 46 (29 %) recall 0 (0 %) 4 (4 %) 1 (0 %) 3 (3 %) 8 (7 %) 0 (0 %) 10 (6 %) 3 (2 %) 0 (0 %) 13 (8 %) Narrative (N =2) Within words Between words Between clauses Between sentences Total Argumentative (N =2) Within words Between words Between clauses Between sentences Total 0 17 14 12 43 0 27 9 6 42 0 1 1 4 6 0 2 2 2 6 0 1 1 0 2 0 0 1 2 3 Lexical retrieval Syntactic encoding 0 (0 %) 2 18 (17 %) 15 (14 %) 16 (15 %) 49 (46 %) 13 5 2 22 0 (0 %) 3 29 (18 %) 11 (7 %) 33 1 8 (5 %) 0 48 (30 %) 37 1 7 0 0 8 0 3 0 0 3 133 Table I-4. 
Number of comments for revision in stimulated recalls (high intermediate long timed group) Translation Planning Content Organization Total Cohesion Total Unspecified Total No recall 0 4 1 0 0 5 0 0 2 0 0 2 3 (6 %) 16 (30 %) 6 (11 %) 0 (0 %) 0 1 (2 %) 1 3 (6 %) 26 (48 %) 1 (2 %) 2 13 (24 %) 1 (2 %) 0 (0 %) 1 2 (4 %) 0 (0 %) 2 (4 %) 0 10 (19 %) 26 (48 %) 4 (7 %) 4 54 (100 %) 0 (0 %) 11 (17 %) 15 (24 %) 0 (0 %) 0 2 (3 %) 1 0 (0 %) 16 (25 %) 4 (6 %) 10 36 (57 %) 1 (2 %) 0 (0 %) 2 6 (10 %) 2 (3 %) 1 (2 %) 0 5 (8 %) 29 (46 %) 7 (11 %) 13 63 (100 %) Narrative (N =2) Below the word At the word level Below the clause level At the clause level or above At the sentence level or above Total Argumentative (N =2) Below the word At the word level Below the clause level At the clause level or above At the sentence level or above Total 0 7 4 0 6 17 0 2 6 3 1 12 0 1 0 0 2 3 0 0 1 0 1 2 Lexical retrieval 2 6 0 (0 %) 8 (15 %) 4 (7 %) 4 0 (0 %) 1 8 (15 %) 20 (37 %) 0 (0 %) 2 (3 %) 7 (11 %) 0 13 0 9 11 3 (5 %) 1 2 (3 %) 2 14 (22 %) 23 Syntactic encoding 1 6 1 0 0 8 0 2 2 0 0 4 134 Table I-5. 
Number of comments for pausing in stimulated recalls (advanced short timed group) Translation Planning Monitoring No Content Organization Total Cohesion Total Unspecified Total recall 0 (0 %) 5 (5 %) 1 (1 %) 1 (1 %) 7 (6 %) 0 (0 %) 6 (7 %) 1 (1 %) 0 (0 %) 7 (8 %) 0 1 0 0 1 1 2 0 0 3 2 (2 %) 65 (60 %) 27 (25 %) 15 (14 %) 109 (100 %) 3 (4 %) 50 (60 %) 21 (25 %) 10 (12 %) 84 (100 %) 1 (1 %) 1 (1 %) 32 (29 %) 2 (2 %) 6 (6 %) 2 (2 %) 1 (1 %) 5 (6 %) 40 (37 %) 10 (9 %) 1 (1 %) 0 (0 %) 22 (26 %) 1 (1 %) 4 (5 %) 0 (0 %) 2 (2 %) 0 (0 %) 29 (35 %) 1 (1 %) Narrative (N =2) Within words Between words Between clauses Between sentences Total Argumentative (N =2) Within words Between words Between clauses Between sentences Total 0 24 15 4 43 1 17 14 3 35 0 1 3 4 8 0 2 2 5 9 0 3 1 1 5 0 1 0 1 2 Lexical retrieval Syntactic encoding 0 (0 %) 1 25 (23 %) 18 (17 %) 8 (7 %) 51 (47 %) 28 4 0 33 1 (1 %) 1 19 (23 %) 16 (19 %) 8 (10 %) 44 (52 %) 19 1 1 22 0 1 1 0 2 0 2 3 0 5 135 Table I-6. Number of comments for revision in stimulated recalls (advanced short timed group) Translation Planning Content Organization Total Cohesion Total Lexical retrieval 1 Syntactic encoding 0 0 (0 %) Unspecified Total 0 1 2 0 0 3 0 1 0 0 0 1 1 (2 %) 28 (61 %) 16 (35 %) 0 (0 %) 1 (2 %) 46 (100 %) 4 (7 %) 26 (45 %) 15 (26 %) 10 (17 %) 3 (5 %) 58 (100 %) No recall 0 4 2 0 0 6 0 2 1 0 0 3 0 (0 %) 1 (2 %) 20 (43 %) 9 (20 %) 1 (2 %) 4 (9 %) 0 (0 %) 0 (0 %) 0 (0 %) 0 (0 %) 5 (10 %) 0 (0 %) 30 (65 %) 4 (7 %) 21 (36 %) 9 (16 %) 4 (7 %) 1 (2 %) 6 (10 %) 0 (0 %) 1 (2 %) 0 (0 %) 41 (70 %) 5 (9 %) Narrative (N =2) Below the word At the word level Below the clause level At the clause level or above At the sentence level or above Total Argumentative (N =2) Below the word At the word level Below the clause level At the clause level or above At the sentence level or above Total 0 3 4 0 1 8 0 3 1 2 0 6 0 0 0 0 0 0 0 0 1 2 2 5 3 (7 %) 15 4 (9 %) 0 (0 %) 1 (2 %) 5 0 0 8 (17 %) 21 0 (0 %) 4 3 (5 %) 18 2 (3 %) 4 (7 %) 2 (3 %) 11 
(19 %) 7 2 1 32 1 2 0 0 3 0 1 1 4 0 6 136 Table I-7. Number of comments for pausing in stimulated recalls (advanced long timed group) Translation Planning Monitoring No Content Organization Total Cohesion Total Unspecified Total recall 0 (0 %) 2 (1 %) 2 (1 %) 0 (0 %) 4 (3 %) 2 (3 %) 0 (0 %) 0 (0 %) 0 (0 %) 2 (3 %) 0 2 4 1 7 0 0 0 0 0 5 (4 %) 83 (61 %) 41 (30 %) 7 (5 %) 136 (100 %) 6 (9 %) 35 (55 %) 18 (28 %) 5 (8 %) 64 (100 %) 4 (3 %) 0 (0 %) 50 (37 %) 12 (9 %) 4 (3 %) 1 (0 %) 0 (0 %) 2 (1 %) 66 (49 %) 7 (5 %) 2 (3 %) 0 (0 %) 22 (34 %) 2 (3 %) 2 (3 %) 4 (6 %) 0 (0 %) 1 (2 %) 26 (40 %) 7 (11 %) Narrative (N =2) Within words Between words Between clauses Between sentences Total Argumentative (N =2) Within words Between words Between clauses Between sentences Total 1 24 21 3 49 2 11 7 1 21 0 1 1 1 3 0 0 5 3 8 0 3 1 0 4 0 1 0 0 1 Lexical retrieval Syntactic encoding 1 (0 %) 4 25 (18 %) 22 (16 %) 40 6 4 (3 %) 0 52 (38 %) 50 2 (3 %) 1 11 (17 %) 12 (18 %) 19 2 4 (6 %) 0 29 (45 %) 22 0 7 5 0 12 1 2 0 0 3 137 Table I-8. 
Number of comments for revision in stimulated recalls (advanced long timed group) Translation Planning Content Organization Total Cohesion Total Lexical retrieval 3 Syntactic encoding 2 0 (0 %) Narrative (N =2) Below the word At the word level Below the clause level At the clause level or above At the sentence level or above Total Argumentative (N =2) Below the word At the word level Below the clause level At the clause level or above At the sentence level or above Total 0 1 4 1 0 6 0 1 9 2 2 14 0 0 0 0 3 3 0 0 1 1 0 2 No recall 0 (0 %) 1 (1 %) Unspecified Total 0 2 5 (7 %) 33 (47 %) 1 (1 %) 3 20 (29 %) 5 (7 %) 0 (0 %) 0 6 (9 %) 2 (3 %) 1 (1 %) 0 3 (4 %) 0 (0 %) 0 (0 %) 5 0 1 6 (9 %) 70 (100 %) 2 (3 %) 22 (38 %) 1 (2 %) 2 28 (48 %) 1 (2 %) 0 (0 %) 0 4 (7 %) 0 (0 %) 0 (0 %) 0 38 (66 %) 1 (2 %) 3 2 (3 %) 58 (100 %) 5 (7 %) 29 (41 %) 12 (17 %) 53 (76 %) 2 (3 %) 20 (34 %) 15 (26 %) 0 0 1 0 0 1 0 4 0 0 0 4 1 (1 %) 18 4 (6 %) 1 (1 %) 3 (4 %) 9 4 1 9 (13 %) 35 0 (0 %) 1 1 (2 %) 16 10 (17 %) 3 (5 %) 2 (3 %) 16 (28 %) 14 1 0 32 11 2 1 1 17 1 0 1 0 0 2

REFERENCES

Abdel Latif, M. M. M. (2013). What do we mean by writing fluency and how can it be validly measured? Applied Linguistics, 34(1), 99–105.
Alexopoulou, T., Michel, M. C., Murakami, A., & Meurers, D. (2017). Task effects on linguistic complexity and accuracy: A large-scale learner corpus analysis employing Natural Language Processing techniques. Language Learning, 67, 180–208.
Almond, R., Deane, P., Quinlan, T., Wagner, M., & Sydorenko, T. (2012). A preliminary analysis of keystroke log data from a timed writing task (ETS Research Report No. RR-12-13). Princeton, NJ: Educational Testing Service.
Alves, R. A., Castro, S. L., & Olive, T. (2008). Execution and pauses in writing narratives: Processing time, cognitive effort and typing skill. International Journal of Psychology, 43(6), 969–979.
Ädel, A. (2008). Involvement features in writing: Do time and interaction trump register awareness? In G. Gilquin, S. Papp, & M. Díez-Bedmar (Eds.), Linking up contrastive and learner corpus research (pp. 35–53). Amsterdam, The Netherlands: Rodopi.
Baaijen, V. M., Galbraith, D., & de Glopper, K. (2012). Keystroke analysis: Reflections on procedures and measures. Written Communication, 29(3), 246–277.
Barkaoui, K. (2015). Test takers' writing activities during the TOEFL iBT® writing tasks: A stimulated recall study (ETS Research Report No. RR-15-04). Princeton, NJ: Educational Testing Service.
Barkaoui, K. (2016). What and when second-language learners revise when responding to timed writing tasks on the computer: The roles of task type, second language proficiency, and keyboarding skills. The Modern Language Journal, 100(1), 320–340.
Barkaoui, K., & Knouzi, I. (2018). The effects of writing mode and computer ability on L2 test-takers' essay characteristics and scores. Assessing Writing, 36, 19–31.
Beauvais, C., Olive, T., & Passerault, J. M. (2011). Why are some texts good and others not? Relationship between text quality and management of the writing processes. Journal of Educational Psychology, 103(2), 415–428.
Beers, S. F., & Nagy, W. E. (2011). Writing development in four genres from grades three to seven: Syntactic complexity and genre differentiation. Reading and Writing, 24(2), 183–202.
Bereiter, C., & Scardamalia, M. (2009). The psychology of written composition. New York: Routledge.
Biber, D., & Conrad, S. (2009). Register, genre, and style. Cambridge, UK: Cambridge University Press.
Biber, D., Gray, B., & Poonpon, K. (2011). Should we use characteristics of conversation to measure grammatical complexity in L2 writing development? TESOL Quarterly, 45, 5–35.
Bowles, M. A. (2010). The think-aloud controversy in second language research. New York: Routledge.
Brown, G. T., Glasswell, K., & Harland, D. (2004). Accuracy in the scoring of writing: Studies of reliability and validity using a New Zealand writing assessment system. Assessing Writing, 9(2), 105–121.
Caudery, T. (1990). The validity of timed essay tests in the assessment of writing skills. ELT Journal, 44(2), 122–131.
Chenoweth, N. A., & Hayes, J. R. (2001). Fluency in writing: Generating text in L1 and L2. Written Communication, 18(1), 80–98.
Chenoweth, N. A., & Hayes, J. R. (2003). Inner voice in writing. Written Communication, 20, 99–118.
Cho, Y. (2003). Assessing writing: Are we bound by only one method? Assessing Writing, 8(3), 165–191.
Chukharev-Hudilainen, E. (2014). Pauses in spontaneous written communication: A keystroke logging study. Journal of Writing Research, 6(1), 61–84.
Connor-Linton, J., & Polio, C. (2014). Comparing perspectives on L2 writing: Multiple analyses of a common corpus. Journal of Second Language Writing, 26, 1–9.
Deane, P. (2014). Using writing process and product features to assess writing quality and explore how those features relate to other literacy tasks (ETS Research Report No. RR-14-03). Princeton, NJ: Educational Testing Service.
Deane, P., Roth, A., Litz, A., Goswami, V., Steck, F., Lewis, M., & Richter, T. (2018). Behavioral differences between retyping, drafting, and editing: A writing process analysis (ETS Research Report No. RM-18-06). Princeton, NJ: Educational Testing Service.
DeKeyser, R. M. (2005). What makes learning second-language grammar difficult? A review of issues. Language Learning, 55(S1), 1–25.
de Clercq, B., & Housen, A. (2017). A cross-linguistic perspective on syntactic complexity in L2 development: Syntactic elaboration and diversity. The Modern Language Journal, 101(2), 315–334.
de Smet, M. J., Brand-Gruwel, S., Leijten, M., & Kirschner, P. A. (2014). Electronic outlining as a writing strategy: Effects on students' writing products, mental effort and writing process. Computers & Education, 78, 352–366.
de Smet, M. J., Leijten, M., & Van Waes, L. (2018). Exploring the process of reading during writing using eye tracking and keystroke logging. Written Communication, 35(4), 411–447.
Eklundh, K. (1994). Linear and nonlinear strategies in computer-based writing. Computers and Composition, 11(3), 203–216.
Eklundh, K., & Kollberg, P. (2003). Emerging discourse structure: Computer-assisted episode analysis as a window to global revision in university students’ writing. Journal of Pragmatics, 35(6), 869–891.
Elder, C., Knoch, U., & Zhang, R. (2009). Diagnosing the support needs of second language writers: Does the time allowance matter? TESOL Quarterly, 43(2), 351–360.
Ellis, R., & Yuan, F. (2004). The effects of planning on fluency, complexity, and accuracy in second language narrative writing. Studies in Second Language Acquisition, 26(1), 59–84.
Flower, L., & Hayes, J. R. (1981). A cognitive process theory of writing. College Composition and Communication, 32(4), 365–387.
Foster, P., & Skehan, P. (1996). The influence of planning and task type on second language performance. Studies in Second Language Acquisition, 18(3), 299–323.
Gánem-Gutiérrez, G. A., & Gilmore, A. (2018). Tracking the real-time evolution of a writing event: Second language writers at different proficiency levels. Language Learning, 68(2), 469–506.
Gass, S. M., & Mackey, A. (2017). Stimulated recall methodology in applied linguistics and L2 research. New York: Routledge.
Geisler, C., & Slattery, S. (2007). Capturing the activity of digital writing: Using, analyzing, and supplementing video screen capture. In H. A. McKee & D. N. DeVoss (Eds.), Digital writing research: Technologies, methodologies, and ethical issues (pp. 185–200). Cresskill, NJ: Hampton Press.
Godfrey, L., Treacy, C., & Tarone, E. (2014). Change in French second language writing in study abroad and domestic contexts. Foreign Language Annals, 47(1), 48–65.
Godfroid, A., & Spino, L. A. (2015). Reconceptualizing reactivity of think-alouds and eye tracking: Absence of evidence is not evidence of absence. Language Learning, 65(4), 896–928.
Guo, H., Deane, P. D., van Rijn, P. W., Zhang, M., & Bennett, R. E. (2018). Modeling basic writing processes from keystroke logs. Journal of Educational Measurement, 55(2), 194–216.
Hale, G. A. (1992). Effects of amount of time allowed on the Test of Written English (Research Report No. 92–27). Princeton, NJ: Educational Testing Service.
Hayes, J. R. (1996). A new framework for understanding cognition and affect in writing. In C. M. Levy & S. Ransdell (Eds.), The science of writing: Theories, methods, individual differences and applications (pp. 1–28). Mahwah, NJ: Erlbaum.
Hayes, J. R. (2012). Modeling and remodeling writing. Written Communication, 29(3), 369–388.
Hayes, J. R., & Flower, L. S. (1980). Identifying the organization of writing processes. In L. W. Gregg & E. R. Steinberg (Eds.), Cognitive processes in writing (pp. 3–30). Hillsdale, NJ: Erlbaum.
Housen, A., De Clercq, B., Kuiken, F., & Vedder, I. (2019). Multiple approaches to complexity in second language research. Second Language Research, 35(1), 3–21.
Housen, A., & Kuiken, F. (2009). Complexity, accuracy, and fluency in second language acquisition. Applied Linguistics, 30(4), 461–473.
Housen, A., Kuiken, F., & Vedder, I. (Eds.). (2012). Dimensions of L2 performance and proficiency: Complexity, accuracy and fluency in SLA. Amsterdam: John Benjamins.
Jacobs, H., Zinkgraf, S., Wormuth, D., Hartfiel, V., & Hughey, J. (1981). Testing ESL composition: A practical approach. Rowley, MA: Newbury House.
Jeong, H. (2017). Narrative and expository genre effects on students, raters, and performance criteria. Assessing Writing, 31, 113–125.
Johnson, M. D., Mercado, L., & Acevedo, A. (2012). The effect of planning sub-processes on L2 writing fluency, grammatical complexity, and lexical complexity. Journal of Second Language Writing, 21(3), 264–282.
Kellogg, R. T. (1990). Effectiveness of prewriting strategies as a function of task demands. American Journal of Psychology, 103, 327–342.
Kellogg, R. T. (1994). The psychology of writing. New York: Oxford University Press.
Kellogg, R. T. (1996). A model of working memory in writing. In C. M. Levy & S. Ransdell (Eds.), The science of writing: Theories, methods, individual differences and applications (pp. 57–71). Mahwah, NJ: Erlbaum.
Kellogg, R. T. (2001). Competition for working memory among writing processes. The American Journal of Psychology, 114(2), 175–192.
Khuder, B., & Harwood, N. (2015). L2 writing in test and non-test situations: Process and product. Journal of Writing Research, 6(3), 233–278.
Knoch, U., & Elder, C. (2010). Validity and fairness implications of varying time conditions on a diagnostic test of academic English writing proficiency. System, 38(1), 63–74.
Knoch, U., Rouhshad, A., Oon, S. P., & Storch, N. (2015). What happens to ESL students’ writing after three years of study at an English medium university? Journal of Second Language Writing, 28, 39–52.
Knoch, U., Rouhshad, A., & Storch, N. (2014). Does the writing of undergraduate ESL students develop after one year of study in an English-medium university? Assessing Writing, 21, 1–17.
Koponen, M., & Riggenbach, H. (2000). Overview: Varying perspectives on fluency. In H. Riggenbach (Ed.), Perspectives on fluency (pp. 5–24). Ann Arbor: The University of Michigan Press.
Kowal, I. (2014). Fluency in second language writing: A developmental perspective. Studia Linguistica Universitatis Iagellonicae Cracoviensis, 131, 229–246.
Kroll, B. (1990). What does time buy? ESL student performance on home versus class compositions. In B. Kroll (Ed.), Second language writing: Research insights for the classroom (pp. 140–154). Cambridge, UK: Cambridge University Press.
Kyle, K., & Crossley, S. (2017). Assessing syntactic sophistication in L2 writing: A usage-based approach. Language Testing, 34(4), 513–535.
Lambert, C., & Kormos, J. (2014). Complexity, accuracy, and fluency in task-based L2 research: Toward more developmentally based measures of second language acquisition. Applied Linguistics, 35(5), 607–614.
Larsen-Freeman, D. (2006). The emergence of complexity, fluency, and accuracy in the oral and written production of five Chinese learners of English. Applied Linguistics, 27, 590–619.
Leijten, M., & Van Waes, L. (2006). Inputlog: New perspectives on the logging of on-line writing processes in a Windows environment. In K. P. H. Sullivan & E. Lindgren (Eds.), Computer key-stroke logging: Methods and applications (pp. 73–93). Oxford, UK: Elsevier.
Leijten, M., & Van Waes, L. (2013). Keystroke logging in writing research: Using Inputlog to analyze and visualize writing processes. Written Communication, 30(3), 358–392.
Leijten, M., Van Waes, L., & Ransdell, S. (2010). Correcting text production errors: Isolating the effects of writing mode from error span, input mode, and lexicality. Written Communication, 27(2), 189–227.
Lennon, P. (1990). Investigating fluency in EFL: A quantitative approach. Language Learning, 40(3), 387–417.
Lindgren, E. (2005). Writing and revising: Didactic and methodological implications of keystroke logging (Unpublished doctoral dissertation). Umeå University, Sweden.
Lindgren, E., & Sullivan, K. P. H. (2003). Stimulated recall as a trigger for increasing noticing and language awareness in the L2 writing classroom: A case study of two young female writers. Language Awareness, 12(3-4), 172–186.
Lindgren, E., & Sullivan, K. P. H. (2006). Writing and the analysis of revision: An overview. In K. P. H. Sullivan & E. Lindgren (Eds.), Computer key-stroke logging: Methods and applications (pp. 31–44). Oxford: Elsevier.
Lu, X. (2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics, 15(4), 474–496.
Lu, X. (2011). A corpus-based evaluation of syntactic complexity measures as indices of college-level ESL writers’ language development. TESOL Quarterly, 45, 36–62.
Malvern, D. D., Richards, B. J., Chipere, N., & Durán, P. (2004). Lexical diversity and language development: Quantification and assessment.
Houndmills, NH: Palgrave Macmillan. Marian, V., Blumenfeld, H., & Kaushanskaya, M. (2007). The Language Experience and Proficiency Questionnaire (LEAP-Q): Assessing language profiles in bilinguals and multilinguals. Journal of Speech, Language, and Hearing Research, 50(4), 940–967. McCarthy, P. M., & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381–392. McNamara, D. S., Graesser, A. C., McCarthy, P., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. Cambridge, UK: Cambridge University Press. Medimorec, S., & Risko, E. F. (2016). Effects of disfluency in writing. British Journal of Psychology, 107(4), 625–650. Medimorec, S., & Risko, E. F. (2017). Pauses in written composition: on the importance of where writers pause. Reading and Writing, 30(6), 1267–1285. New, E. (1999). Computer–aided writing in French as a foreign language: A qualitative and quantitative look at the process of revision. The Modern Language Journal, 83(1), 80–97 Norris, J. M., & Ortega, L. (2009). Towards an organic approach to investigating CAF in instructed SLA: The case of complexity. Applied Linguistics, 30(4), 555–578. Oh, S. (2006). Investigating the relationship between fluency measures and second language writing placement test decisions (Unpublished Master’s Scholarly Paper). University of Hawaii. Ortega, L. (2003). Syntactic complexity measures and their relationship to L2 proficiency: A 145 research synthesis of college-level L2 writing. Applied Linguistics, 24, 492–518. Pallotti, G. (2015). A simple view of linguistic complexity. Second Language Research, 31(1), 117–134. Papageorgiou, S., Tannenbaum, R. J., Bridgeman, B., & Cho, Y. (2015). The association between TOEFL iBT® test scores and the Common European Framework of Reference (CEFR) levels (Research Memorandum No. RM-15-06). Princeton, NJ: Educational Testing Service. 
Plonsky, L., & Oswald, F. L. (2014). How big is “big”? Interpreting effect sizes in L2 research. Language Learning, 64(4), 878–912. Polio, C., & Glew, M. (1996). ESL writing assessment prompts: How students choose. Journal of Second Language Writing, 5(1), 35–49. Polio, C., & Friedman, D. A. (2017). Understanding, evaluating, and conducting second language writing research. New York: Routledge. Polio, C., & Lee, J. (in press). Experimental studies in L2 classrooms. In J. W. Schwieter & Benati, A (Eds.), The Cambridge handbook of language learning. Cambridge: Cambridge University Press. Polio, C., & Lim, J. (under review). Revising a writing rubric based on raters' comments: Does it result in more valid and reliable assessment? In Mehdi M. Riazi, L. Shi, & K. Barkaoui (Eds.), An edited volume in honor of Prof. Alister Cumming. Porte, G. (1996). When writing fails: How academic context and past learning experiences shape revision. System, 24(1), 107–116. Powers, D. E., & Fowles, M. E. (1996). Effects of applying different time limits to a proposed GRE writing test. Journal of Educational Measurement, 33(4), 433–452. Qin, W., & Uccelli, P. (2016). Same language, different functions: A cross-genre analysis of Chinese EFL learners’ writing performance. Journal of Second Language Writing, 33, 3–17. Quinlan, T., Loncke, M., Leijten, M., & Van Waes, L. (2012). Coordinating the cognitive processes of writing: The role of the monitor. Written Communication, 29(3), 345– 368. Ranalli, J., Feng, H. H., & Chukharev-Hudilainen, E. (2018). Exploring the potential of process- tracing technologies to support assessment for learning of L2 writing. Assessing Writing, 36, 77–89. Ranalli, J., Feng, H.-H., & Chukharev-Hudilainen, E. (2019). The affordances of process-tracing technologies for supporting L2 writing instruction. Language Learning & Technology. 23(2), 1–11. 146 Révész, A., Kourtali, N. E., & Mazgutova, D. (2017). 
Effects of task complexity on L2 writing behaviors and linguistic complexity. Language Learning, 67(1), 208–241. Révész, A., Michel, M., & Lee, M. (2017). Investigating IELTS academic writing task 2: Relationships between cognitive writing processes, text quality, and working memory. IELTS Research Reports Online Series, 44. Révész, A., Michel, M., & Lee, M. (in press). Exploring second language writers' pausing and revision behaviors: A mixed methods study. Studies in Second Language Acquisition. Robinson, P. (2001). Task complexity, task difficulty, and task production: Exploring interactions in a componential framework. Applied Linguistics, 22(1), 27–57. Roca de Larios, J., Manchón, R., Murphy, L., & Marín, J. (2008). The foreign language writer's strategic behaviour in the allocation of time to writing processes. Journal of Second Language Writing, 17(1), 30–47. Ruiz-Funes, M. (2014). Task complexity and linguistic performance in advanced college-level foreign language writing. In H. Byrnes & R. M. Manchón (Eds.), Task-based language learning: Insights from and for L2 writing (pp. 163–192). Amsterdam: John Benjamins. Ruiz-Funes, M. (2015). Exploring the potential of second/foreign language writing for language learning: The effects of task factors and learner variables. Journal of Second Language Writing, 28, 1–19. Sasaki, M. (2000). Toward an empirical model of EFL writing processes: An exploratory study. Journal of second language writing, 9(3), 259–291. Sasaki, M. (2004). A multiple-data analysis of the 3.5-year development of EFL student writers. Language Learning, 54, 525–582. Sasaki, M., & Hirose, K. (1996). Explanatory variables for EFL students’ expository writing. Language Learning, 46(1), 137–174. Schilperoord, J. (1996). It’s about time: Temporal aspects of cognitive processes in text production. Amsterdam: Rodopi. Schmidt, R. (1992). Psychological mechanisms underlying second language fluency. Studies in Second Language Acquisition, 14(4), 357–385. 
Schrijver, I., Van Vaerenbergh, L., & Van Waes, L. (2012). An exploratory study of transediting in students’ translation processes. Hermes, Journal of Language and Communication in Business, 49, 99–117. Scott, V. M., & New, E. (1994). Computer-aided analysis of foreign language writing strategies. CALICO Journal, 11, 5–18. 147 Segalowitz, N. (2010). Cognitive bases of second language fluency. New York: Routledge. Skehan, P. (1996). A framework for the implementation of task-based instruction. Applied linguistics, 17(1), 38–62. Snellings, P., Van Gelderen, A., & De Glopper, K. (2004). Validating a test of second language written lexical retrieval: A new measure of fluency in written language production. Language Testing, 21(2), 174–201. Spelman Miller, K. (2000). Academic writers online: Investigating pausing in the production of text. Language Teaching Research, 4, 123–148. Spelman Miller, K. (2005). Second language writing research and pedagogy: A role for computer logging?. Computers and Composition, 22(3), 297–317. Spelman Miller, K., Lindgren, E., & Sullivan, K. P. H. (2008). The psycholinguistic dimension in second language writing: Opportunities for research and pedagogy using computer keystroke logging. TESOL Quarterly, 42, 433–453. Stevenson, M., Schoonen, R., & de Glopper, K. (2006). Revising in two languages: A multi- dimensional comparison of online writing revisions in L1 and FL. Journal of Second Language Writing, 15(3), 201–233. Tavakoli, P. (2014). Storyline complexity and syntactic complexity in writing and speaking tasks. In H. Byrnes & R. M. Manchón (Eds.), Task-based language learning: Insights from and for L2 writing (pp. 217–236). Amsterdam: John Benjamins. Thorson, H. (2000). Using the computer to compare foreign and native language writing processes: A statistical and case study approach. The Modern Language Journal, 84(2), 155–170. Van Hell, J. G., Verhoeven, L., & Van Beijsterveldt, L. M. (2008). 
Pause time patterns in writing narrative and expository texts by children and adults. Discourse Processes, 45(4–5), 406–427. Van Waes, L., & Leijten, M. (2015). Fluency in writing: A multidimensional perspective on writing fluency applied to L1 and L2. Computers and Composition, 38, 79–95. Van Waes, L., Leijten, M., & Quinlan, T. (2010). Reading during sentence composing and error correction: A multilevel analysis of the influences of task complexity. Reading and Writing, 23(7), 803–834. Van Waes, L., & Schellens, P. J. (2003). Writing profiles: The effect of the writing mode on pausing and revision patterns of experienced writers. Journal of pragmatics, 35(6), 829–853. Van Waes, L., Van Weijen, D., & Leijten, M. (2014). Learning to write in an online writing center: The effect of learning styles on the writing process. Computers & 148 Education, 73, 60–71. Wallot, S., & Grabowski, J. (2013). Typewriting dynamics: What distinguishes simple from complex writing tasks?. Ecological Psychology, 25(3), 267–280. Way, D. P., Joiner, E. G., & Seaman, M. A. (2000). Writing in the secondary foreign language classroom: The effects of prompts and tasks on novice learners of French. The Modern Language Journal, 84(2), 171–184. Wengelin, Å., Torrance, M., Holmqvist, K., Simpson, S., Galbraith, D., Johansson, V., & Johansson, R. (2009). Combined eyetracking and keystroke-logging methods for studying cognitive processes in text production. Behavior research methods, 41(2), 337–351. Weigle, S. C. (2002). Assessing writing. Cambridge: Cambridge University Press. Wengelin, Å. (2006). Examining pauses in writing: Theories, methods, and empirical data. In K. P. H. Sullivan & E. Lindgren (Eds.), Computer key-stroke logging and writing: Methods and application (pp. 107–130). Oxford: Elsevier. Wolfe-Quintero, K., Inagaki, S., & Kim, H. Y. (1998). Second language development in writing: Measures of fluency, accuracy, & complexity. Honolulu: University of Hawai‘i Press. Wu, J., & Erlam, R. 
(2016). The effect of timing on the quantity and quality of test-takers' writing. New Zealand Studies in Applied Linguistics, 22(2), 21–34. Wu, S. L., & Ortega, L. (2013). Measuring global oral proficiency in SLA research: A new elicited imitation test of L2 Chinese. Foreign Language Annals, 46(4), 680–704. Xu, C. (2018). Understanding online revisions in L2 writing: A computer keystroke-log perspective. System, 78, 104–114. Xu, C., & Ding, Y. (2014). An exploratory study of pauses in computer-assisted EFL writing. Language, Learning and Technology, 18(3), 80–96. Yang, W. (2014). Mapping the relationships among the cognitive complexity of independent writing tasks, L2 writing quality, and complexity, accuracy and fluency of L2 writing (Unpublished doctoral dissertation). Georgia State University, Atlanta. Yang, W., Lu, X., & Weigle, S. (2015). Different topics, different discourse: Relationships among writing topic, measures of syntactic complexity, and judgments of writing quality. Journal of Second Language Writing, 28, 53–67. Yoon, H. (2017). Investigating the interactions among genre, task complexity, and proficiency in L2 writing: A comprehensive text analysis and study of learner perceptions (Unpublished doctoral dissertation). Michigan State University. Yoon, H., & Polio, C. (2017). The linguistic development of students of English as a second 149 language in two written genres. TESOL Quarterly, 51, 275–301. Younkin, W. F., (1986). Speededness as a source of test bias for non-native English speakers on the college level academic skills test (Unpublished doctoral dissertation). University of Miami. Yu, G. (2010). Lexical diversity in writing and speaking task performances. Applied Linguistics, 31(2), 236–259. Zhang, M., & Deane, P. (2015), Process features in writing: Internal structure and incremental value over product features. (ETS Research Report No. RR-15-27). Princeton, NJ: Educational Testing Service. 150