INVESTIGATING THE INTERACTIONS AMONG GENRE, TASK COMPLEXITY, AND
PROFICIENCY IN L2 WRITING: A COMPREHENSIVE TEXT ANALYSIS AND STUDY OF
LEARNER PERCEPTIONS
By
Hyung-Jo Yoon

A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
Second Language Studies—Doctor of Philosophy
2017

ABSTRACT
INVESTIGATING THE INTERACTIONS AMONG GENRE, TASK COMPLEXITY, AND
PROFICIENCY IN L2 WRITING: A COMPREHENSIVE TEXT ANALYSIS AND STUDY OF
LEARNER PERCEPTIONS
By
Hyung-Jo Yoon
In this study, I explored the interactions among genre, task complexity, and L2
proficiency in learners’ writing task performance. Specifically, after identifying the lack of valid
operationalizations of genre and task dimensions in L2 writing research, I examined how genre
functions as a task complexity variable and how learners’ perceptions and language production
interact with their proficiency. To explore ESL students’ perceptions and production across
different writing tasks, I used two genres, narrative and argumentative writing, and within each
genre I manipulated task complexity, operationalized as the provision of idea support (e.g., the
narrative task with supporting ideas served as the simple narrative task). I collected essay data
from 76 ESL students, each of whom wrote four essays (304 essays in total). Immediately after
each writing session, the students rated their perceptions of the task on six dimensions (task
complexity, difficulty, anxiety, confidence, interest, and motivation). Additionally, I collected
perception data from 30 ESL instructors regarding how their students at a proficiency level
similar to that of the student participants would perform the target writing tasks. This design
allowed me to compare students’ perceptions with teachers’ expectations of how the tasks would
function.
The task perception results revealed a gap between the student and teacher groups
regarding their views of the two genres. Specifically, the teachers predicted that ESL students
would have greater difficulty completing the argumentative genre than the narrative, but the
students instead perceived both genres as involving a similar level of complexity and
difficulty. Also, contrary to the teachers’ expectations, students consistently judged the tasks with
idea support as less complex and less difficult. Both groups, however, judged the narrative genre
as sparking greater interest and motivation for further writing than the argumentative genre.
The writing results showed that the students’ language varied substantially across the
two genres but not across the idea support conditions. I also found that most linguistic features
did not differ by L2 proficiency. These results suggest a very weak link between writers’ task
perceptions and their language production, challenging the common practice in task-based
writing research of inferring task complexity from changes in learners’ language production.
They therefore point to the importance of examining perception data and language production
data separately in written discourse, because writers’ language changes are largely motivated by
the varying communicative functions of different genres rather than by the cognitive constraints
that a task imposes on writers.
The essay quality scores showed that narrative essays tended to receive higher scores
than argumentative essays on discourse-level categories and that there were significant
interaction effects between genre and idea support. Specifically, argumentative essays composed
with supporting ideas received higher scores, whereas narrative essays composed with supporting
ideas received lower scores. Unlike the linguistic feature results, the analysis with L2 proficiency
as an additional variable showed that higher proficiency ESL students were likely to receive
higher scores on sentence-level categories. This study offers implications for L2 writing
research, pedagogy, and assessment. In particular, the findings can inform L2 writing instructors
and task developers about the possibility of constructing independent writing tasks with various
genres and levels of task complexity to align task features appropriately with target L2 learners.

I dedicate this work to my parents.

ACKNOWLEDGMENTS

I could not have finished this project without the support of many people around me.
First, I would like to express my sincere thanks to Dr. Charlene Polio for her support and
encouragement over the course of my Ph.D. studies. She has been a great mentor for my studies
as well as for my life. Her constant passion for improvement taught me how to enjoy life as a
researcher.
I am deeply thankful to each of my dissertation committee members. Dr. Paula Winke has
provided me with valuable advice on how to write a research paper with a professional tone. Dr.
Shawn Loewen has constantly taught me the importance of having a keen understanding of
statistics as an applied linguist. Dr. Aline Godfroid has equipped me with theoretical knowledge
and the practical skills necessary for conducting a sound SLA study.
I also thank the instructors and teaching assistants at the English Language Center who
allowed me to collect data in their classes. Thanks to their generous permission, I was able to
complete data collection with little difficulty. I am also grateful to the ESL instructors and
students who participated in my project. I would like to extend my gratitude to my friends in the
Second Language Studies program. Thanks to their support, I enjoyed a relatively stress-free life,
and I will never forget the time we spent together. Most importantly, I thank my parents in South
Korea, whose constant support allowed me to focus on my studies. Their love and trust have been
a great driving force for me.
This project was funded by The International Research Foundation for English Language
Education (TIRF), the National Federation of Modern Language Teachers Associations, and the
College of Arts and Letters at Michigan State University.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1. INTRODUCTION
CHAPTER 2. LITERATURE REVIEW
Definitions of Genre and Other Related Terms
Cross-genre L1 Studies
Cross-genre L2 research
Task-based Writing Studies
Task-based Studies with Cross-genre Manipulations
Validation of Task Complexity Manipulations
Text Analysis in TBLT Research
Rationale for the Present Study
CHAPTER 3. METHOD
Participants
Student participants
Teacher participants
Instruments
Questionnaires
Writing prompts
Rubric
Procedures
Data collection
Essay scoring
Text Features
Syntactic complexity features
Lexical features
Discourse features
Interactional metadiscourse features
Analysis
CHAPTER 4. RESULTS
Task Perceptions
Textual Feature Changes across Task Types
Interplay of L2 Proficiency and Task Manipulations Influencing Textual Features
Essay Score Changes across Task Types
Interplay of L2 proficiency and Task Manipulations Influencing Essay Scores
CHAPTER 5. DISCUSSION
ESL Students’ and Teachers’ Perceptions of Writing Tasks
Effects of Task Type on Textual Features
Effects of Task Type on Essay Quality
CHAPTER 6. CONCLUSION
Theoretical and Research Implications
Pedagogical and Assessment Implications
Limitations and Future Research
APPENDICES
Appendix A. Writing Prompts
Appendix B. Revised Analytic Scoring Rubric
Appendix C. Cloze Test
Appendix D. Example Essays
REFERENCES

LIST OF TABLES

Table 1. Taxonomy of Genre in This Study
Table 2. Demographic Characteristics of the ESL Student Participants
Table 3. Counterbalanced Data Collection Procedures
Table 4. Demographic Characteristics of the High and Low Proficiency Group Students
Table 5. Target Text Features
Table 6. Descriptive Statistics for ESL Students’ and Teachers’ Perceptions of Writing Tasks
Table 7. Interaction Effects of Genre, Idea Support, and Group on Task Perceptions
Table 8. Post-hoc Analysis Results of Genre and Idea Support Effects for Each Group’s Perceptions
Table 9. Main Effects of Genre, Idea Support, and Group on Task Perceptions
Table 10. Correlations between Perception Items by Task Type
Table 11. Summary of Task Perception Results
Table 12. Descriptive Statistics for Target Text Features by Task Type
Table 13. Inferential Statistics for Genre and Idea Support Effects on Textual Features
Table 14. Interaction and Main Effects of L2 Proficiency on Textual Features
Table 15. Summary of Task Manipulation and L2 Proficiency Conditions with Significantly Higher Values of Textual Features
Table 16. Correlations of Perceived Task Complexity with Linguistic Complexity Features
Table 17. Descriptive Statistics for Essay Scores by Genre and Idea Support
Table 18. Inferential Statistics for Genre and Idea Support Effects on Essay Scores
Table 19. Descriptive Statistics for Essay Scores by L2 Proficiency, Genre, and Idea Support
Table 20. Interaction and Main Effects of L2 Proficiency on Essay Scores

LIST OF FIGURES

Figure 1. Students’ and teachers’ perceptions of task complexity and difficulty across genre conditions.
Figure 2. Interaction plots for perceived complexity and difficulty showing an interaction between genre and idea support only for teacher perceptions.
Figure 3. Students’ and teachers’ perceptions of task confidence, interest, and motivation across genre conditions.
Figure 4. Interaction plots for complex nominals per clause, modifiers per noun phrase, and nominalization density showing an interaction between genre and idea support conditions.
Figure 5. Complex nominals per clause and modifiers per noun phrase across genre and idea support conditions.
Figure 6. Temporal connective density and self-mention density across genre and idea support conditions.
Figure 7. Nominal clause density, adverbial clause density, and adjectival clause density across genre and idea support conditions.
Figure 8. Interaction plots for content, organization, and language use scores showing an interaction between genre and idea support conditions.
Figure 9. Content and organization scores across genre and idea support conditions.
Figure 10. Vocabulary and language use scores across task types and L2 proficiency.

CHAPTER 1.
INTRODUCTION
Much first language (L1) and second language (L2) writing research has investigated the
cognitive processes involved in writing and has provided important insights into how writers
manage different stages of writing that place varying demands on their limited cognitive
resources (e.g., Beauvais, Olive, & Passerault, 2011; Chenoweth & Hayes, 2001; Hayes &
Chenoweth, 2006; Manchón & Roca de Larios, 2007; Olinghouse & Graham, 2009; Quinlan,
Loncke, Leijten, & Van Waes, 2012, among many others). Specifically, drawing on cognitively
oriented writing models (e.g., Hayes, 1996; Hayes & Flower, 1980; Kellogg, 1996), L1 and L2
writing studies have explored how writers’ knowledge and memory resources interact with the
task environment (e.g., Barkaoui, 2016; Johansson, Wengelin, Johansson, & Holmqvist, 2010;
Johnson, Mercado, & Acevedo, 2012; Kormos, 2011; Leijten & Van Waes, 2013; Wengelin et al.,
2009).
While L2 writing research has benefited greatly from the insights of L1 writing studies,
fundamental differences between L1 and L2 writers (e.g., age of acquisition, language
proficiency, amount of input, and educational experience) have led L2 writing researchers to
establish their own ground by testing empirical findings against L2-specific frameworks such as
the cognition hypothesis (Robinson, 2001a, 2001b, 2005, 2007) and the limited attentional
capacity model (Skehan, 1998, 2009; Skehan & Foster, 2001). As a result, there has been an
increase in the number of L2 writing studies associated with task-based language teaching
(TBLT) that focus on the effects of task-internal cognitive demands on written language
production (e.g., Ellis & Yuan, 2004; Frear & Bitchener, 2015; Johnson et al., 2012; Kormos,
2011; Kuiken & Vedder, 2007, 2008; Ong, 2014; Ong & Zhang, 2010; Révész, Kourtali, &
Mazgutova, in press).

While many L2 writing studies have found a significant impact of task manipulations on
students’ language use, their specific findings on the link between task features and linguistic
features have not converged. For example, Révész et al. (in press), who manipulated the
provision of idea support, found significant task effects on syntactic complexity and lexical
diversity, but a similar task manipulation did not result in significant changes in similar linguistic
units in Kormos (2011). Additionally, Ellis and Yuan (2004) reported a positive effect of pre-task
planning on L2 writers’ writing fluency and linguistic complexity in narratives, but Johnson et al.
(2012) failed to find such an effect in their study using argumentative essays. In discussing why
their findings differed from those of previous studies (Ellis & Yuan, 2004; Ong & Zhang, 2010),
Johnson et al. (2012) suggested the use of different genres as one potential reason. Similarly,
Révész et al. (in press) and Kormos (2011) used different genres, argumentative and
picture-based narrative writing, respectively, which might also have contributed to the
discrepant findings between the two studies.
Ironically, the most consistent findings of task effects have come from several task-based
L2 studies that used genre as a task complexity variable and examined the effect of genre on
linguistic features such as syntactic complexity and accuracy (e.g., Ruiz-Funes, 2014, 2015; Way
et al., 2000; Yang, 2014). Specifically, assuming that the cognitive demands of narrative tasks are
lower than those of non-narrative tasks, these cross-genre studies framed narratives as simple
tasks and non-narratives (e.g., exposition or argumentation) as complex tasks. These studies have
consistently found higher levels of syntactic complexity in non-narrative writing than in
narratives, a pattern well aligned with task-based hypotheses such as Robinson’s cognition
hypothesis (i.e., more complex tasks may elicit more complex language). Taking all of these
findings in the L2 writing literature into account, I drew three conclusions:
1. Task-based writing studies have produced varying results of task manipulation effects.
2. Task manipulation effects may interact with genre.
3. Different genres elicit different linguistic features.
However, rather than confirming our understanding of how task complexity manipulations
work in written discourse, these conclusions highlight what is lacking in the field and point to the
need to examine the validity of some presumptions held by many researchers.
First, we need to test the level of cognitive demands imposed by different genres. While it
is reasonable to assume that making a logical argument requires more in-depth thinking than
describing a story does (Beauvais et al., 2011), the majority of cognitive models of writing
suggest that a writer’s task schemas and other types of knowledge (e.g., topic, genre, and
audience) also influence the cognitive demands that a writing task places on the writer. Among
the cognitive models of writing, this study is grounded in the model proposed by Hayes and his
colleagues, which has been widely accepted in writing research (Hayes, 1996, 2012; Hayes &
Flower, 1980). Since the original model (Hayes & Flower, 1980), John Hayes has repeatedly
revised it to include more affective and motivational factors, but the writer’s task schemas and
knowledge remain important resources that moderate writing processes and cognitive
constraints. Specifically, the revised model (Hayes, 1996) includes two major factors: the task
environment and the individual. The latter is composed of four interacting components:
motivation/affect, cognitive processes, working memory, and long-term memory. Most relevant
to the focus of this study is the long-term memory component, which includes task schemas,
topic knowledge, audience knowledge, linguistic knowledge, and genre knowledge. During the act
of writing, this knowledge-related component, together with other dimensions of individual
differences such as working memory and motivational attributes, interacts with the task
environment (e.g., task materials, writing medium, collaborators, and audience). In other words,
writers can draw on task-relevant knowledge to reduce the cognitive demands imposed by a
task; their performance depends on their familiarity with, and understanding of, a given task.
One important question that follows from this model is whether adult English as a second
language (ESL) students really have greater genre knowledge and task schemas for narrative
tasks than for argumentative tasks, as many researchers assume. In this study, I set out to
answer this question by exploring both perception and language production data. While TBLT
studies initially focused mostly on oral tasks, writing researchers have recently begun to
examine how cognitive task complexity operates in written discourse (see Plonsky & Kim, 2016,
for a review), either by adjusting task features within a specific written genre (e.g., Ong &
Zhang, 2010, who manipulated idea support in argumentative writing) or by operationalizing
genre as a cognitive complexity dimension (e.g., Ruiz-Funes, 2015, who operationalized the
expository genre as more complex than the narrative). Here, I argue that genre as a task variable
needs to be manipulated and analyzed with caution because two lines of research address a
similar issue from different starting points and for different purposes.
Specifically, one research tradition, originating in composition studies with L1 children,
addresses how learners at different grades (or proficiency levels) show distinct writing skills
across genres, in an attempt to identify an appropriate genre for a particular age group (e.g.,
Beers & Nagy, 2011; Berman & Katzenberger, 2004; Berman & Nir-Sagiv, 2004; Ravid, 2005).
In this tradition, researchers have explained potential genre effects on linguistic features by
linking linguistic forms to discourse functions (e.g., extensive use of the past tense to express
temporality in narratives and increased noun-phrase complexity to express generality in
non-narrative genres; Berman & Katzenberger, 2004). The other line of research treats written
genres as tasks with different cognitive demands (e.g., Ruiz-Funes, 2014, 2015; Yang, 2014).
Drawing on TBLT hypotheses, researchers in this line have interpreted the increased linguistic
complexity of learner language in a particular genre as evidence of that genre’s higher task
complexity; given the consistent finding of greater linguistic complexity in non-narrative than in
narrative genres, they concluded that genre is a valid task complexity variable affecting L2
learners’ cognitive processes and language production. Thus, despite similar methods and
results, the two research lines rest on different assumptions about written genre and
consequently generate diverging interpretations.
To problematize the presumption that different genres impose different cognitive demands
(more specifically, the equation of argumentative writing with a high-cognitive-demand task and
of narrative writing with a low-cognitive-demand task), I draw on the long-term memory
component of the writing model (Hayes, 1996; Hayes & Flower, 1980), which includes genre
knowledge. Major assumptions about genre-specific cognitive demands are based on the findings
of L1 writing research whose participants were mostly children or adolescents (e.g., Berman,
2008; Ravid, 2005). For example, researchers presuppose that students may have greater
experience with narrative tasks than with argumentative tasks because children in the U.S.
educational system work primarily on narrative tasks as a first step in developing their full range
of writing skills. Regarding genre-specific writing skills, the Common Core State Standards for
English language arts, which have been adopted by forty-two states and the District of Columbia,
indicate that K-5 students need to develop skills for narrative, opinion, and simple explanatory
writing and that students in grades 6-12 need to develop skills for argumentative writing (CCSS,
2017). This alignment of written genres with specific grades clearly reflects children’s
developmental trajectories of cognitive skills. Specifically, it is widely known that children’s
cognitive abilities grow notably with age and that their skills for rational judgment and abstract
thinking start to develop between the ages of 7 and 11 (Ginsburg & Opper, 1988). Thus,
composing an argumentative essay can be very challenging for children or young adolescents,
and it is reasonable to assume that children would feel more comfortable with narratives.
However, the same scenario does not necessarily apply to adult L2 learners, who have already
reached a high level of cognitive maturity. Furthermore, most adult L2 learners completed
primary and secondary school in educational contexts distinct from those in the United States,
which leads me to assume that their genre-specific difficulties may depend on their educational
experience with various genres and modes of discourse.
The key issue is the quantity and quality of writing instruction that typical adult L2
learners are likely to have experienced before coming to an ESL context, as well as their
motivation for learning English writing. This issue is particularly relevant given that Hayes’
(1996) writing model includes motivation, task schemas, and genre knowledge as important parts
of the individual element. First, it should be noted that adult ESL students who learned English
mostly in primary and secondary schools in their own countries (i.e., English as a foreign
language contexts) are likely to have acquired limited English writing skills, because English
education in these countries is greatly influenced by high-stakes exams that focus on receptive
language skills (Butler & Iino, 2005; Byun et al., 2011; Watanabe, 1996). In particular, L2
learners in East Asian countries are likely to have received English instruction that focuses on
grammar, vocabulary, and reading comprehension because these skills are included in the
English section of high-stakes college entrance exams (e.g., Cheng, 2008; Jeon, 2009; Kikuchi,
2006; Sakamoto, 2012).1 For example, Shim and Baik (2004) noted that English teachers in
South Korea have difficulty teaching productive English skills because students expect
examination-oriented instruction. English teachers in Japan and Hong Kong have also expressed
similar concerns (Butler & Iino, 2005; Chow & Mok-Cheung, 2004).
Given this background, we can expect that many ESL students’ main English writing
practice has been preparation for standardized English tests (e.g., TOEFL or IELTS), whose
scores are required for admission to schools in English-speaking countries. Further, given that
argumentative writing has long been a typical genre for standardized writing assessments (Qin &
Karabacak, 2010), we can infer that adult ESL students would have greater genre knowledge for
argumentative essays than for narratives. This may hold true even for adult ESL students who
have received a college education in an English-speaking country for years, because
argumentative writing is a typical and necessary text type in the college academic curriculum
(Christie, 1997; Johns, 1995; Mei, 2006). Taken together, these points cast doubt on the current
understanding that implementing an argumentative task will naturally impose increased
cognitive complexity on adult ESL learners, and they suggest a new prediction, tested in the
present study: adult L2 learners would be more familiar, and thus more comfortable, with
argumentative writing.

1

According to statistics from the Institute of International Education (2016), students from China
constitute 31.5% of the entire international student population studying in the United States, and
those from China, South Korea, Taiwan, and Japan add up to 40.3%. The college entrance exams
administered by public institutes in these countries (e.g., the National Higher Education Entrance
Examination in China, the College Scholastic Ability Test in South Korea, and the National
Center Test for University Admissions in Japan) are large-scale, multiple-choice tests that do not
involve actual writing.
With regard to the TBLT literature, I noted earlier that writing studies of task complexity
have produced conflicting findings in terms of their support for task complexity hypotheses
(reviewed extensively in the Literature Review chapter). One possibility is that some task
manipulations are not applicable to written discourse due to several fundamental differences
between the two modalities of written and oral language production (Biber, 1988, 2006a; Chafe,
1982). Researchers have expressed concerns about the validity of directly applying cognitive
complexity hypotheses to writing (Frear & Bitchener, 2015; Jackson & Suethanapornkul, 2013;
Johnson et al., 2012; Yoon & Polio, 2017). That is, while cognitive task complexity frameworks
assume that learners must allocate limited attentional resources, writing may be less constrained
by such limitations because it is a recursive process involving a series of planning, monitoring,
and revising (Hayes, 1996; Hayes & Flower, 1980; Kellogg, 1996). In this regard, Yuan and
Ellis (2003) argued that writers are under less pressure than speakers in allocating attention
between idea conceptualization and linguistic formulation, and that writers have more
attentional resources available for planning and monitoring than speakers do. Based on their
finding that pre-task planning did not affect written language production, Johnson et al. (2012)
argued the following:
[W]riting is fundamentally different from speaking. For this reason, written L2
production may not be described accurately by the Cognition Hypothesis (Robinson,
2001, 2005, 2011a, 2011b) nor by the Limited Attentional Capacity Model (Skehan,
1998; Skehan & Foster, 2001) because such models predict the impact of pre-task
planning on L2 oral production. Because speaking is a linear process, planning time prior
to L2 speaking tasks is effective in relieving attentional demands of language production
... In contrast, writing is a recursive process, thus planning time prior to L2 writing tasks
does not obviate online planning as well as monitoring. (p. 271)
Another concern is the validity of the ways in which researchers operationalize cognitive
complexity for writing tasks. Manipulations of task complexity in L2 writing studies include
the provision of planning time (e.g., Ellis & Yuan, 2004; Ojima, 2006; Ong, 2013, 2014; Ong &
Zhang, 2010), number of elements (e.g., Kuiken, Mos, & Vedder, 2005; Kuiken & Vedder, 2007,
2008, 2011), here-and-now (Ishikawa, 2007), and conceptualization support through idea
provision (e.g., Kormos, 2011, 2014; Kormos & Trebits, 2012; Ong, 2013, 2014; Ong & Zhang,
2010; Révész et al., in press), most of which were borrowed directly from TBLT speaking
studies. A pattern specific to written discourse concerns how task manipulations relate to written
genre. On the one hand, researchers have manipulated task dimensions within a specific genre
(i.e., within-genre manipulation studies). For example, Kuiken and Vedder varied the number of
elements (three vs. six) to be considered in choosing a travel destination in a letter-writing task, and
Kormos adjusted the level of conceptual demands in a picture narrative task by changing the
condition of supporting content. On the other hand, a few recent studies have operationalized
genre as one of the resource-directing dimensions of cognitive complexity (i.e., cross-genre
manipulation studies), based on the assumption that argumentative essays involve higher
cognitive complexity than narrative essays (e.g., Ruiz-Funes, 2014, 2015; Yang, 2014). There
are also some L2 writing studies that investigated multiple genres composed by learners and
interpreted their findings with a similar assumption of genre-specific cognitive demands, although
these studies were not framed as task complexity studies (e.g., Jeong, 2016; Qin & Uccelli, 2016;
Way et al., 2000).
However, as discussed in Polio and Yoon (2016), genre research and task-based research
have offered different interpretations of similar findings (e.g., higher syntactic complexity in
non-narrative writing) because the two research lines start from different premises (i.e., communicative
functions in genre research and cognitive demands in task research). Furthermore, some previous
task-based studies have reported different patterns of task effects, potentially due to the use of
different genres (e.g., Kormos, 2011; Ong & Zhang, 2010; Révész et al., in press), suggesting the
need to explore the interaction between genre and task complexity effects on L2 learners’
language production and perceptions. Of the several existing task variables for within-genre
manipulations, it seems particularly important to explore the condition of idea support in terms
of its varying roles in different genres, because this variable has been shown to affect writers’
perceptions as intended in one genre (i.e., idea support was judged to be a valid task variable in
argumentative writing by Révész et al., in press), whereas the validity of the other variables has
not been tested.
To explore the validity of genre and task manipulations, in this study I examine ESL
learners’ production and perceptions of four writing tasks, together with ESL teachers’
perceptions of the same tasks. The tasks involve argumentative and
narrative genres, within each of which the level of task complexity is manipulated through the provision
of supporting ideas. Students’ perceptions of the tasks are collected via a self-rating questionnaire
immediately after each writing performance. Going beyond the common practice of
examining traditional linguistic complexity features to validate task complexity hypotheses (see
Robinson, 2011, for a review), I analyze textual features at multiple levels (i.e., syntactic, lexical,
discourse, and metadiscourse levels), attempting to explain the motivation for linguistic changes
on the basis of their communicative functions. In the following chapters, I review L1 and L2
genre studies as well as task-based writing studies in order to identify specific gaps in the
literature and to explain how this study addresses them.
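The four tasks cross two genres with two idea-support conditions, which makes this a 2 × 2 within-subjects design. The Python sketch below is only an illustration of that structure, not the study's materials: the labels (`"argumentative"`, `"with_ideas"`, and so on) are hypothetical names, and the mapping of "with supporting ideas" to the simple condition follows the abstract. It also checks the essay total given in the abstract (76 students × 4 essays = 304).

```python
from itertools import product

# Hypothetical labels for the two manipulated factors.
GENRES = ("argumentative", "narrative")
IDEA_SUPPORT = ("with_ideas", "without_ideas")  # with supporting ideas = simple task


def task_conditions():
    """Cross genre with idea support to list the four task types."""
    return [
        {
            "genre": genre,
            "idea_support": support,
            "complexity": "simple" if support == "with_ideas" else "complex",
        }
        for genre, support in product(GENRES, IDEA_SUPPORT)
    ]


def total_essays(n_students: int) -> int:
    """In a within-subjects design, every student completes every condition."""
    return n_students * len(task_conditions())


if __name__ == "__main__":
    for condition in task_conditions():
        print(condition)
    print(total_essays(76))
```

Because every participant writes under every condition, genre and idea-support effects are estimated within the same writers rather than across separate groups.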

CHAPTER 2.
LITERATURE REVIEW
Definitions of Genre and Other Related Terms
There has been a large body of research into the effect of genre on learners’ language use
(e.g., Beers & Nagy, 2009, 2011; Lu, 2011; Qin & Uccelli, 2016; Yoon & Polio, 2017, among
others). Researchers have also shown much variation in writing processes and essay scores
arising from genre differences (e.g., Beauvais et al., 2011; Bouwer et al., 2015; Hamp-Lyons &
Mathias, 1994; Jeong, 2016; Way et al., 2000). Findings from such extensive genre research have
suggested the need to control for genre in developmental research and to employ different genres
to obtain a more comprehensive understanding of learners’ writing proficiency in assessment
contexts. However, there is still some confusion about the term genre because
researchers have used genre and related terms such as register, text type, and mode of
discourse in different ways. For example, early research used the term mode of discourse for
traditional rhetorical categories such as narrative, description, and
argumentation (e.g., Crowhurst, 1979, 1980; Engelhard, Gordon, & Gabrielson, 1992; Kegley,
1986; Steen, 1999), while more recent studies have referred to such categories as genres (e.g., Jeong, 2016;
Lu, 2011; Qin & Uccelli, 2016). Some authors have used these terms interchangeably without explicitly
distinguishing among them (Stubbs, 1996). Accordingly, to avoid potential confusion, before
reviewing L1 and L2 genre studies, I clarify my use of genre-related terms in this study.
Several studies have attempted to address the elusive nature of these text-classifying
terms by elucidating their different nuances (e.g., Biber, 1988; Lee, 2001; Nunan, 2008;
Paltridge, 1996). An early attempt to differentiate between genre and text type is Biber (1988), in
which genre is considered a classification based on external criteria such as purpose and
audience, and text type a category based on text-internal criteria such as linguistic features. That
is, although some texts have very similar linguistic characteristics, they could be seen as different
genres when they have different purposes. In line with Biber’s distinction, Paltridge (1996) also
suggested the criteria of external and internal dimensions to explicate the meanings of genre and
text type. Nunan (2008) noted that a collection of texts can be grouped into the same genre when
they have a common communicative function, while acknowledging great difficulty in building
confirmatory taxonomies of genres.
According to Lee (2001), it is additionally difficult to distinguish clearly among
genre, register, and style because of some overlap in their meanings and the interchangeable
use of these terms in previous research. In their book-length study, Biber and Conrad (2009)
endeavored to define each of these terms for clarification purposes. They noted that
register and style are categories based on frequently occurring linguistic features. That is,
some linguistic features pervasive in one register would be infrequent or rare in another register
(the same premise applies to styles). The difference between register and style is that the
former primarily involves linguistic variation arising from different situations and
contexts, while the latter involves linguistic variation related to an individual writer’s linguistic
choices, a distinction that is now widely accepted. Register and genre are in fact the two
terms that require further elucidation. Drawing on the concepts of systemic-functional
linguistics, Biber and Conrad (2009) distinguished between genre and register:
Register variation focuses on the pervasive patterns of linguistic variation across such
situations, in association with the functions served by linguistic features; genre variation
focuses on the conventional ways in which complete texts of different types are
structured. Taken together, register/genre variation is a fundamental aspect of human
language. All cultures and languages have an array of registers/genres, and all humans
control a range of registers/genres. (p. 23)
Other genre researchers have also noted that genre is likely to be associated with its
relevant cultural context, while register concerns the immediate context of situation (Martin,
1993, 2001; Swales, 1990). More importantly, it has been noted that the analysis of register
variation begins with inductive text analysis that contributes to identifying different registers,
while genre variation is analyzed in terms of the occurrence of a particular rhetorical
organization that reflects the predicted structure of a genre (see Lee, 2001). Specifically, two
distinctive features of genre are the use of external criteria such as communicative purpose and the
use of pre-identified categories. Biber and Conrad viewed register as the most important category
for text analysis because it treats linguistic features as units fulfilling situational
functions and because all types of texts can be analyzed in terms of the linguistic features that
contribute to register variation. Their support for inductive text categorization based on register
variation is well aligned with their commitment to the multi-dimensional approach, which
identifies sets of co-occurring text features through factor analysis and assigns composite scores
to each text to categorize texts into different registers or text types. The current study, however,
does not involve any inductive grouping of texts based on sets of linguistic features; instead, it
focuses on potential changes in linguistic features across prearranged genres to identify the
linguistic representations of genre-specific communicative functions. This is my rationale for
using the term genre for the construct under investigation.
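To make the multi-dimensional approach concrete: in Biber's (1988) procedure, feature frequencies are normalized (per 1,000 words) and standardized against the corpus, and a text's score on a dimension is the sum of the z-scores of features with salient positive loadings minus the sum for features with salient negative loadings. The Python sketch below shows only that scoring step, assuming the factor analysis has already been done. The feature names, corpus means and standard deviations, and essay rates are hypothetical placeholders, not values from Biber and Conrad (2009) or from this study.

```python
from typing import Dict, List, Tuple


def z_score(value: float, mean: float, sd: float) -> float:
    """Standardize a normalized feature rate against corpus-wide statistics."""
    return (value - mean) / sd


def dimension_score(
    text_rates: Dict[str, float],
    corpus_stats: Dict[str, Tuple[float, float]],
    positive: List[str],
    negative: List[str],
) -> float:
    """Sum of z-scores for positively loading features minus the sum for
    negatively loading features (Biber-style dimension score)."""
    pos = sum(z_score(text_rates[f], *corpus_stats[f]) for f in positive)
    neg = sum(z_score(text_rates[f], *corpus_stats[f]) for f in negative)
    return pos - neg


# Hypothetical corpus (mean, SD) per feature and one essay's rates per 1,000 words.
corpus = {
    "nominalizations": (20.0, 5.0),
    "attributive_adjectives": (60.0, 10.0),
    "first_person_pronouns": (30.0, 10.0),
    "past_tense_verbs": (40.0, 20.0),
}
essay = {
    "nominalizations": 30.0,
    "attributive_adjectives": 70.0,
    "first_person_pronouns": 10.0,
    "past_tense_verbs": 20.0,
}
score = dimension_score(
    essay,
    corpus,
    positive=["nominalizations", "attributive_adjectives"],
    negative=["first_person_pronouns", "past_tense_verbs"],
)
print(score)  # (2 + 1) - (-2 + -1) = 6.0
```

The resulting composite scores are what the inductive approach then uses to group texts. By contrast, the present study starts from predefined genres and compares features across them.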
A final but important typological feature of genre is its varying levels of generality
(Martin, 1993; Steen, 1999). That is, a particular genre can consist of multiple subgenres, each of which can function as a superordinate genre that includes further subgenres. In
this regard, drawing on prototype theory in cognitive science, Steen (1999) suggested that genre
could be conceptualized as having multiple hierarchical levels: super-genre (superordinate
level), genre (basic level), and sub-genre (subordinate level), pointing to the importance of
understanding the flexible level of generality involved in recognizing genres. Table 1 presents
the application of this taxonomy to the present study. Accordingly, the two genres used
in this study are argumentative and narrative writing (more specifically, position-setting
argumentative and personal narrative writing). In one sense, timed writing could be considered too
specific to be a super-genre, but given the prevalence of timed writing in a wide range of
academic settings (e.g., standardized tests, placement tests, and in-class tasks), I believe that
timed writing merits a superordinate category that can be further divided into genres and
subgenres.
Table 1.
Taxonomy of Genre in This Study
Classification | Examples relevant to the present study
Superordinate (Super-genre) | Timed writing
Basic-level (Genre) | Argumentative writing, narrative writing
Subordinate (Sub-genre) | Position-setting argumentative writing (agreement/disagreement), solution-suggesting argumentative writing (deciding on the best solution), personal narrative writing, imaginative narrative writing, picture narrative writing ...


Writers are expected to fulfill different functions and communicative purposes in
different genres. Based on their primary rhetorical functions, written genres can be divided into
narratives and non-narrative types (Bruner, 1986); narratives entail an event description with a
focus on people’s actions in a specific time frame, while non-narrative essays involve the
argumentation or explanation of general ideas (Berman & Slobin, 1994). In this study, I target
two genres that elicit strikingly different communicative functions: timed argumentative and
timed narrative tasks (i.e., making arguments to convince readers in argumentative writing and telling an
interesting story to entertain readers in narrative writing). I use task type as a looser term when referring
to different writing tasks manipulated in terms of either genre or task complexity (idea support).
That is, different genres are always different task types, and the same genre can include different
task types when manipulated in terms of the condition of idea support.
Cross-genre L1 Studies
Over the past thirty years, there have been many L1 writing studies on genre differences.
This sustained attention to genre in L1 writing research reflects the use of different
genres for assessing students at different grade levels, in line with state standards
(CCSS, 2017). As described above, children at different grade levels are expected to focus on
developing skills in different genres (i.e., K-5 students in narrative, opinion, and explanatory
genres; students in Grades 6-12 additionally in the argumentative genre). The majority of L1 genre
studies have consistently demonstrated that children have greater difficulty composing non-narrative
essays than narrative essays (e.g., Berman, 2008; Berman & Slobin, 1994; Hickman,
2003; Peterson & McCabe, 1983; Ravid, 2005), which has often been interpreted as a consequence
of teachers’ tendency to use narrative tasks as the major writing assignments for young learners
(Engelhard et al., 1992). Specifically, Ravid (2005) noted that children are capable of writing
personal narratives about people, events, and places, while young adolescents still have
difficulty with expository writing that requires abstract content knowledge, implying that
children would not be able to accomplish argumentative tasks.
Previous studies provided further support for a genre-cognition connection by showing
higher essay scores in narratives than in non-narrative writing (e.g., Bouwer, Béguin, Sanders, &
van den Bergh, 2015; Crowhurst, 1980; Engelhard et al., 1992; Kegley, 1986; Sachse, 1984).
Kegley (1986), for example, reported varying proportions of adequate and inadequate writing
performance across four genres (description, narration, exposition, and persuasion). Kegley
collected data from seventh-grade students and categorized their competency as either adequate
performance (scores 3 or higher) or inadequate performance (scores 2 or lower) using a holistic
rubric (scores from 0 to 4). Her results showed the highest proportion of adequate performance in
the narrative genre and the lowest proportion in persuasion (i.e., adequate performance proportion in
narration: 56%; description: 43%; exposition: 41%; and persuasion: 31%). The 25-percentage-point
gap between narration and persuasion suggests that more than one fifth of students may be
evaluated differently and placed in different proficiency groups depending on genre. Similarly,
Engelhard et al. (1992) explored eighth-grade students’ performance on narrative, descriptive,
and expository tasks, and their results also showed the highest scores on personal narratives and
the lowest scores on expository tasks. Unlike Kegley (1986), Engelhard et al. employed an analytic
rubric that included content/organization, style, sentence formation, usage, and mechanics, and
they found that the effect of genre was stronger on discourse-level development (i.e.,
content/organization and style) than on sentence-level sophistication (sentence formation, usage,
and mechanics).
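The "more than one fifth" inference can be made explicit with a little arithmetic on Kegley's reported percentages. If the same students wrote in every genre (an assumption the summary above does not state outright), the gap between the highest and lowest adequate-performance rates is a lower bound on the share of students whose adequate/inadequate label changes between those two genres. The Python sketch below only restates the published percentages. The function name is a hypothetical label for this calculation.

```python
# Percentages of adequate performance by genre, as reported above for Kegley (1986).
ADEQUATE_PCT = {"narration": 56, "description": 43, "exposition": 41, "persuasion": 31}


def min_pct_reclassified(pcts: dict) -> int:
    """Lower bound on the share of students whose adequate/inadequate label
    differs between the best and the worst genre.

    Assumes the same students wrote in every genre. Then
        P(adequate in A) - P(adequate in B)
            = P(adequate in A, not B) - P(adequate in B, not A)
            <= P(adequate in A, not B),
    so the gap in marginal percentages is a floor on reclassification.
    """
    return max(pcts.values()) - min(pcts.values())


gap = min_pct_reclassified(ADEQUATE_PCT)
print(gap, gap > 100 / 5)  # 25 True
```

The true share of reclassified students could be higher, because some students might move in the opposite direction as well.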
Recently, Bouwer et al. (2015) analyzed the writing of 67 sixth-grade children and statistically
verified genre as an important factor, explaining 11% of the variance in writing scores (i.e.,
higher scores on narrative tasks than on argumentative tasks). The authors suggested genre
knowledge as one possible explanation for the clear genre effect on essay scores and,
specifically, assumed that children might have built more stable schemata for narrative writing
than for argumentation. Based on the results of a generalizability theory analysis, they further
suggested that at least two raters should evaluate three texts in each of four genres to obtain a
generalizable estimate of writing proficiency. Given the practical impossibility of such a testing
setting, their conclusion seems intended as a warning against judging a student’s writing
proficiency on the basis of a single writing performance.
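The rating load implied by this recommendation is easy to see: two raters × three texts × four genres gives 24 ratings (12 texts) per student. Why more raters and texts help can be illustrated with the standard relative generalizability coefficient for a fully crossed persons × raters × tasks design, where person-by-rater, person-by-task, and residual variance are each divided by the number of conditions averaged over. The Python sketch below is a simplification: it collapses genre into a single task facet, and the variance components are made-up placeholders, not estimates from Bouwer et al. (2015).

```python
def relative_g(var_p: float, var_pr: float, var_pt: float, var_prt_e: float,
               n_raters: int, n_tasks: int) -> float:
    """Relative G coefficient for a fully crossed persons x raters x tasks design.

    var_p: person (true-score) variance; var_pr: person x rater;
    var_pt: person x task; var_prt_e: residual (person x rater x task, error).
    """
    relative_error = (var_pr / n_raters
                      + var_pt / n_tasks
                      + var_prt_e / (n_raters * n_tasks))
    return var_p / (var_p + relative_error)


# Placeholder variance components (NOT estimates from Bouwer et al., 2015).
components = {"var_p": 1.0, "var_pr": 0.2, "var_pt": 1.2, "var_prt_e": 1.0}

ratings_per_student = 2 * 3 * 4  # raters x texts per genre x genres
one_rating = relative_g(**components, n_raters=1, n_tasks=1)
recommended = relative_g(**components, n_raters=2, n_tasks=3 * 4)
print(ratings_per_student, round(one_rating, 3), round(recommended, 3))
```

With these placeholder values, a large person-by-task component drives most of the gain. This mirrors the warning against judging proficiency from a single performance.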
In contrast to the majority of L1 genre studies, which found significant genre effects
on essay scores, Beers and Nagy (2009), who collected data from 41 seventh- and eighth-grade
students, found a different pattern. They compared persuasive and narrative
writing in terms of holistic essay scores and syntactic complexity (clauses per T-unit, words per
clause, and words per T-unit). Although the two genres differed in syntactic complexity (i.e., higher
syntactic complexity in persuasive essays than in narratives), the students obtained similar essay
scores on them. More strikingly, Quellmalz, Capell, and Chou (1982) analyzed
expository and narrative essays composed by high school students (in the eleventh and twelfth
grades) and found significantly higher ratings for expository writing than for narratives. Noting that
their participants were older than those in other L1 genre studies (i.e., high school students rather
than elementary or middle school students), the authors interpreted this
unexpected finding as the outcome of either the high school curriculum’s greater focus on the
expository genre (i.e., strongly established schemata for expository writing) or raters’ varying
leniency across genres.
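The three syntactic complexity measures in Beers and Nagy (2009) are ratios over counts of words, clauses, and T-units (a main clause plus any subordinate clauses attached to it). The Python sketch below assumes those counts have already been obtained, whether by hand coding or by a parser, and simply computes the ratios. The counts in the example are hypothetical. One point the sketch makes visible is that words per T-unit equals words per clause multiplied by clauses per T-unit, so the three indices are not independent.

```python
from dataclasses import dataclass


@dataclass
class UnitCounts:
    words: int
    clauses: int
    t_units: int


def syntactic_ratios(c: UnitCounts) -> dict:
    """Compute the three ratio-based complexity indices from unit counts."""
    return {
        "clauses_per_t_unit": c.clauses / c.t_units,  # amount of subordination
        "words_per_clause": c.words / c.clauses,      # clause length
        "words_per_t_unit": c.words / c.t_units,      # T-unit length
    }


# Hypothetical counts for one short essay.
ratios = syntactic_ratios(UnitCounts(words=240, clauses=30, t_units=20))
print(ratios)
```

Because T-unit length combines clause length and subordination, the same value can arise from different mixes of the two. This is one reason studies often report all three indices.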
L1 genre research that focused on linguistic form variations has consistently
demonstrated that syntactic complexity tends to increase in non-narrative writing compared to
narrative writing (e.g., Beers & Nagy, 2009; Crowhurst & Piche, 1979; Ravid, 2005). For
example, an early L1 study by Crowhurst and Piche (1979) examined how syntactic complexity
measures (production unit length and subordination) differed across three genres (narrative,
descriptive, and argumentative writing). The authors found the highest syntactic complexity in
argumentative essays and the lowest in narratives, providing initial evidence of the variability of
syntactic complexity across genres. As more findings have supported this pattern of language
variation, researchers have concluded that child and adolescent writers modify their language across
genres to fulfill different rhetorical functions (e.g., Beers & Nagy, 2009; Berman &
Katzenberger, 2004; Ravid, 2005).
While much attention has been given to syntactic complexity, a body of research has
also focused on the effect of genre on lexical features (e.g., Gardner,
2004; Grobe, 1981; Olinghouse & Leaird, 2009; Olinghouse & Wilson, 2013). For example, in
a repeated-measures study, Olinghouse and Wilson investigated how various
dimensions of lexical features varied by genre (narrative, persuasive, and informative tasks)
and how such lexical features predicted the writing quality of each genre. With regard to the effect
of genre on lexical features, Olinghouse and Wilson found that lexical diversity was the highest
in the narrative texts, while content vocabulary and elaboration were the highest in the
informative texts. There were no statistically significant differences among the three genres in the use of
academic words. In terms of predicting each genre’s text quality, the authors identified
lexical diversity as the strongest predictor of narrative writing quality and content vocabulary as
the strongest predictor of the quality of persuasive and informative writing. This result might
indicate that extensive use of topic-relevant content words is similarly expected in high-quality
persuasive and informative writing, but not in narrative writing. The findings of
this study offered empirical evidence that different genres elicit different lexical features,
suggesting that the effects of genre might extend to a wide range of linguistic features beyond the
traditional scope of syntactic complexity.
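Lexical diversity comparisons across genres depend on how the index is computed. A raw type-token ratio (TTR) tends to fall as texts get longer, so texts of different lengths cannot be compared fairly on TTR alone. The Python sketch below computes TTR and the moving-average TTR (MATTR), which averages TTR over fixed-size windows and is one commonly used length-robust alternative. It is not necessarily the index Olinghouse and Wilson (2013) used. The punctuation-stripping whitespace tokenizer and the default window of 50 tokens are simplifying assumptions.

```python
from typing import List


def tokenize(text: str) -> List[str]:
    """Simplifying assumption: split on whitespace, strip edge punctuation, lowercase."""
    tokens = (t.strip(".,;:!?\"'()").lower() for t in text.split())
    return [t for t in tokens if t]


def ttr(tokens: List[str]) -> float:
    """Type-token ratio: distinct word forms divided by total tokens."""
    return len(set(tokens)) / len(tokens)


def mattr(tokens: List[str], window: int = 50) -> float:
    """Moving-average TTR: mean TTR over every run of `window` consecutive tokens."""
    if len(tokens) <= window:
        return ttr(tokens)
    ratios = [ttr(tokens[i:i + window]) for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)


tokens = tokenize("The cat saw the dog. The dog ran.")
print(ttr(tokens), mattr(tokens, window=4))
```

Because every window has the same size, MATTR values from a short narrative and a longer argumentative essay are more comparable than their raw TTRs.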
To summarize, the findings of L1 genre studies provide substantial evidence of
variation in writing performance across genres. Although there were a few exceptions (e.g.,
Beers & Nagy, 2009; Quellmalz et al., 1982), most studies have demonstrated higher scores on
personal narrative essays than on argumentative essays. In terms of language variation, studies have
consistently suggested that non-narrative essays tend to contain more complex language
than narratives. While informative, these findings need to be complemented by more
comprehensive data, because little evidence suggests that higher ratings for
narrative tasks actually reflect learners’ better performance and lower cognitive challenges. Because the
majority of previous research followed the tradition of inferring cognitive
challenges from essay scores or linguistic features, future research needs to include independent
measures of learner perceptions (Révész, 2014; Sasayama, 2016) to better understand the
cognitive demands of distinct genres. Furthermore, the few genre studies that adopted an analytic
scoring rubric commonly found greater genre effects on discourse-level subscales (e.g.,
organization) than on sentence-level ones (e.g., mechanics), indicating that discourse-level scores
are more sensitive to genre variation (e.g., Kegley, 1986; Quellmalz et al., 1982). In this
study, I attempt to shed further light on these areas by adopting an analytic scoring rubric and a
task perception questionnaire.
Cross-genre L2 Research
So far, I have reviewed the L1 genre literature. The findings from L1 research
have generally shown a significant impact of genre on essay quality (e.g., Bouwer et al., 2015;
Engelhard et al., 1992; Kegley, 1986; Quellmalz et al., 1982) and language use (e.g., Crowhurst
& Piche, 1979; Olinghouse & Wilson, 2013; Ravid, 2005), as well as a mediating effect of
genre on the relationship between essay quality and linguistic features (e.g., Beers & Nagy,
2009; Crowhurst, 1980; Olinghouse & Wilson, 2013). Whereas L1 researchers began exploring
genre effects more than 30 years ago, genre has only recently begun to attract attention in L2
writing research, where the major focus has likewise been on the effect of genre on learners’
language use (e.g., Lu, 2011; Qin & Uccelli, 2016; Yoon & Polio, 2017).
Findings from L2 genre research generally indicate that learner language in
argumentative writing tends to be more complex than that in narrative writing, which is well
aligned with findings from L1 studies. Specifically, using his own automated tool for analyzing
syntactic complexity, Lu (2011) examined the syntactic complexity of Chinese learners of
English in narrative and argumentative texts. According to Lu’s results, L2 learners showed
higher values of production unit length (e.g., mean sentence length and mean clause length) and
phrase-level syntactic complexity (e.g., complex nominals per clause and coordinate phrases per
clause) in argumentative essays than in narrative essays. The similarity between L2 and L1
findings may indicate that both L1 and L2 writers have a certain level of
genre awareness, leading to language variation arising from genre-specific communicative
functions. Based on such notable linguistic changes across genres, L2 studies have also
confirmed the role of genre as an important task variable that should be taken into account in
research on language development (e.g., Yoon & Polio, 2017).
Unlike the consistent findings regarding the association between genre and language,
previous L2 research into the effect of genre on text quality has yielded mixed findings (e.g.,
Hamp-Lyons & Mathias, 1994; Jeong, 2017; Qin & Uccelli, 2016; Way et al., 2000). For
example, Way et al. (2000) explored three different tasks (descriptive, narrative, and expository
writing) composed by low-level L2 French learners and found the lowest essay scores on
expository writing and the highest scores on descriptive writing. Focusing on task-internal challenges,
the authors concluded that the expository task might have been the most challenging and the
descriptive task the least challenging for low-level L2 learners. Hamp-Lyons and
Mathias (1994), by contrast, found genre effects on writing scores in the opposite direction (i.e., higher
holistic scores on argument/public writing tasks than on expository/private tasks). Unlike Way et al.,
Hamp-Lyons and Mathias focused on task-external features such as raters’ perceptions of the
prompts and interpreted their results as the outcome of raters’ adjusting their rating severity based
on perceived task difficulty (e.g., assigning higher scores on argument tasks that raters perceived as
more difficult).
In addition, recent studies have presented a different picture of genre effects
by taking into account additional variables such as writers’ L2 proficiency (Jeong, 2017; Qin &
Uccelli, 2016). For example, Jeong examined the narrative and expository essays written by 180
Korean learners of English at three proficiency levels (60 students at each of the
novice, intermediate, and advanced levels). Based on the results of a multi-faceted Rasch
analysis, Jeong found no significant difference in EFL writing scores between the two
genres but a significant interaction between genre and L2 proficiency.
Specifically, beginning writers tended to obtain higher scores on narrative writing,
while advanced writers tended to obtain higher scores on expository writing, suggesting the
complex nature of the role of genre in writing performance and the need to take
L2 proficiency into account in exploring genre effects.
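For readers unfamiliar with many-facet Rasch measurement, the general form of the model places students, tasks, and raters on a common logit scale. In the rating-scale formulation associated with Linacre, the log-odds that student n receives category k rather than k - 1 from rater j on task i is B_n - D_i - C_j - F_k (student ability minus task difficulty, rater severity, and the k-th step threshold). The Python sketch below turns that expression into category probabilities. All parameter values are hypothetical. A genre × proficiency interaction like the one Jeong reported would require an additional interaction (bias) term that this sketch does not include.

```python
import math
from typing import List


def mfrm_category_probs(ability: float, task_difficulty: float,
                        rater_severity: float, thresholds: List[float]) -> List[float]:
    """Rating-scale many-facet Rasch model.

    Categories run 0..len(thresholds); thresholds[k-1] is step F_k, so
    log(P_k / P_{k-1}) = ability - task_difficulty - rater_severity - F_k.
    """
    eta = ability - task_difficulty - rater_severity
    # Cumulative sums of (eta - F_k) are the unnormalized log-probabilities.
    logits = [0.0]
    for step in thresholds:
        logits.append(logits[-1] + eta - step)
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]  # subtract max for numerical stability
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probs: List[float]) -> float:
    """Model-expected rating: sum of category index times its probability."""
    return sum(k * p for k, p in enumerate(probs))


steps = [-1.0, 0.0, 1.0]  # hypothetical thresholds for a 0-3 rating scale
low = mfrm_category_probs(ability=0.0, task_difficulty=0.5, rater_severity=0.2, thresholds=steps)
high = mfrm_category_probs(ability=2.0, task_difficulty=0.5, rater_severity=0.2, thresholds=steps)
print(round(expected_score(low), 2), round(expected_score(high), 2))
```

Because task difficulty and rater severity are estimated separately from ability, such a model can show whether two genres differ in difficulty after rater effects are accounted for.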
Qin and Uccelli (2016) investigated how secondary-school Chinese EFL students
performed on argumentative and narrative essays. Analyzing 200 texts produced by 100
EFL students, they showed that the students’ writing performance, measured using a
holistic scoring rubric, did not significantly differ by genre. Despite no clear effect of genre on
writing scores, the authors found that the quality of each genre was best predicted by a different
set of textual features: narrative writing quality was predicted by stance marker frequency, while
argumentative writing quality was predicted by lexico-syntactic complexity and organization
marker diversity. To summarize, L2 genre studies have produced some interesting but conflicting
findings regarding the influence of genre on essay scores. While different studies
interpreted their findings with different foci (e.g., rater severity, task difficulty, and L2
proficiency), one methodological commonality of these L2 studies is their reliance on holistic text
scores (possibly for practical reasons). However, as shown by some early L1 studies (e.g., Kegley,
1986; Quellmalz et al., 1982), different categories of a scoring rubric have varying levels of
sensitivity to genre variation, indicating the need to employ an analytic rubric in L2 genre
research to identify more specific patterns and, more generally, to advance the field.
Given the consistent findings of language changes across genres and the somewhat
contrasting findings of essay score changes, one important question remains to be
resolved in L2 genre research: how can we determine the reason for these well-attested
language differences across genres? To provide empirical evidence related to this
question, Yoon and Polio (2017) analyzed linguistic complexity, accuracy, and fluency (CAF)
features in the narrative and argumentative essays composed by 37 ESL students and 46 native
English-speaking (NS) college students. The starting point of this study was the premise that
ESL students would experience greater cognitive pressure in timed writing than NS students due to
their limited command of the language. Then, based on the cognition hypothesis (Robinson,
2001b, 2005, 2007), which suggests that L2 learners’ greater allocation of attentional resources to language
forms in a more cognitively demanding task leads to their use of more complex language,
Yoon and Polio predicted that ESL students’ language would be influenced more strongly by
genre than that of NS students if different genres in fact pose substantially different cognitive demands
on writers. On the other hand, if both ESL and NS writers showed similar genre effects, this would
provide evidence that the different linguistic features elicited by different genres reflect
writers’ fulfillment of genre-specific functional needs.
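The logic of this prediction is a group × genre interaction. If genre mainly taxes cognitive resources, the genre effect (argumentative minus narrative) on a complexity measure should be larger for ESL writers than for NS writers. If genre mainly reflects communicative function, the two effects should be similar. The Python sketch below computes that difference-in-differences contrast from cell means. The numbers are arbitrary placeholders, not Yoon and Polio's data. A real analysis would test the interaction statistically (e.g., within an ANOVA or mixed-effects model) rather than inspect a raw contrast.

```python
from typing import Dict, Tuple

# Cell means keyed by (group, genre).
CellMeans = Dict[Tuple[str, str], float]


def genre_effect(means: CellMeans, group: str) -> float:
    """Argumentative minus narrative mean for one writer group."""
    return means[(group, "argumentative")] - means[(group, "narrative")]


def interaction_contrast(means: CellMeans) -> float:
    """Difference-in-differences: ESL genre effect minus NS genre effect.

    Near zero -> similar genre effects in both groups (function-based account);
    clearly positive -> larger genre effect for ESL writers (cognitive-load account).
    """
    return genre_effect(means, "ESL") - genre_effect(means, "NS")


# Placeholder cell means, not study data: parallel genre effects in both groups.
placeholder = {
    ("ESL", "narrative"): 9.0, ("ESL", "argumentative"): 11.0,
    ("NS", "narrative"): 10.0, ("NS", "argumentative"): 12.0,
}
print(interaction_contrast(placeholder))  # 0.0
```

A contrast near zero is what a function-based explanation of genre differences would lead one to expect.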
The results of Yoon and Polio showed similar patterns of language differences in both
groups, and the authors concluded that language variation across genres may be better explained
as the outcome of the different communicative functions expected in different genres than as the
outcome of genre-specific cognitive demands. For example, narrative writing is likely to contain more personal
pronouns, while argumentative writing is likely to include more nominalizations and nominal post-modifiers,
leading to higher linguistic complexity in argumentative writing than in narratives (see Biber &
Conrad, 2009, for a detailed description of discourse features). Yoon and Polio paved the way for
questioning the validity of linking genre and cognitive demands, but there is still a need to use an
independent measure of writers’ task perceptions (Révész, 2014) in order to clearly disentangle
genre effects on linguistic features from those on learners’ perceptions in written discourse. In
the following sections, I review task-based writing studies that examined the effect of various
cognitive complexity variables on L2 learners’ language use and justify the adoption of
supporting idea provision as the target task complexity dimension in this study.
Task-based Writing Studies
Over the past decade, there has been an increase in the number of L2 researchers who
have shown an interest in the effects of task complexity on learners’ language. Task complexity,
defined as the “attentional, memory, reasoning, and other information processing demands
imposed by the structure of the task on the language learner” (Robinson, 2001a, p. 29), has been
argued to influence the amount of attentional and cognitive resources available for language
constructions during task performance (Robinson, 2003; Skehan, 1998). To gather evidence of
potential task complexity effects on language production, L2 researchers have explored how the
manipulation of various task features (e.g., planning time availability: Yuan & Ellis, 2003;
number of elements: Kuiken & Vedder, 2011; here-and-now: Gilabert, 2007), identified by
Robinson’s Triadic Componential Framework (Robinson, 2001b, 2005, 2007), can lead to
changes in traditional CAF measures (see Housen, Kuiken, & Vedder, 2012; Norris & Ortega,
2009 for a review of CAF). While most research initially focused on the effects of cognitive task
complexity on oral language production, L2 writing researchers have since begun to examine
how cognitive task complexity operates in written discourse (e.g., Ellis & Yuan,
2004; Frear & Bitchener, 2015; Johnson et al., 2012; Kormos, 2011; Kuiken & Vedder, 2007,
2008; Ong, 2014; Ong & Zhang, 2010; Révész et al., in press).
In cognitively oriented TBLT studies, manipulations of task complexity are expected to
create differing cognitive demands in the conceptualization stage that may lead to changes in the
amount of attentional resources allocated to language constructions and, accordingly, in the
complexity level of linguistic forms (see Robinson, 2001b, 2005). In terms of the causal
relationships between task complexity (with regard to cognitive demands) and language
production, two competing hypotheses, the cognition hypothesis (Robinson, 2001a, 2001b, 2005,
2007) and the trade-off hypothesis (also referred to as the limited attentional capacity model; Skehan,
1998, 2009; Skehan & Foster, 2001), offer different explanations of how varying cognitive
demands of language tasks lead to a difference in task performance. Specifically, Robinson’s
cognition hypothesis presumes that there are multiple dimensions of attentional resources that
language learners can access simultaneously. Dividing task complexity into resource-directing
and resource-dispersing dimensions, Robinson (2001a, 2005) argued that increasing task
complexity along resource-directing dimensions (e.g., adding more elements or increasing
reasoning demands) directs learners’ attentional resources toward complex language constructions,
facilitating language development. On the other hand, increasing task complexity along resource-dispersing dimensions (e.g., removing planning time) imposes greater demands on attentional and
working memory resources, so that learners’ attention to language formulation is dispersed.
In contrast, Skehan’s trade-off hypothesis (Skehan, 1998, 2009; Skehan & Foster, 2001)
presupposes that learners have limited attentional resources and working memory; during task
performance, learners are not capable of attending to content and language at the same time. In
other words, paying attention to one area reduces the attentional resources available for
other areas. Thus, more complex tasks requiring higher conceptual demands direct learners to
focus less on linguistic aspects. The limited amount of attentional resources available for
language also creates competition between linguistic complexity and accuracy. Of
the two dimensions of cognitive complexity (i.e., resource-directing and resource-dispersing
dimensions), the two hypotheses diverge mainly in their predictions about the effect of
resource-directing cognitive demands on language production. Therefore, a larger
body of task-based writing research has examined cognitive complexity effects along the resource-directing dimension in an attempt to test the competing predictions of the two hypotheses (e.g.,
Frear & Bitchener, 2015; Kormos, 2011, 2014; Kuiken & Vedder, 2007, 2008; Ruiz-Funes, 2015;
Tavakoli, 2014; Yang, 2014; Yang, Lu, & Weigle, 2015), as compared to those along the
resource-dispersing dimension (e.g., Ellis & Yuan, 2004; Ishikawa, 2007; Johnson et al., 2012).
We can view task manipulations in the written modality with regard to genre, classifying
them into within-genre or cross-genre manipulations (Polio & Yoon, 2016). Of several within-genre variables (e.g., here-and-now, number of elements, and reasoning demands), this study
focuses on the level of conceptual demands operationalized as the provision of supporting ideas.
Previous research has explored how varied conceptual demands lead to a difference in written
language production (e.g., Kormos, 2011; Kormos & Trebits, 2012; Ong & Zhang, 2010; Révész
et al., in press; Tavakoli, 2014), with the prediction that a task with greater complexity at the
level of idea conceptualization would lead a writer to formulate more complex language (see
Robinson, 2001b, 2005). First, picture-based writing tasks have been adopted for
conceptualization-level manipulations (e.g., Kormos, 2011; Kormos & Trebits, 2012; Tavakoli,
2014); the picture narration task that requires participants to develop a story plot based on the
pictures given in random order is considered more complex than the cartoon description task that
provides a clear storyline. For example, Kormos (2011) explored how the difference in
conceptualization demands affected linguistic- and discourse-level features in NS and NNS
writing, and found that the writers showed increased lexical sophistication and connective use in
a more complex task (i.e., random-order picture narration) but no difference in lexical diversity,
accuracy, syntactic complexity, and cohesion. Although Kormos and Trebits (2012) adopted more fine-grained accuracy
measures (ratio of error-free clauses, ratio of error-free relative clauses, error-free verbs, and
error-free past-tense verbs), they still found no task complexity effect
on linguistic accuracy.

The variable of conceptualization demands has also been operationalized as the provision
of supporting ideas in argumentative writing (e.g., Ong, 2013, 2014; Ong & Zhang, 2010;
Révész et al., in press). For example, Ong and Zhang (2010) manipulated multiple task variables,
one of which was idea support (i.e., three tasks: no ideas given, ideas given, and both ideas and
macro-structure given, in order of decreasing cognitive demands). Their findings showed
significant effects of idea support on lexical diversity (more complex tasks leading to greater
lexical diversity) but no effect on fluency. A recent study by Révész et al. also attempted to
examine the effect of idea support on linguistic complexity (lexical sophistication, lexical
diversity, and syntactic complexity), as well as on participants’ writing behaviors (pausing and
revision behaviors) using keystroke-logging software. For this purpose, the authors collected
data from advanced-level participants (CEFR C1 level). The text production results of this study
showed that task complexity had a significant influence on lexical sophistication, but no clear
effect on lexical diversity and syntactic complexity. Therefore, while both studies examined the
effect of idea support on linguistic features in argumentative genre, their findings exhibited
somewhat contrasting patterns. From the findings of previous studies on conceptual demands, we
can tentatively conclude that the manipulation of idea support does not exert widespread effects on
linguistic features and that studies have found different patterns of idea support effects on
language due to task-internal (genre: Kormos, 2011; Révész et al., in press) or learner-internal
factors (L2 proficiency: Ong & Zhang, 2010; Révész et al., in press). In particular, given
potentially different patterns of idea support effects across genres, I attempt to explore both genre
and idea support as target task variables and illuminate their distinct effects on L2 learners’
language use and perceptions, which would enable us to gain a comprehensive picture of task
manipulation effects in written discourse.

Task-based Studies with Cross-genre Manipulations
In exploring the effect of reasoning demands on language use, a few studies have
operationalized genre as a task complexity variable (e.g., Ruiz-Funes, 2014, 2015; Yang, 2014).
This line of research is based on the prediction that the argumentative genre, which involves logical
causal reasoning, would be more cognitively demanding for L2 learners than the narrative genre. For
example, Yang (2014) examined the four genres of narrative, expository, expo-argumentative and
argumentative essays (in order of increasing cognitive complexity) composed by adult Chinese
EFL learners. Using CAF values as the outcome of task complexity effects, Yang found the
lowest values of lexical density and syntactic complexity (e.g., unit-length and phrasal
coordination measures) in narrative writing and the highest values of the same syntactic
complexity measures in the argumentative task, but no significant genre effects on fluency,
accuracy, and some of the complexity measures (e.g., clausal coordination and subordination
measures). Given this pattern of increased linguistic complexity in the
argumentative task, she concluded that her findings provide partial evidence for Robinson’s
cognition hypothesis.
Additionally, Ruiz-Funes (2015) reported on the findings of two repeated-measures
studies regarding task complexity effects on CAF measures: one that explored the writings of
foreign language learners of Spanish at an advanced proficiency level and the other at an
intermediate level. The participants in the first study were required to complete two writing tasks
(analytic and argumentative essays), in which the analytic writing task was predicted to be less
cognitively demanding than the argumentative task. In her second study, the participants also
completed two writing tasks (personal narrative and expository essays) that concerned the shared
topic of study abroad. As regards their relative task complexity, personal narrative writing was
operationalized as the low-complexity task and expository writing as the high-complexity task,
with an a priori assumption that providing a thesis and relevant evidence would be more
cognitively demanding for L2 learners than narrating personal experience. Although there were no
statistically significant effects of genre on linguistic features, owing to the small sample sizes (study 1:
N = 8; study 2: N = 24), the author found a pattern of increased complexity but lower accuracy
and fluency in the writing tasks operationalized as more complex. Ruiz-Funes interpreted these
results as evidence in support of Skehan’s trade-off hypothesis.
Frear and Bitchener (2015) is another task-based writing study that examined how L2
writers of English show different syntactic and lexical complexity features in three letter-writing
tasks manipulated in terms of reasoning demands and number of elements. One methodological
issue in this study was that the level of reasoning demands was manipulated only to differentiate
between the low- and medium-complexity tasks (but the number of elements manipulated for all
three tasks), resulting in a wider gap in cognitive demands between the low- and medium-complexity tasks, compared to that between the medium- and high-complexity tasks. Moreover,
while all three tasks were categorized as letter writing, the low- and medium-complexity tasks
seem different in their purpose of writing (i.e., the low-complexity task for description and the
medium-complexity task for persuasion); therefore, these two tasks can be regarded as distinct
genres: descriptive and persuasive writing in a letter format. The findings of this study indicated
that L2 writers showed greater lexical diversity in the high-complexity task than in the low-complexity task. Interestingly, while finding no significant change in general subordination
(dependent clauses per T-unit), Frear and Bitchener found a significant decrease in a more
specific measure of subordination (adverbial clauses per T-unit) with the increase of task
complexity, pointing to the importance of exploring different types of subordination for a clearer
understanding of language use and development (Lambert & Kormos, 2014; Rimmer, 2008).
The methodological operationalization of these cross-genre task studies is similar to that
of the L1 and L2 genre studies reviewed earlier (e.g., Beers & Nagy, 2009, 2011; Crowhurst,
1980; Lu, 2011; Ravid, 2005; Yoon & Polio, 2017), as the majority of these studies involve the
analysis of linguistic feature changes across written genres. As a result, the studies of these two
lines have produced very similar results (e.g., higher syntactic complexity, particularly mean unit
length, in non-narrative writing than narrative). The two lines of research, however, have been
grounded in different assumptions. Specifically, task-based research into cross-genre effects
predicts that linguistic features would change notably due to varying levels of cognitive demands
imposed by different genres, whereas genre research focuses on functional and pragmatic
motivations for different linguistic features. As a result of these contrasting starting points, their
findings, although very similar, offer different types of implications. Specifically, the major
implication of task-based research concerns how to promote language development
more effectively, so more complex language, which may lead to language development, is highly
valued in this research line (i.e., more complex language is better). On the other hand, genre
research aims to raise awareness of genre-appropriate communicative
purposes and language (i.e., more complex language is not necessarily better).
In this regard, although originally framed as a task complexity study, Frear and Bitchener
(2015) argued that their findings might have resulted from the functional (in their terms,
“pragmatic”) requirements of the tasks and participants’ personal language choices, which have
little to do with cognitive factors. Specifically, the authors suggested that their result of task
complexity effects only on a certain type of subordinate clause (i.e., a significant decrease in
adverbial clauses per T-unit in a more complex, persuasion-related task) can be better explained
as the outcome of different needs for clause types “as a means of fulfilling the pragmatic
requirements of the task” (p. 52).
Similarly, Polio and Yoon (2016) attempted to associate their linguistic complexity
findings in two genres (argumentative and narrative writing) with a set of functionally motivated
lexico-grammatical features (Biber & Conrad, 2009). Their findings indicated that longer
average words (higher lexical sophistication) in argumentative genre actually come from the
extensive use of nominalization (e.g., transportation and conclusion) in argumentative and that
of personal pronouns (e.g., I, my, and our) in narratives. Polio and Yoon also found that higher
production unit length and complex nominals in argumentative writing arise from the increased
use of nominal post-modifiers (e.g., that-relative clauses: the second reason that the cost of living
is; wh-relative clauses: students who live off campus are; and prepositional phrases: the
experience of living off campus) that are needed more in making logical arguments than in
narrating a personal story. Given that functional needs plausibly explain the findings of cross-genre
research, we need to clarify the roles of genre by exploring the impact of
genre differences on learners’ writing performance and task perceptions simultaneously.
Additionally, the inclusion of another variable of task complexity, which has been shown to
influence L2 writers’ conceptual constraints (i.e., provision of supporting ideas), will enable us to
understand how different task manipulations exert varying effects on writers’ perceptions and
production.
One way to separate effects on learners’ language production from effects on their perceptions is
to implement an independent measure of task complexity (and other task features such as task
difficulty and task motivation) aimed at examining whether task manipulations actually cause
intended cognitive effects (Révész, 2014; Révész, Michel, & Gilabert, 2016). Further, for a
meaningful comparison between perception and production results, the use of a wide range of
textual features (not limited to traditional CAF measures) would contribute to providing a fuller
picture of what textual features interact with learners’ task perceptions as a result of the
manipulation of genre and task complexity.
Validation of Task Complexity Manipulations
As argued by Révész (2014), the majority of cognitively oriented TBLT studies have
focused on testing how their findings correspond to the existing task-based hypotheses: the
cognition hypothesis (Robinson, 2001a, 2001b, 2005, 2007) or the trade-off hypothesis (Skehan,
1998, 2009). Thus, by manipulating task complexity variables in keeping with preexisting
assumptions, many researchers have predicted that their participants would experience differing
levels of cognitive demands imposed by different tasks, which would underlie changes in their language
(e.g., Kormos, 2011; Kormos & Trebits, 2012; Kuiken & Vedder, 2007, 2008, 2011; Ong, 2013,
2014; Ong & Zhang, 2010; Ruiz-Funes, 2014, 2015; Yang, 2014). However, as discussed above,
it should not be presumed that participants perceive and perform different language tasks in full
accordance with researchers’ intention without an independent perception measure (Révész,
2014). To address the possible incongruence between intended and actual task effects, recent
studies have begun to employ a separate measure of cognitive demands via Robinson’s (2001a, 2007)
self-rating questionnaire items (e.g., Malicka & Levkina, 2012; Révész, Sachs, & Hama, 2014).
In implementing an independent measure of task features, researchers asked their participants to
perform a language task and then to complete their self-ratings of perceived task qualities such as
task difficulty and task interest.

Thus far, there have been two studies that attempted to validate the impact of task design
manipulations on cognitive complexity (Révész et al., 2016; Sasayama, 2016). The authors of the
two studies employed very similar methodologies in their validation process and commonly
provided evidence for the validity of a self-rating questionnaire. Specifically, Révész et al.
examined how three techniques (dual-task methodology, self-ratings, and expert judgments)
assess the task complexity of three oral tasks (a picture narrative, a map task, and a decision-making task in order of increasing task complexity). In dual-task methodology, participants are
required to complete a primary task simultaneously with a secondary task (e.g., reacting to
background color changes) with the prediction that participants will show inferior performance
on the secondary task (e.g., slower response or lower accuracy) if the primary task is more
complex. Self-ratings and expert judgments are subjective measures of task complexity by way
of questionnaires. The authors’ findings suggested that participants’ subjective self-ratings were
consistent with the intended manipulations of task complexity (i.e., more complex tasks rated as
more complex and difficult) as well as with other validation techniques, confirming the high
validity of subjective self-ratings for assessing the function of task complexity manipulations.
Sasayama employed four oral narrative tasks manipulated in terms of the number of elements
(different numbers of characters in a story) as a primary task, together with a secondary task of
reacting to a color change. Sasayama used participants’ reaction time, estimated time for task
completion, and self-ratings as independent measures of cognitive complexity, and revealed that
large differences in task complexity (e.g., between the simplest task and the most complex task)
could be detected by all of the measures. Of the three measures, participants’ self-ratings were
found to detect cognitive task complexity with the largest effect sizes. These findings from both
studies gave me confidence in employing a self-rating questionnaire as a major independent measure to
assess L2 students’ perceptions of writing tasks in this study.
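The logic of the dual-task prediction described above can be sketched numerically. In the Python example below, the reaction times (in milliseconds) are invented for illustration and do not come from Révész et al. (2016) or Sasayama (2016); the sketch simply assumes one secondary-task reaction time per participant in each primary-task condition, listed in the same participant order, and checks whether responses are slower when the primary task is intended to be more complex. An actual study would test such differences statistically rather than by inspection.

```python
from statistics import mean

# Invented secondary-task reaction times (ms), one per participant,
# listed in the same participant order for both primary-task conditions.
rt_simple = [420, 455, 398, 510, 470]
rt_complex = [468, 490, 430, 545, 500]

# Positive difference = slower response under the more complex primary task.
differences = [c - s for s, c in zip(rt_simple, rt_complex)]

print("mean RT, simple task:", mean(rt_simple))    # 450.6
print("mean RT, complex task:", mean(rt_complex))  # 486.6
print("participants slower under the complex task:",
      sum(d > 0 for d in differences), "of", len(differences))
```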
Text Analysis in TBLT Research
Traditionally, task-based writing studies have examined CAF measures to test the impact
of task feature manipulations (e.g., Ishikawa, 2007; Kormos, 2011; Kuiken & Vedder, 2007,
2008; Ong & Zhang, 2010; Ruiz-Funes, 2015; Tavakoli, 2014; Yang, 2014), providing evidence
for either of the two competing task complexity hypotheses (i.e., Robinson’s cognition
hypothesis and Skehan’s trade-off hypothesis). For example, Kuiken and Vedder (2008)
examined the letters composed by foreign language learners of Italian and French in terms of
syntactic complexity (clausal subordination measures), lexical diversity (type-token ratio
measures), and accuracy (number-of-error measures). In their study, the level of task complexity
was operationalized as the number of requirements that writers had to consider in choosing a destination.
The results of this study indicated that learner language tends to be more accurate (fewer errors)
in a more complex task, while there was no significant change between the two tasks in syntactic
complexity or lexical diversity. With these results, the authors concluded that their findings offer
evidence for L2 writers’ greater attention to language forms when involved in a task more
complex along the resource-directing dimension, thus giving support to the cognition hypothesis
rather than to the trade-off hypothesis.
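As a point of reference for the lexical diversity measures mentioned above, a raw type-token ratio divides the number of distinct word forms (types) by the total number of words (tokens). The Python sketch below is a minimal illustration that assumes a simple lowercase, letters-and-apostrophes tokenizer; it is not the exact procedure used by Kuiken and Vedder (2008), and the `type_token_ratio` function is hypothetical.

```python
import re


def type_token_ratio(text):
    """Raw type-token ratio: distinct word forms divided by total word tokens.

    Tokenization is a simplifying assumption: lowercase the text and
    treat runs of letters and apostrophes as words.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    return len(set(tokens)) / len(tokens)


sentence = "I like cats and I like dogs"
print(round(type_token_ratio(sentence), 3))                   # 5 types / 7 tokens -> 0.714
print(round(type_token_ratio(sentence + " " + sentence), 3))  # 5 types / 14 tokens -> 0.357
```

Repeating the same sentence adds tokens but no new types, so the ratio halves. Because of this sensitivity to text length, raw type-token ratios are best interpreted cautiously when essays differ in length.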
However, previous task-based research has produced contrasting findings in terms of task
complexity effects on CAF probably because different studies have employed different linguistic
measures for a single construct (e.g., linguistic accuracy operationalized as ratio of error-free
clauses by Kormos, 2011; number of errors per T-unit by Kuiken & Vedder, 2008), manipulated
task complexity inconsistently (e.g., conceptual demands operationalized as the availability of
storyline by Kormos, 2014; the number of storylines by Tavakoli, 2014), and adjusted the level
of task complexity in different genres (e.g., planning time in narrative by Ellis & Yuan, 2004;
planning time in argumentative by Johnson et al., 2012). Furthermore, as discussed above,
linguistic features in written discourse are likely to reflect functional needs demanded by a
particular task (Frear & Bitchener, 2015; Yoon & Polio, 2017), and TBLT researchers need to go
beyond comparing their findings to the two task complexity hypotheses (Kormos & Trebits,
2012) in order to gain a more comprehensive understanding of what aspects of task features
motivate notable changes in learner language. These arguments clearly point to the importance of
conducting a more comprehensive analysis of textual features in the domain of genre and task
complexity research.
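The problem of measuring a single construct in different ways can be illustrated with the two accuracy measures named above. The Python sketch below assumes that errors have already been hand-coded clause by clause, with each T-unit represented as a list of per-clause error counts; the two toy texts and the function names are hypothetical. Text A packs its errors into one clause, whereas text B spreads fewer errors across two clauses, so the two measures rank the texts in opposite orders.

```python
def error_free_clause_ratio(t_units):
    """Proportion of clauses that contain no errors."""
    clauses = [errors for t_unit in t_units for errors in t_unit]
    if not clauses:
        raise ValueError("need at least one clause")
    return sum(1 for errors in clauses if errors == 0) / len(clauses)


def errors_per_t_unit(t_units):
    """Total number of errors divided by the number of T-units."""
    if not t_units:
        raise ValueError("need at least one T-unit")
    return sum(sum(t_unit) for t_unit in t_units) / len(t_units)


# Each inner list is one T-unit; each number is the error count of one clause.
# Invented toy data, not drawn from any study cited above.
text_a = [[3, 0], [0], [0, 0]]  # three errors packed into a single clause
text_b = [[1, 0], [1], [0, 0]]  # two errors spread over two clauses

for label, text in [("A", text_a), ("B", text_b)]:
    print(label,
          "error-free clause ratio:", round(error_free_clause_ratio(text), 2),
          "errors per T-unit:", round(errors_per_t_unit(text), 2))
```

Text A looks more accurate by the error-free clause ratio (0.8 vs. 0.6) but less accurate by errors per T-unit (1.0 vs. 0.67).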
In fact, there have been some L2 writing studies that showed significant task type effects
on various discourse-level text features (e.g., stance markers: Biber, 2006b; Qin & Uccelli, 2016;
Hong & Cao, 2014; temporal cohesion: Kormos, 2011; quantity of ideas: Ong, 2013). For
example, the study by Kormos (2011), which I described above, examined cohesion features
(causal, temporal, and spatial cohesion features based on the use of connectives, particles, nouns,
and verbs) together with widely-used syntactic complexity, lexical complexity, and accuracy
measures. The findings of this study indicated that L2 writers tend to use more temporal and
logical connectives in the simple condition (narrative task with a given storyline), which was
interpreted as the outcome of L2 writers’ greater attentional resources available for an explicit
indication of cohesive relations in a cognitively less demanding task.
Another promising area of textual analysis in TBLT research is the use of interactional
metadiscourse features (K. Hyland, 2005), which reflects the increasing attention paid to the role
of stance, authorial voice, and writer-reader interaction (all of which are closely related to the
concept of interactional metadiscourse) in academic writing (Biber, 2006b; Hong & Cao, 2014;
Jeffery, 2009; Yoon, in press; Zhao, 2012, 2017; Zhao & Llosa, 2008, among others). In
examining quantifiable, text-based features of interactional metadiscourse, researchers have
based their studies on Hyland’s (2005) model of interactional metadiscourse (e.g., Aull &
Lancaster, 2014; Hong & Cao, 2014; Yoon, 2017a; Zhao, 2012, 2017). Hyland’s model consists
of two sides: stance (including categories such as hedges, boosters, attitude markers, and self-mention) and engagement (including categories such as reader pronouns,
questions, and directives). In brief, stance involves writer-oriented linguistic features that help
writers present their opinions and feelings toward a proposition. On the other hand,
engagement involves reader-oriented features intended to acknowledge readers’
presence and invite them into the text as discourse participants.
One study particularly relevant to the present study is Hong and Cao (2014), in which
two written genres (descriptive and argumentative essays) were examined with regard to the
occurrence of interactional metadiscourse features. Specifically, the authors investigated the
essays composed by Chinese, Spanish, and Polish EFL learners, and targeted the categories of
hedges, boosters, attitude markers, self-mentions, and engagement markers. The findings of this
study showed a significant effect of genre on the frequency of hedges and self-mentions (increased
use of hedges and self-mentions in argumentative writing), and the authors explained this finding
as reflecting EFL writers’ tendency to take a tentative stance in making arguments. Given these findings in
the literature, we can see the need to examine textual discourse and metadiscourse features,
along with traditional syntactic and lexical features, for a fuller understanding of genre and task
complexity effects on written language production. I will discuss the target textual features of
this study and justify my selection of them in the Method section.
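Counts of interactional metadiscourse are often normalized by text length so that essays of different lengths can be compared. The Python sketch below illustrates one such normalization (occurrences per 1,000 words) under strong simplifying assumptions: the `MARKERS` lexicon is a tiny hypothetical sample, far smaller than Hyland's (2005) published lists, and it matches single words without checking their function in context (for example, may can express permission rather than hedging).

```python
import re
from collections import Counter

# Tiny hypothetical lexicon for illustration only; real studies use much larger
# lists and check each match in context before counting it.
MARKERS = {
    "hedge": {"may", "might", "perhaps", "possibly"},
    "booster": {"certainly", "clearly", "definitely"},
    "self_mention": {"i", "me", "my"},
    "reader_pronoun": {"you", "your"},
}


def markers_per_1000_words(text):
    """Occurrences of each marker category per 1,000 word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    counts = Counter()
    for token in tokens:
        for category, words in MARKERS.items():
            if token in words:
                counts[category] += 1
    return {category: 1000 * counts[category] / len(tokens) for category in MARKERS}


# 10 tokens with one marker per category, so each category scores 100.0
print(markers_per_1000_words("My view is that you may clearly benefit from this"))
```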

Rationale for the Present Study
In the previous chapters, I have discussed the importance of a valid operationalization of
genre and task complexity in L2 writing research, as well as the necessity of examining a wide
range of textual features. Of particular note is that, in task-based writing research, genre has been
employed as a task complexity variable with the a priori prediction that the argumentative genre
will be more difficult and challenging to L2 writers. However, I believe that researchers should
take into account learners’ experience and knowledge, as Johns (2008) noted in her review study:
When we read or write in a genre with which we are familiar, and for which we have a
schema, we instantiate our schema for what typifies that genre, its conventions, as we
read or write, and we use our knowledge of conventions as we produce a new text. The
conventions of a genre can refer to a variety of features: the text structure, the register,
the relationships between the writer and the audience. (p. 241)
What we can infer from both Johns’ explanation and Hayes’ (1996, 2012) model of writing
is that genre schemas play an important role in learners’ writing performance across
genres. It can further be assumed that if adult ESL students are familiar with argumentative
writing, a typical genre in post-secondary education and testing, they will not find argumentative
writing more difficult or cognitively demanding than other genres such as narrative writing, thus
challenging a stereotype about genre-specific cognitive demands. It is even conceivable that, for
ESL students whose English writing practice has mostly taken place in standardized test
preparation settings, narrating a story can be very challenging due to their limited schema for this
unfamiliar genre.
To address these issues, I explore shared and unique effects of genre and task complexity
manipulations on ESL students’ language production and task perceptions. For task perception
data, I use a self-rating questionnaire that has been shown to measure participants’ cognitive
processes validly (Révész et al., 2016; Sasayama, 2016). Here, by collecting data from ESL
teachers and ESL students, I attempt to reveal a potential gap between ESL teachers’
expectations of different task types and ESL students’ actual perceptions of the tasks.
As regards text analysis, I do not limit the focus of this study to traditional CAF measures
or to the validity of the two competing task complexity hypotheses; rather, I examine a wide
array of textual features at multiple construction levels (i.e., syntactic, lexical, discourse, and
metadiscourse) to shed light on the interaction between learners’ linguistic features and task
perceptions, thereby explicating distinct roles of communicative functions and cognitive
constraints in shaping particular linguistic features. Additionally, I examine how a set of
textual features predicts essay scores differently in each genre in an attempt to gain a detailed
picture of task type effects on L2 written language production and writing performance. For
speed and reliability, I use automated processing techniques, which I
will introduce in detail in the Method section.
This study is grounded in Hayes’ (1996, 2012) model of writing that fully recognizes
moderating effects of L2 writers’ linguistic, genre, and task knowledge on writing processes and
performance. With regard to the relationship between L2 proficiency and task complexity effects,
unlike the majority of task-based writing studies that targeted L2 writers at one proficiency level,
Ruiz-Funes (2015) examined the essays written by advanced- and intermediate-level L2 learners,
and found an interaction between L2 proficiency and task complexity effects, that is, a clear
difference between the two learner groups in the way they responded to the writing tasks with
different levels of cognitive complexity. It is also plausible that a language task may already be too
complex or too simple for learners at a certain proficiency level, so that manipulating task complexity
induces no observable change in their language. For example, Frear and
Bitchener (2015) suggested that their limited task effects might have resulted from a
misalignment between the task manipulations and participants’ L2 proficiency. To take into account the
interplay of L2 proficiency and task variables for language production, several task-based studies
(e.g., Kuiken & Vedder, 2008; Sasayama, 2016; Yang, 2014) employed an additional measure of
L2 proficiency, a cloze test, encouraging me to make the same methodological decision of
implementing a cloze test in this study.
With this study, I intend to offer implications for L2 writing research, pedagogy, and
assessment. The present study will provide evidence to support valid interpretations of the effects
of genre and task complexity manipulations in writing research. L2 writing instructors and task
developers will be informed about the possibility of constructing independent writing tasks with
various genres and task complexity levels to achieve an appropriate alignment of task features
with target language learners. It has been suggested that learners’ performance varies across task
types (Bouwer et al., 2015). There is also a belief that argumentative writing is the most suitable
genre for assessment, making it a dominant genre in the testing context (Qin & Karabacak,
2010). With the findings of this study, I will offer insights into the relationship between
genre/task manipulations and valid writing assessment. The present study is guided by the
following research questions:
1. How do ESL students and teachers perceive writing tasks manipulated in terms of genre and
task complexity?
2. What are the effects of genre and task complexity on textual features in ESL writing?
2.1. How does ESL students’ L2 proficiency interact with task type effects on textual
features?
3. What are the effects of genre and task complexity on ESL writing scores?
3.1. How does ESL students’ L2 proficiency interact with task type effects on writing scores?

CHAPTER 3.
METHOD
Participants
Student participants. For this study, I collected data from 76 ESL students enrolled in
an English language program at a large U.S. university. The course levels in the program ranged
from 0 to 5, and I recruited participants from the highest level of the program. Level 5 courses,
which are also referred to as English for Academic Purposes (EAP) courses, are intended to
develop the academic writing proficiency of international students whose native language is not
English. International students admitted to the university with an iBT TOEFL score
below 79 (paper-based TOEFL: 550) are required to take the placement test administered by the
English language program (a multi-skills test with reading, listening, and writing sections)
and are assigned to a course level based on their performance. One complication is that some
level 5 students had advanced from level 4, whereas others had placed directly into level 5 based on
their performance on the placement test. Students placed directly into level 5 are often more proficient
than those who moved up from a lower level, so I anticipated some proficiency variation
among students within the same level. Therefore, I administered a cloze test as an
objective measure of L2 proficiency at the beginning of data collection.
The participants received four to six hours of L2 writing instruction (2 or 3 classes) per
week during a 15-week-long semester. The primary objectives of level 5 courses were to prepare
students for university-level academic courses and to help them build a clear understanding of
the audience in academic writing. The largest portion of the course grades involved multi-draft
essay writing and revision (50% to 70% of the course grade), indicating the emphasis of the
courses on writing processes. The specific goals of the courses were to develop students’
summarizing, paragraphing, and revising skills, as well as to guide them in accomplishing
several multi-draft and timed writing tasks. Quotation and citation skills were also targeted.
Much of the class time was used to develop various academic writing skills (e.g., how to
construct a well-developed paragraph), so the writing sessions in this study served as the
students’ main practice in timed writing.
Based on the minimum test score needed for course enrollment as well as course
objectives, the majority of the participants could be described as ESL students at the high
intermediate or low advanced level, who are capable of meeting practical writing needs and
composing a multi-paragraph essay within 30 minutes. Forty-six male and 30 female students
participated, and they were all undergraduate students. Their ages ranged from 18 to 27, with a
mean of 19. They came from various countries (e.g., Angola, China, France, Japan, Malaysia,
Saudi Arabia, South Korea, Taiwan, Thailand, and Turkey). Fifty participants spoke Chinese as
their L1, and seven were Arabic native speakers. The remaining participants were native
speakers of Korean (n = 6), Japanese (n = 3), Portuguese (n = 3), Malay (n = 2), Thai (n = 2),
Turkish (n = 2), or French (n = 1). Their mean length of English study was 104.7 months, and
their mean length of stay in the United States was 15.1 months. These figures suggest that,
although the participants were living in the United States at the time of data collection, most of
their English learning had taken place in their home countries, that is, in English as a foreign
language (EFL) settings. Table 2 presents the demographic information of the student
participants.

Table 2.
Demographic Characteristics of the ESL Student Participants
Characteristics                                            N = 76
Age: Mean (SD)                                             19.11 (1.45)
Gender
    Male                                                   46
    Female                                                 30
First language
    Chinese                                                50
    Arabic                                                 7
    Korean                                                 6
    Japanese                                               3
    Portuguese                                             3
    Malay                                                  2
    Thai                                                   2
    Turkish                                                2
    French                                                 1
Length of English study (months): Mean (SD)                104.66 (48.04)
Length of stay in the United States (months): Mean (SD)    15.07 (15.36)

Teacher participants. For the comparison between students’ perceptions and teachers’
expectations of writing tasks, I collected survey data from 30 ESL teachers using an online
survey platform, Qualtrics. Most teachers were native speakers of English (n = 26); the remaining
four teachers had a native language other than English (Iranian: n = 1; Korean: n = 1; Russian: n
= 1; and Turkish: n = 1). Their mean length of teaching English was 11.7 years (SD = 7.9 years),
and their mean length of teaching English writing skills was 9.4 years (SD = 7.0 years). Their ages ranged
from 26 to 65 with a mean age of 39 (SD = 9.15). Twenty-three teachers were female, and seven
were male. While all teachers were teaching English in the United States at the time of data
collection, three teachers reported that their main teaching experience had been in college-level
EFL contexts, whereas the remaining 27 teachers had taught mainly in college-level ESL classrooms.

Instruments
Questionnaires. I devised questionnaires for participants’ background information and
task perceptions. The background questionnaire included items on participants’
demographic information (e.g., age, gender, educational background, language background, and
length of stay in the United States). The questionnaire for ESL students’ task perceptions
contained six statements about each writing task. The six items were designed to measure how
participants perceived (1) the mental effort induced by the task (task complexity), (2) task
difficulty, (3) task anxiety, (4) task confidence, (5) task interest, and (6) task motivation (adapted
from Robinson, 2001a; Révész et al., 2016). Robinson (2001a, 2007) originally assessed the
latter five dimensions without an item on mental effort, but following the
suggestion by Révész et al. (2016), I differentiated between task complexity and task difficulty in
the questionnaire so that these distinct constructs could be captured separately (Brünken, Seufert, &
Paas, 2010). Immediately after completing each writing task, participants rated
each statement on a 9-point Likert scale. The exact items were as follows:
This task required no mental effort at all.    1-2-3-4-5-6-7-8-9    This task required extreme mental effort.
This task was not difficult at all.            1-2-3-4-5-6-7-8-9    This task was extremely difficult.
I felt really relaxed doing this task.         1-2-3-4-5-6-7-8-9    I felt frustrated doing this task.
I didn’t do well on this task.                 1-2-3-4-5-6-7-8-9    I did well on this task.
This task was not interesting at all.          1-2-3-4-5-6-7-8-9    This task was very interesting.
I don’t want to do more tasks like this.       1-2-3-4-5-6-7-8-9    I want to do more tasks like this.

The teacher participants completed an online survey containing Likert scale items on
task perceptions and open-ended items for additional comments. To allow a direct
comparison with the students’ perception results, the beginning of the teacher survey
clearly stated that the writing tasks were designed as 30-minute timed writing tasks for L2
learners at high-intermediate or low-advanced proficiency levels (i.e., level 5 students in the
English language program). For teacher perceptions of the tasks, I asked the teacher participants
to read each of the writing task prompts carefully and complete six questionnaire items
measuring the same constructs as those targeted by the student survey (i.e., task complexity, task
difficulty, task anxiety, task confidence, task interest, and task motivation).
To tap the same constructs, rather than constructing new
questionnaire items, I simply modified the wording of the task perception statements. For
example, I changed the task difficulty statement “This task was not difficult at all” to the future-oriented
statement “This task will not be difficult for ESL students at all,” and the task motivation statement “I
don’t want to do more tasks like this” to “ESL students will not want to do more tasks like this.” Four
sets of Likert scale items were prepared for the four writing tasks, and these sets were given to
the teacher participants in a randomized order, which was an attempt to control for potential
sequence effects on task judgments. The exact items were as follows:
This task will require no mental effort of ESL students at all.    1-2-3-4-5-6-7-8-9    This task will require extreme mental effort of ESL students.
This task will not be difficult for ESL students at all.           1-2-3-4-5-6-7-8-9    This task will be extremely difficult for ESL students.
ESL students will be really relaxing doing this task.              1-2-3-4-5-6-7-8-9    ESL students will be frustrated doing this task.
ESL students will not do well on this task.                        1-2-3-4-5-6-7-8-9    ESL students will do well on this task.
This task will not be interesting to ESL students at all.          1-2-3-4-5-6-7-8-9    This task will be very interesting to ESL students.
ESL students will not want to do more tasks like this.             1-2-3-4-5-6-7-8-9    ESL students will want to do more tasks like this.
The open-ended questions in the teacher survey asked about (1) the teachers’ impressions
of the writing tasks, (2) reasons for different task perceptions, and (3) the possibility of using the
tasks in their class. It took approximately 30 minutes for the teachers to finish the survey. In
return for their participation, they received a $10 gift certificate via email.
Writing prompts. I developed two argumentative and two narrative writing prompts.
The argumentative prompts required participants to make logical arguments on foreign language
learning and use. The narrative prompts involved narrating a personal story related to a similar
topic. When developing the task prompts, I consulted three SLA experts and one test developer,
and improved the quality of the prompts based on their comments and recommendations. Within
each genre, I manipulated the level of conceptual demands operationalized as the provision of
supporting ideas. The writing tasks with lower conceptual demands provided the
participants with supporting information (i.e., example storylines for the narrative task and main
ideas for the argumentative task) that they could use while writing. I intended the ESL students to
find the tasks with idea support cognitively less demanding and less difficult than the tasks without
such support. Throughout the manuscript, the argumentative and narrative prompts with idea support
are labeled Arg/+Support and Nar/+Support, respectively, and the prompts without idea
support are labeled Arg/-Support and Nar/-Support.
To avoid potential topic effects on learner language (e.g., Hinkel, 2002; Tedick, 1990), I
devised prompts that shared the topic of foreign language learning or use; at the same
time, to minimize task repetition effects, I made each prompt somewhat distinct.
Specifically, a narrative prompt with no idea support (Nar/-Support) elicited a positive
experience related to foreign language use (Tell a story about ONE of your positive experiences
related to foreign language use), while the other narrative prompt (Nar/+Support) elicited a
difficult experience related to interactions using a foreign language (Tell a story about ONE of
your difficult experiences related to interactions using a foreign language). For argumentative
writing, one prompt (Arg/-Support) involved the necessity of using a foreign language fluently in
the globalized era (Write an essay about whether you agree or disagree with the statement about
the necessity of foreign language abilities), while the other (Arg/+Support) entailed the
relationship between the ability to speak a foreign language and the possibility of having a
successful life (Write an essay about whether you agree or disagree with the statement about the
relationship between foreign language abilities and success) (see Appendix A for the full prompts).
Rubric. The essays were evaluated using a revised analytic scale (Polio, 2013), adapted
from the ESL composition profile (Jacobs et al., 1981) that comprises the five subscales of
content, organization, vocabulary, language use, and mechanics (see Appendix B). According to
Connor-Linton and Polio (2014), the revised rubric includes the same five categories as the
original scale, but their descriptors and weighting were modified to better reflect what trained
raters had noted regarding actual changes in L2 writing skills over time. The full score of each
subscale is 20 points, except for mechanics, whose full score is 10 points (i.e., maximum total
score = 90 points); this unequal weighting reflects the original Jacobs et al. rubric, which
assigned far fewer points to mechanics than to the other subscales.
The content subscale evaluates the full development of ideas, inclusion of detailed and
interesting content, and topic relevance. The organization subscale assesses overall organization,
clear thesis statement, coherence (unity within and across paragraphs), and cohesion (use of
connectors and transition words). The vocabulary subscale includes descriptors related to lexical
sophistication, lexical accuracy, idiomatic vocabulary use, and academic register. The language
use subscale evaluates syntactic complexity, syntactic variety, and syntactic and morphological
accuracy. The last subscale, mechanics, evaluates paragraph indentation, spelling accuracy, and
punctuation accuracy. This revised analytic rubric was chosen because this study aims to reveal
the effect of genre and task complexity manipulations on various categories of writing scores,
and because this particular rubric was found to produce more reliable and valid scores than other
rubrics (Connor-Linton & Polio, 2014; Polio, 2013).
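As a small illustration of the weighting described above, the following Python sketch sums the five subscale scores into a total out of 90. The dictionary keys are hypothetical labels chosen for this example, and the range check (0 to the subscale maximum) is an assumption; the rubric's actual descriptors and score bands are those in Appendix B.

```python
# Maximum points per subscale in the revised analytic rubric (key names are hypothetical labels).
SUBSCALE_MAX = {
    "content": 20,
    "organization": 20,
    "vocabulary": 20,
    "language_use": 20,
    "mechanics": 10,
}


def total_score(subscores: dict) -> float:
    """Sum the subscale scores after checking that each is present and within 0..max (assumed range)."""
    missing = SUBSCALE_MAX.keys() - subscores.keys()
    if missing:
        raise ValueError(f"missing subscales: {sorted(missing)}")
    for name, score in subscores.items():
        if name not in SUBSCALE_MAX:
            raise ValueError(f"unknown subscale: {name}")
        if not 0 <= score <= SUBSCALE_MAX[name]:
            raise ValueError(f"{name} score {score} outside 0-{SUBSCALE_MAX[name]}")
    return sum(subscores.values())


print(sum(SUBSCALE_MAX.values()))  # maximum possible total: 90
print(total_score({"content": 15, "organization": 14, "vocabulary": 13,
                   "language_use": 12, "mechanics": 8}))  # 62
```

Keeping the maxima in one table makes the 4 × 20 + 10 = 90 weighting explicit and lets an out-of-range mechanics score (e.g., 11) be caught before averaging across raters.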
Procedures
Data collection. After obtaining IRB approval, I contacted ESL teachers who
were teaching level 5 writing courses in the English language program. Five instructors teaching
a level 5 writing course (course names: ESL 220 and ESL 221) gave me permission to use their
class time for data collection. After discussing the study details with the instructors, I visited
their classes to obtain consent from students. All writing tasks were administered to all
students as part of the classroom curriculum, which helped students prepare for their final
timed writing exam. All students were informed that research participation was completely
voluntary and that I would not have access to any of the essays without their consent. I also
informed students that their participation would be compensated with a $15 gift certificate and
my feedback on their essays (specifically, error-code feedback within a week of each
writing session). To encourage students to take the study seriously, I also told them that in each
class the two students who wrote the best essays would receive a $25 gift certificate. The
instructors planned to give similar feedback to any students who declined to participate so that no
one would be at a disadvantage; however, all students enrolled in the five writing
courses agreed to participate.
To minimize potential testing effects in this repeated-measures design, I collected data at
one-week intervals and counterbalanced the order of the writing prompts across four groups so
that each prompt appeared once in each writing week (see Table 3 for a summary of the data
collection procedures). I used data only from the participants who
completed all four writing sessions; essays written by students who missed one or
more sessions were excluded from analysis. This process led to the exclusion of four students (from
79 students originally to a final sample of 76 students).
Table 3.
Counterbalanced Data Collection Procedures
Group        Week 1    Week 2          Week 3          Week 4          Week 5
A (n = 20)   Cloze     Nar/-Support    Arg/-Support    Nar/+Support    Arg/+Support
                       Task survey     Task survey     Task survey     Task survey
                                                                       Background
B (n = 19)   Cloze     Arg/+Support    Nar/-Support    Arg/-Support    Nar/+Support
                       Task survey     Task survey     Task survey     Task survey
                                                                       Background
C (n = 18)   Cloze     Nar/+Support    Arg/+Support    Nar/-Support    Arg/-Support
                       Task survey     Task survey     Task survey     Task survey
                                                                       Background
D (n = 19)   Cloze     Arg/-Support    Nar/+Support    Arg/+Support    Nar/-Support
                       Task survey     Task survey     Task survey     Task survey
                                                                       Background
Note. Arg/-Support = argument task with no idea support; Arg/+Support = argument task with
idea support; Nar/-Support = narrative task with no idea support; Nar/+Support = narrative task
with idea support.
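The four orders in Table 3 can be read as a 4 × 4 Latin square: each group's sequence for Weeks 2 to 5 is Group A's sequence rotated by a fixed number of positions, so every prompt occupies every week exactly once across groups. Four orders cannot cover all 24 possible permutations; the Latin square instead balances prompt position. The Python sketch below, an illustration rather than the procedure actually used to build the table, regenerates the orders from Group A's sequence and checks the Latin-square property.

```python
def cyclic_latin_square(conditions):
    """Return one order per group; row k is the base order rotated left by k positions."""
    n = len(conditions)
    return [[conditions[(i + k) % n] for i in range(n)] for k in range(n)]


def is_latin_square(rows):
    """True if every condition appears exactly once in each row and in each column."""
    n = len(rows)
    rows_ok = all(len(set(row)) == n for row in rows)
    cols_ok = all(len({row[j] for row in rows}) == n for j in range(n))
    return rows_ok and cols_ok


# Group A's order for Weeks 2 to 5, taken from Table 3.
group_a = ["Nar/-Support", "Arg/-Support", "Nar/+Support", "Arg/+Support"]
orders = cyclic_latin_square(group_a)
for k, order in enumerate(orders):
    print(k, order)
```

Rows 0, 1, 2, and 3 of the generated square correspond to Groups A, D, C, and B in Table 3, respectively.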
In the first week of data collection, to assess students’ general English proficiency, I
administered a cloze (fill-in-the-blank) test developed and validated by Brown
(1978). I chose a cloze test as the measure of L2 proficiency because previous research
has suggested that cloze tests are adequately valid for assessing general language proficiency (e.g.,
Brown, 2002; Fotos, 1991; Hinofotis, 1980; Tremblay, 2011). The cloze test adopted in this study,
Man and His Progress, is a 399-word passage with 50 blanks (every 7th word deleted;
see Appendix C). Participants were given clear instructions and an example of how to fill
in the blanks, and they then had 30 minutes to complete the test. Responses were
scored using the acceptable-answer method, which counts any contextually acceptable word
as correct. I chose this scoring method because it has been found to surpass other methods,
such as the exact-answer and multiple-choice techniques, in validity, reliability, and item
discrimination (Brown, 1980).
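The fixed-ratio deletion and the two scoring methods mentioned above can be sketched in Python as follows. This is a minimal illustration, not Brown's materials: the `skip` parameter (words left intact at the start) and the `acceptable` lists of alternative answers are hypothetical inputs, and the example sentence is invented rather than taken from the actual passage.

```python
import string


def _norm(word: str) -> str:
    """Lower-case a word and strip surrounding punctuation so 'semester,' matches 'semester'."""
    return word.strip(string.punctuation).lower()


def make_cloze(words, every=7, skip=0):
    """Blank out every `every`-th word, counting after the first `skip` words.

    Returns the gapped word list and the deleted words (the exact-answer key)."""
    gapped = list(words)
    key = []
    for i in range(skip + every - 1, len(words), every):
        key.append(_norm(words[i]))
        gapped[i] = "_____"
    return gapped, key


def score_cloze(responses, key, acceptable=None):
    """Acceptable-answer scoring: a response earns a point if it matches the deleted word
    or any alternative listed as contextually acceptable for that blank.
    With no `acceptable` lists, this reduces to exact-answer scoring."""
    acceptable = acceptable or {}
    points = 0
    for i, (response, answer) in enumerate(zip(responses, key)):
        allowed = {answer} | {_norm(a) for a in acceptable.get(i, [])}
        if _norm(response) in allowed:
            points += 1
    return points


words = "Students wrote four essays during the semester and each essay was rated by two raters".split()
gapped, key = make_cloze(words)
print(" ".join(gapped))
print(score_cloze(["term", "two"], key, {0: ["term"]}))  # acceptable-answer: 2
print(score_cloze(["term", "two"], key))                 # exact-answer: 1
```

The contrast between the last two calls shows why acceptable-answer scoring yields higher and, per Brown (1980), more discriminating scores: a contextually appropriate synonym is not penalized.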
I scored students’ cloze responses using an answer key adapted from
Yang (2014). The cloze test scores were reliable (Cronbach’s α = .84),
indicating that the test distinguished consistently among the participants (M = 28.20, SD = 7.55).
For follow-up analyses that include L2 proficiency as a predictor variable, I divided the student
participants into two proficiency groups. Twenty-nine students with cloze test
scores of 31 or higher were assigned to the high proficiency group, and 28 students
with scores of 25 or lower were assigned to the low proficiency group (see
Table 4). The 19 students whose scores fell between these cutoffs were
excluded from these analyses to ensure a clearer gap between the proficiency groups.
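The reliability coefficient and the grouping rule described above can be sketched in Python. This is an illustration under stated assumptions: item scores are taken to be 1 (acceptable answer) or 0 for each of the 50 blanks, arranged as a participants-by-items matrix, and the cutoffs 31 and 25 are the ones reported above. Cronbach's alpha is computed with the standard formula, α = k / (k − 1) × (1 − Σ item variances / variance of total scores).

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a participants x items matrix of item scores (e.g., 0/1 per blank).

    Raises ZeroDivisionError if all participants have the same total score."""
    k = len(item_scores[0])
    items = list(zip(*item_scores))              # transpose: one tuple of scores per item
    totals = [sum(row) for row in item_scores]   # each participant's total score
    sum_item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def proficiency_group(cloze_score, high_cut=31, low_cut=25):
    """'high' for scores >= high_cut, 'low' for scores <= low_cut, None for the excluded middle band."""
    if cloze_score >= high_cut:
        return "high"
    if cloze_score <= low_cut:
        return "low"
    return None


print([proficiency_group(s) for s in (35, 31, 28, 25, 20)])
# ['high', 'high', None, 'low', 'low']
```

Returning None for the middle band makes the exclusion explicit, so the 19 mid-range students drop out of the group comparisons while remaining available for analyses that treat the cloze score as continuous.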
While dividing participants into separate groups based on performance scores (i.e.,
categorizing a continuous variable) risks a substantial loss of statistical power (see
Plonsky & Oswald, in press), the sample size of each group still appears to be adequate for
inferential statistics (29 for the high proficiency group and 28 for the low proficiency group), and
the structure of the current data set (i.e., repeated measures with four different writing tasks) fits
these statistical procedures better. For the same reasons, several recent task-based studies
employed similar statistical analyses in exploring the interaction between task and L2
proficiency effects (e.g., Kuiken & Vedder, 2008; Sasayama, 2016; Yang, 2014).

Table 4.
Demographic Characteristics of the High and Low Proficiency Group Students
Characteristics                                       High proficiency (n = 29)    Low proficiency (n = 28)
Age: Mean (SD)                                        18.97 (1.09)                 19.54 (1.88)
Gender
    Male                                              16                           19
    Female                                            13                           9
First language
    Chinese                                           22                           14
    Arabic                                            2                            3
    Korean                                            1                            4
    Japanese                                          0                            3
    Portuguese                                        1                            2
    Malay                                             2                            0
    Thai                                              0                            1
    Turkish                                           0                            1
    French                                            1                            0
Length of English study (months): Mean (SD)           114.62 (40.75)               87.64 (50.49)
Length of stay in the United States (months):
    Mean (SD)                                         14.14 (15.85)                18.14 (17.02)

From the second to the fifth week, the participants composed timed essays (each with a
30-minute time limit). They were not allowed to use dictionaries or other reference
tools while writing. Immediately after writing, they completed the task perception
questionnaire containing the six task statements. In the last week of data collection, after the
writing task and task questionnaire, the participants also completed the background
questionnaire designed to obtain their demographic information. After collecting all essays, I
transcribed them verbatim.
Essay scoring. Using the analytic scoring rubric introduced above, two expert raters
evaluated the transcribed essays. Both raters were Ph.D. students in a language-related major and
had previous experience in rating timed essays administered by the English language program.
The two raters first participated in a two-hour training session to ensure grading consistency. The
raters examined the descriptors of the rubric and the prompts used for this study. I asked the
raters to focus on assigning scores that fully reflected the quality of each essay; in other words, I
asked them not to adjust their leniency (or stringency) according to task type. This instruction
was intended to elicit essay scores that accurately reflect L2 writers’ task-specific performance.
With these points in mind, the raters completed an iterative process of rating an essay and
discussing its scores. They were instructed to assign any integer within each subscale’s score
range. When subscale scores differed by 3 or more points, the raters reexamined the rubric
descriptors and adjusted their scores after a short discussion. The raters continued this norming
process until their subscale scores either agreed fully or differed by no more than two points. Eight
essays that were not part of the study data were used for training.
After the norming session, each rater evaluated the entire data set (304 essays)
independently. The raters were given the essays in random order, but they were informed of the
task type of each essay (i.e., Arg/-Support, Arg/+Support, Nar/-Support, or Nar/+Support) so that
they could assign the scores most relevant to the topic of each task type. After rating
all the essays, the raters were compensated with $450. The inter-rater reliability of total
essay scores was r = .84 (subscales: content r = .81; organization r = .78; vocabulary r = .66;
language use r = .72; mechanics r = .60), generally indicating an acceptable level of inter-rater
reliability (Brown, Glasswell, & Harland, 2004). This study used the average of the two
raters’ scores. For essays with seriously discrepant scores (subscale scores differing by
3 or more points), a third rater assigned new scores, and the two closest scores were used.
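The reliability statistic and the score-resolution rule above can be sketched in Python. Two parts of this sketch are assumptions rather than details stated in the text: that resolution is applied subscale by subscale, and that the "two closest scores" among the three ratings are averaged (by analogy with averaging the two original raters). The function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two raters' scores on the same essays."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def resolve_subscale(r1, r2, r3=None, threshold=3):
    """Final subscale score: the mean of two raters unless they differ by `threshold` or more;
    then the two closest of the three ratings are averaged (assumed resolution rule)."""
    if abs(r1 - r2) < threshold:
        return (r1 + r2) / 2
    if r3 is None:
        raise ValueError("discrepant ratings require a third rater's score")
    a, b = min(combinations((r1, r2, r3), 2), key=lambda pair: abs(pair[0] - pair[1]))
    return (a + b) / 2


print(resolve_subscale(14, 15))      # 14.5: within 2 points, so the two ratings are averaged
print(resolve_subscale(12, 16, 15))  # 15.5: third rating 15 is closest to 16
```

Because the closest pair is chosen by absolute difference, an outlying original rating (12 in the example) is discarded rather than pulling the final score down.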

Text Features
For a detailed text analysis that takes into account the multifaceted nature of writing
proficiency and the traits included in the rubric, I employed four natural language processing
(NLP) tools that generate a wide array of linguistic, discourse, and metadiscourse features: L2
Syntactic Complexity Analyzer (henceforth SCA; Lu, 2010), Coh-Metrix (McNamara, Graesser,
McCarthy & Cai, 2014), the Multidimensional Analysis Tagger (henceforth MAT; Nini, 2015),
and the Authorial Voice Analyzer (henceforth AVA; Yoon, 2017a). The use of these automated
tools was motivated by calls to address the multidimensional features of linguistic
complexity (Lu, 2010; Norris & Ortega, 2009) and by the aim of exploring discourse and
metadiscourse features beyond traditional CAF measures in task-based research. Measures of syntactic
complexity were obtained using SCA, MAT, and Coh-Metrix. Coh-Metrix and MAT were further
used for lexical and discourse-level features. Last, AVA was employed for interactional
metadiscourse features. In this study, I decided not to explore linguistic accuracy or fluency
because my previous longitudinal research on genre effects (Yoon & Polio, 2017)
found that error-count accuracy and fluency measures neither differed significantly by genre
nor showed development over time.
Given the very large number of textual features that these tools compute, I selected
target measures based on the criteria of redundancy, validity, and construct distinctiveness. To
take SCA measures as an example, clauses per sentence (C/S) measures clausal
embedding that taps both subordination and coordination, but these two constructs should be
measured separately (clauses per T-unit (C/T) for subordination and T-units per
sentence (T/S) for coordination) to reflect language development more clearly (Norris & Ortega,
2009). This led to the exclusion of C/S. Verb phrases per T-unit (VP/T) and complex T-units
per T-unit (CT/T), which have been found to be less valid indicators of language development
(Lu, 2011), were also excluded from the present study. Of the three unit-length measures that SCA
generates (mean length of sentence, mean length of T-unit, mean length of clause), I included
mean length of sentence and mean length of clause because these two measures, unlike mean
length of T-unit, were shown to tap two distinct constructs (Yoon, 2017b). For other measures
tapping a very similar construct (e.g., complex nominals per T-unit and complex nominals per
clause), following Yang et al. (2015), this study included only one measure that had the clause as
its base unit.
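Once a tool such as SCA has parsed an essay and counted its production units, the ratio indices discussed in this section are simple quotients. The Python sketch below assumes those raw counts are already available (identifying clauses, T-units, and complex nominals is the parser's job and is not reproduced here); the counts in the example are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class UnitCounts:
    """Raw unit counts for one essay (illustrative values, as a parser-based tool would report)."""
    words: int
    sentences: int
    clauses: int
    t_units: int
    complex_nominals: int
    coordinate_phrases: int


def ratio_indices(c: UnitCounts) -> dict:
    """Ratio-based syntactic complexity indices computed from unit counts."""
    return {
        "MLS": c.words / c.sentences,               # mean length of sentence
        "MLC": c.words / c.clauses,                 # mean length of clause
        "C/T": c.clauses / c.t_units,               # clauses per T-unit (subordination)
        "CN/C": c.complex_nominals / c.clauses,     # complex nominals per clause
        "CP/C": c.coordinate_phrases / c.clauses,   # coordinate phrases per clause
    }


essay = UnitCounts(words=300, sentences=20, clauses=40, t_units=25,
                   complex_nominals=30, coordinate_phrases=8)
print(ratio_indices(essay))
```

Using the clause as the denominator for the phrasal indices (CN/C, CP/C) mirrors the decision above to keep only the clause-based member of each near-redundant pair.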
Syntactic complexity features. This study examines syntactic complexity
at the clause and phrase levels. SCA was used to calculate the clause-level syntactic
measures. They include mean length of production units (mean length of sentence and mean
length of clause) and subordination (clauses per T-unit). Because clausal coordination (T-units
per sentence) captures beginning-level language development (Bardovi-Harlig, 1992; Norris &
Ortega, 2009), I decided not to include clausal coordination measures in this study, which targets
ESL students at a high intermediate or low advanced level. Unit-length and subordination
measures have been widely adopted as language development indicators (see Bulté & Housen,
2012; Ortega, 2003; Wolfe-Quintero, Inagaki, & Kim, 1998); in particular, it was found that
subordination functions as a valid developmental measure for intermediate proficiency levels.
However, several recent studies have shown that subordination as a unitary construct (e.g.,
the overall subordination ratio) fails to detect language development over short periods of time
(e.g., Bulté & Housen, 2014; Mazgutova & Kormos, 2015) and is not sensitive enough
to reflect genre variation (e.g., Lu, 2011; Yoon & Polio, 2017).
In this regard, challenging the tradition of L2 research that measures subordination as a
single construct, Lambert and Kormos (2014) specifically argued for the need to explore these
different clause types separately to show “developmental variation during task performance” (p.
608). Recent L2 research has also begun to examine more specific clause types as target
measures and has identified distinct patterns of linguistic variation across tasks (e.g., Frear &
Bitchener, 2015; Staples & Reppen, 2016) as well as L2 developmental trajectories (e.g.,
Vercellotti & Packer, 2016). Therefore, together with a general measure of subordination ratio, I
computed more specific measures of subordination that involve three distinct syntactic relations
(nominal clauses, adverbial clauses, and adjectival clauses).
Simply put, nominal clauses, which may optionally be introduced by a complementizer, serve as objects
of superordinate verbs (e.g., I discovered that each culture has its own communication method.).
Adverbial clauses are dependent clauses that modify superordinate verbs and are linked to
main clauses by a subordinating conjunction (e.g., While I was talking, other people started to
interrupt.). Adjectival clauses (also called relative clauses) modify nouns to specify their
meaning with the optional use of a relative pronoun (e.g., Individuals who can speak foreign
language can spread their own culture to foreigners.) (Collins & Hollo, 2010; Nippold, Hesketh,
Duthie, & Mansfield, 2005).
To obtain the density values (occurrences per 1,000 words) of nominal, adverbial, and
adjectival clauses, I used MAT, an automated processing tool originally developed
for multidimensional analysis. MAT tags raw texts using the Stanford
POS Tagger (Toutanova, Klein, Manning, & Singer, 2003) and then calculates normalized
frequencies of various linguistic features, including verb tenses, syntactic patterns, discourse
markers, and so forth. The density of nominal clauses was calculated from the summed
frequencies of that verb complements, subordinator that deletion, and Wh-clauses. The density of
adverbial clauses was based on the occurrences of past participial clauses, present participial
clauses, causative adverbial subordinators, concessive adverbial subordinators, conditional
adverbial subordinators, and other adverbial subordinators. Last, the density of adjectival clauses
was calculated using the frequencies of that relative clauses (both subject and object positions),
pied-piping relative clauses, Wh-relative clauses (both subject and object positions), past
participial relatives, and present participial relatives (see Nini, 2015).
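The grouping of tagged features into three clause-type densities can be sketched in Python. The feature labels below are hypothetical descriptive names, not MAT's actual tag codes, and the per-1,000-words base follows the description above (MAT's own output normalization may differ). Because normalization is linear, summing the normalized frequencies of the component features gives the same result as normalizing their summed raw counts, which is what the sketch does.

```python
# Hypothetical, descriptive feature labels (not MAT's tag codes), grouped as described above.
CLAUSE_GROUPS = {
    "nominal": ["that_verb_complement", "that_deletion", "wh_clause"],
    "adverbial": ["past_participial_clause", "present_participial_clause",
                  "causative_subordinator", "concessive_subordinator",
                  "conditional_subordinator", "other_subordinator"],
    "adjectival": ["that_relative_subject", "that_relative_object", "pied_piping_relative",
                   "wh_relative_subject", "wh_relative_object",
                   "past_participial_relative", "present_participial_relative"],
}


def per_thousand(count, n_words):
    """Normalize a raw count to occurrences per 1,000 words."""
    return 1000 * count / n_words


def clause_densities(feature_counts, n_words):
    """Sum raw counts of the component features for each clause type, then normalize."""
    return {
        clause_type: per_thousand(sum(feature_counts.get(f, 0) for f in features), n_words)
        for clause_type, features in CLAUSE_GROUPS.items()
    }


counts = {"that_verb_complement": 3, "wh_clause": 1,
          "conditional_subordinator": 2, "that_relative_subject": 1}
print(clause_densities(counts, n_words=250))
# {'nominal': 16.0, 'adverbial': 8.0, 'adjectival': 4.0}
```

Normalizing by essay length matters here because timed essays vary in length; raw counts would otherwise confound clause use with fluency.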
Building on findings that advanced writers make increased use of grammatical
metaphor and nominalization (Halliday & Matthiessen, 1999), writing researchers are giving
increasing attention to phrasal-level complexity (e.g., Biber & Gray, 2010; Biber, Gray, &
Poonpon, 2011; Ortega, 2003; Parkinson & Musgrave, 2014). Phrasal-level complexity
measures, as distinct features of academic writing, have been found to be valid predictors for
language development and overall writing quality (e.g., Biber et al., 2011; Bulté & Housen,
2014; Lu, 2011). In this study, I explore phrasal-level syntactic complexity by investigating
indices such as the number of complex nominals per clause, number of words before the main
verb (degree of left embeddedness), and number of modifiers per noun phrase, all of which tap
the multidimensional construct of noun phrase sophistication. Additionally, I measure the
number of coordinate phrases per clause, which was found to differ across genres (Lu, 2011).
Lexical features. I assessed two lexical measures: lexical sophistication (word
frequency) and lexical diversity (vocd-D) obtained from Coh-Metrix. Lexical sophistication
addresses the various constructs of average word length, word frequency, and nominalization,
whose measures have been regarded as effective predictors of lexical proficiency because L2
writers tend to use longer, less frequent words and an increasing proportion of nominalizations as
their proficiency improves (e.g., Biber, 1988; Crossley, Cobb, & McNamara, 2013; Jarvis, Grant,
Bikowski, & Ferris, 2003; Laufer & Nation, 1995). For lexical diversity, I examined
vocd-D, which is known to address text length effects adequately (McCarthy & Jarvis, 2010) and to
validly reflect language development (Crossley, Salsbury, McNamara, & Jarvis, 2010; Treffers-Daller,
2013). These lexical measures have also been found to reflect genre differences effectively:
higher lexical sophistication and lower lexical diversity in argumentative writing (Yoon & Polio,
2017). In task-based research, there have been some contrasting findings about the effect of
conceptual demands on lexical diversity (i.e., increased lexical diversity in the complex task by
Ong & Zhang, 2010; little difference in lexical diversity across tasks by Révész et al., in press).
Given these findings, the exploration of word frequency and vocd-D in this study will advance
our understanding of how these lexical features interact with task type and L2 proficiency.
Discourse features. I measured five discourse features obtained from Coh-Metrix and
MAT: coreference cohesion, conceptual cohesion, causal connective density,
temporal connective density, and nominalization density. Cohesion generally refers to the links
between ideas in a text, which are signaled by cohesion cues at three
levels: local, global, and text levels (Crossley, Kyle, & McNamara, 2016). Of these
levels, this study targets local cohesive devices that involve lexical and
semantic overlap between adjacent sentences. Coreference cohesion is measured through
argument overlap (i.e., shared nouns and pronouns) between adjacent sentences. Conceptual cohesion is measured
in terms of how closely two adjacent sentences are related conceptually and thematically. For conceptual
cohesion, Coh-Metrix uses Latent Semantic Analysis (LSA), a statistical method for identifying
the underlying semantic associations between textual segments (Landauer, Foltz, & Laham,
1998). Additionally, I assessed the normalized frequencies of three features that
are thought to contribute to genre-specific communicative functions: causal connective density (e.g.,
because, consequently, and accordingly), temporal connective density (e.g., first, until, and
finally), and nominalization density (normalized frequency of nominalized words with a derivational suffix; e.g.,
carelessness, difficulty, and investigation).
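As a rough illustration of what a binary, adjacent-sentence version of coreference cohesion captures, the following Python sketch computes the proportion of adjacent sentence pairs that share at least one argument. It assumes that nouns and pronouns have already been extracted and lemmatized for each sentence; the function name and input format are hypothetical, and Coh-Metrix's own implementation (including its tagging, lemmatization, and pronoun handling) is not reproduced here.

```python
def adjacent_argument_overlap(sentences):
    """Proportion of adjacent sentence pairs that share at least one argument.

    `sentences` is a list of sets, one set per sentence, holding the
    lemmatized nouns and pronouns of that sentence. Extracting these
    arguments (tagging, lemmatization) is assumed to happen beforehand.
    """
    pairs = list(zip(sentences, sentences[1:]))
    if not pairs:
        raise ValueError("at least two sentences are needed")
    # A pair counts as overlapping if the two argument sets intersect.
    overlapping = sum(1 for first, second in pairs if first & second)
    return overlapping / len(pairs)


# Hypothetical three-sentence essay, already reduced to argument sets.
essay = [
    {"student", "essay"},   # Students write essays.
    {"essay", "teacher"},   # Teachers read the essays.
    {"weather"},            # The weather was nice.
]
print(adjacent_argument_overlap(essay))  # 0.5
```

Only the first of the two adjacent pairs shares an argument ("essay"), so the score is 0.5; a text in which every sentence repeats an argument from the previous one would score 1.0.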
It has been widely acknowledged that extensive use of cohesive devices helps
the reader understand a text by facilitating associations between its ideas
(Crossley, Yang, & McNamara, 2014; Gernsbacher, 1990), but previous studies have produced
inconsistent findings regarding the contribution of cohesive devices to writing quality (e.g.,
Crossley & McNamara, 2012; McNamara, Crossley, & McCarthy, 2010; Yang & Sun, 2012). L1
writers are likely to move from a stage of extensive local cohesion use to a later
stage focused on constructing complex sentences (Haswell, 2000; McCutchen & Perfetti, 1982).
I therefore postulated that L2 writers would show different patterns of trade-offs (e.g., between
linguistic complexity and cohesion) depending on their proficiency, and that genre and task
complexity manipulations would exert some influence on L2 writers’ use of discourse-level features.
Interactional metadiscourse features. Using AVA, I examined the density of various
interactional metadiscourse features. Built with Python regular expressions and the
Stanford Parser (Klein & Manning, 2003), AVA calculates normalized frequencies of
interactional metadiscourse features (i.e., hedges, boosters, attitude markers, self-mentions, reader
mentions, directives, and questions), following the model of interactional metadiscourse (K.
Hyland, 2005). Using 261 EFL argumentative essays, Yoon (2017) examined how well AVA measures
predict holistic ratings of voice strength and essay quality. Three features (i.e., self-mentions,
boosters, and attitude markers) explained 26% of the variance in voice strength scores, whereas
none contributed notably to essay quality. Also relevant to this study is a recent corpus study that
found clear genre effects on the use of hedges and self-mention markers (Hong & Cao, 2014). In
this study, I focused on the density of hedges, boosters, self-mentions, and reader mentions, which
have been specifically targeted by many EAP and corpus studies (e.g., Hu & Cao, 2011; K. Hyland
& Milton, 1997; Lee & Deakin, 2016, among others). Table 5 presents a summary of the text
features explored in this study.
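Because AVA is described as relying on regular expressions in Python, the sketch below shows, under heavily simplified assumptions, how a regex-based count can be turned into a density per 1,000 words. The five-word hedge list and the helper names are hypothetical placeholders, not AVA's actual feature lists, which are considerably larger; AVA also draws on the Stanford Parser, which this sketch does not use.

```python
import re

# Hypothetical, deliberately tiny hedge list for illustration only;
# it is NOT AVA's actual hedge lexicon.
HEDGE_PATTERN = re.compile(r"\b(?:may|might|perhaps|possibly|likely)\b", re.IGNORECASE)
WORD_PATTERN = re.compile(r"[A-Za-z']+")


def hedge_density(text):
    """Return hedges per 1,000 words (the normalization used in Table 5)."""
    words = WORD_PATTERN.findall(text)
    if not words:
        return 0.0  # empty text: no words to normalize by
    hedges = HEDGE_PATTERN.findall(text)
    return len(hedges) * 1000 / len(words)


sample = "It may be true. Perhaps it is."
print(round(hedge_density(sample), 2))  # 285.71
```

The sample has seven words and two hedges ("may" and "Perhaps"), so its density is 2 × 1000 / 7 ≈ 285.71; real essays are much longer, which is why the densities in this study are far smaller.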

Table 5.
Target Text Features
Construct / Measure | Description | Tool
Length of production unit
  Mean length of sentence (MLS) | # of words / # of sentences | SCA
  Mean length of clause (MLC) | # of words / # of clauses | SCA
Subordination
  Clauses per T-unit (C/T) | # of clauses / # of T-units | SCA
  Nominal clause density (NOMC) | # of nominal clauses * 1000 / # of words | MAT
  Adverbial clause density (ADVC) | # of adverbial clauses * 1000 / # of words | MAT
  Adjectival clause density (ADJC) | # of adjectival clauses * 1000 / # of words | MAT
Phrasal complexity
  Coordinate phrases per clause (CP/C) | # of coordinate phrases / # of clauses | SCA
  Complex nominals per clause (CN/C) | # of complex nominals / # of clauses | SCA
  Left embeddedness (LEFT) | # of words before the main verb | Coh-Metrix
  Modifiers per noun phrase (MOD/N) | # of modifiers / # of noun phrases | Coh-Metrix
Lexical features
  vocd-D (D) | Based on the vocd-D formula | Coh-Metrix
  Word frequency (WF) | Based on the CELEX corpus | Coh-Metrix
Discourse
  Coreference cohesion | Argument overlap between adjacent sentences | Coh-Metrix
  Conceptual cohesion | Semantic overlap between adjacent sentences | Coh-Metrix
  Causal connective density | # of causal connectives * 1000 / # of words | Coh-Metrix
  Temporal connective density | # of temporal connectives * 1000 / # of words | Coh-Metrix
  Nominalization density | # of nominalizations * 1000 / # of words | MAT
Metadiscourse
  Hedge density | # of hedges * 1000 / # of words | AVA
  Booster density | # of boosters * 1000 / # of words | AVA
  Self-mention density | # of self-mentions * 1000 / # of words | AVA
  Reader pronoun density | # of reader pronouns * 1000 / # of words | AVA
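The formulas in Table 5 are simple ratios of raw counts, and the density measures are normalized to 1,000 words so that essays of different lengths can be compared. The sketch below applies several of these formulas to hypothetical counts for one essay; in the study itself the counts come from SCA, MAT, Coh-Metrix, and AVA, and the dictionary keys used here are illustrative names only.

```python
def per_thousand_words(count, n_words):
    """Normalized frequency: occurrences per 1,000 words."""
    return count * 1000 / n_words


def table5_measures(counts):
    """Compute several Table 5 measures from raw counts for one essay."""
    return {
        "MLS": counts["words"] / counts["sentences"],
        "MLC": counts["words"] / counts["clauses"],
        "C/T": counts["clauses"] / counts["t_units"],
        "CN/C": counts["complex_nominals"] / counts["clauses"],
        "ADVC": per_thousand_words(counts["adverbial_clauses"], counts["words"]),
        "Hedge density": per_thousand_words(counts["hedges"], counts["words"]),
    }


# Hypothetical counts for a single essay (not data from this study).
essay_counts = {
    "words": 340, "sentences": 20, "clauses": 34, "t_units": 21,
    "complex_nominals": 44, "adverbial_clauses": 5, "hedges": 4,
}
for name, value in table5_measures(essay_counts).items():
    print(f"{name}: {value:.2f}")
```

Keeping the ratio and density measures in one place makes it easy to see that unit-length and subordination measures are scale-free ratios, whereas the clause and metadiscourse densities depend on the 1,000-word normalization.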

Analysis
For the first research question regarding task perceptions, I performed three-way mixed
ANOVAs with group (student and teacher) as a between-subjects variable and genre
(argumentative and narrative) and idea support (no support and support) as within-subjects
variables. Prior to the main statistical analysis, I checked the assumptions for mixed ANOVAs. As a
first step, I checked the normality of each distribution using Shapiro-Wilk tests (alpha = .05).
For the dependent variables whose normality test was significant, I calculated z-scores of
skewness and kurtosis to determine whether their distributions were within acceptable
limits (i.e., absolute z-scores under 3.29; Kim, 2013). This analysis revealed that the
distributions of all variables were within acceptable limits (z-scores of skewness ranging from
-2.10 to 2.27; z-scores of kurtosis ranging from -2.04 to 1.13). Additionally, I performed
Levene’s test for homogeneity of between-group variances and found that none of the tests
rejected the null hypothesis (alpha = .05), confirming that the data
set was appropriate for mixed ANOVAs. The alpha level for all inferential tests was adjusted
using the Bonferroni correction.
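A minimal sketch of the skewness and kurtosis screening step follows. It assumes the bias-adjusted sample statistics (G1 and G2) and the conventional standard-error formulas used by packages such as SPSS; the software used to compute the reported z-scores is not specified here, so this approximates the procedure rather than documenting it. The 3.29 cutoff is the one cited above (Kim, 2013), and the function names are hypothetical.

```python
import math


def skew_kurtosis_z(x):
    """Return z-scores (statistic / standard error) of skewness and excess kurtosis.

    Uses the bias-adjusted sample skewness (G1) and excess kurtosis (G2)
    together with their conventional standard errors.
    """
    n = len(x)
    if n < 4:
        raise ValueError("need at least four observations")
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    if m2 == 0:
        raise ValueError("variable has zero variance")
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    g1 = m3 / m2 ** 1.5          # population skewness
    g2 = m4 / m2 ** 2 - 3        # population excess kurtosis
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n ** 2 - 1) / ((n - 3) * (n + 5)))
    return skew / se_skew, kurt / se_kurt


def within_limits(x, cutoff=3.29):
    """True if both absolute z-scores fall under the cutoff."""
    z_skew, z_kurt = skew_kurtosis_z(x)
    return abs(z_skew) < cutoff and abs(z_kurt) < cutoff


print(within_limits([1, 2, 3, 4, 5]))   # True
print(within_limits([1] * 9 + [50]))    # False
```

The second toy variable has a single extreme value, which pushes the skewness z-score well above 3.29; in the study, this kind of check was applied only to variables flagged by the Shapiro-Wilk test.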
To answer the second research question regarding task manipulation effects on textual
features, I computed a series of two-way repeated-measures ANOVAs with genre and task
complexity as within-subjects variables. The dependent variables, which included 21 textual
features at different construct levels (syntactic complexity, lexical complexity, discourse, and
metadiscourse), were found to have limited correlations with one another. Given the lack of
strong linear relationships among the dependent variables, I decided not to run multivariate
analyses. Before conducting the main analysis, I checked the assumptions for repeated-measures
ANOVAs. I tested the assumption of normality by examining Shapiro-Wilk test results and
z-scores of skewness and kurtosis. For the variables whose Shapiro-Wilk tests rejected the null
hypothesis of normality, I calculated z-scores of their skewness and kurtosis values. This analysis
showed that four variables were not within acceptable limits: mean length of sentence,
coordinate phrases per clause, left embeddedness, and self-mention density. Because their
distributions were moderately positively skewed, I transformed their values using a
square root transformation (Tabachnick & Fidell, 2001). As a result, the distributions of these
variables became suitable (i.e., skewness and kurtosis within acceptable limits) for two-way
ANOVAs. Although transformed values were used for the inferential statistics, Table 12 reports
untransformed means and standard deviations for ease of interpretation.
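The effect of a square root transformation on a positively skewed variable can be seen with toy data. In the sketch below, hypothetical right-skewed values (the squares of 1 to 10) become perfectly symmetric after the transformation; real text-feature distributions will improve far less dramatically. The transformation requires nonnegative values, which holds for the counts, ratios, and densities in question.

```python
import math


def skewness(x):
    """Bias-adjusted sample skewness (G1)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5


# Hypothetical, positively skewed nonnegative values (e.g., densities).
raw = [k * k for k in range(1, 11)]
transformed = [math.sqrt(v) for v in raw]  # square root transformation

print(f"skewness before: {skewness(raw):.3f}")
print(f"skewness after:  {skewness(transformed):.3f}")
```

Because the square root compresses large values more than small ones, it pulls in a long right tail; for negatively skewed or zero-heavy data, other transformations would be needed.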
For the third research question about genre and task complexity effects on writing scores,
I checked the assumption of normality as I did for the second research question: first with
Shapiro-Wilk tests and, subsequently, by examining z-scores of skewness and kurtosis for
variables with significant results. This analysis showed that the distributions of all variables
were within acceptable limits (z-scores of skewness ranging from -2.76 to 0.14; z-scores of
kurtosis ranging from -1.11 to 2.28).

CHAPTER 4.
RESULTS
Task Perceptions
The descriptive results of the perception data are presented in Table 6. The first column,
Item, lists the statements included in the questionnaires. Complexity, for example,
refers to a statement tapping into the construct of task complexity (this task required extreme
mental effort) rather than to the actual manipulation of task complexity (provision of supporting
ideas). To avoid confusion, throughout this chapter I use idea support to refer to the
manipulation of task complexity and complexity to refer to perceived task complexity. Scores
on each item range from 1 to 9. Overall, the descriptive results showed complex patterns of
perceived complexity and difficulty across conditions, while task anxiety seemed to
vary little across conditions. Additionally, the levels of interest and motivation for
the writing tasks appeared to differ between the student and teacher groups. To examine
the effects of genre and idea support on perceptions statistically, I computed mixed ANOVAs with
the Bonferroni adjustment (alpha = .05/6 or .0083). Throughout this section, I first report the
results for interaction effects and their post-hoc tests, followed by those for main effects.
Table 7 shows the interaction effects of the three independent variables (genre, idea
support, and group) on task perceptions. Complexity was the only item that showed a significant
three-way interaction (F(1, 104) = 9.25, p = .003, ηp2 = .082). That is, the students and teachers
had different perceptions of how the genre and idea support manipulations influenced task
complexity. Specifically, the teachers predicted that providing supporting ideas would make the
argumentative task less complex but the narrative task even more complex; in contrast, the
students reported that the provision of supporting ideas lowered the complexity of both genres,
which led to a significant three-way interaction (post-hoc analysis results are reported in the next
paragraph). In addition, the results showed a significant interaction between genre and group for
perceived task complexity (F(1, 104) = 9.06, p = .003, ηp2 = .080) and difficulty (F(1, 104) = 10.87,
p = .001, ηp2 = .095). That is, the teachers perceived the argumentative genre as more complex and
difficult than the narrative, while the students found both genres similarly complex and difficult
(see Figure 1). All significant interaction effects on perceived complexity and difficulty were
medium in size, with ηp2 ranging from .08 to .11. The other categories (task anxiety, confidence,
interest, and motivation) did not show any significant interactions.
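Partial eta squared can be recovered from a reported F ratio and its degrees of freedom, because ηp2 = SS_effect / (SS_effect + SS_error) = (F × df_effect) / (F × df_effect + df_error). The short Python check below applies this identity to the interaction effects reported above and also computes the Bonferroni-adjusted alpha; it is offered as a reader's consistency check, not as the analysis script used in the study.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# Bonferroni-adjusted alpha for the six perception items.
BONFERRONI_ALPHA = 0.05 / 6

# F ratios, degrees of freedom, and reported effect sizes from the text.
reported = [
    ("Complexity: genre x idea support x group", 9.25, 1, 104, 0.082),
    ("Complexity: genre x group", 9.06, 1, 104, 0.080),
    ("Difficulty: genre x group", 10.87, 1, 104, 0.095),
]
for label, f, df1, df2, eta_reported in reported:
    eta = partial_eta_squared(f, df1, df2)
    print(f"{label}: {eta:.3f} (reported {eta_reported:.3f})")
print(f"Bonferroni alpha: {BONFERRONI_ALPHA:.4f}")
```

All three recomputed values round to the reported ones, and each reported p value (.003, .003, .001) falls below the adjusted alpha of about .0083.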
For complexity and difficulty, which showed significant interactions, I performed post-hoc
analyses separately for each group so that the effects of genre and idea support could be presented
more clearly. As shown in Table 8, the manipulation of idea support led to
significant changes in the students’ perceptions of task complexity and difficulty, with medium
effect sizes (complexity: F(1, 75) = 6.91, p = .010, ηp2 = .084; difficulty: F(1, 75) = 9.97, p
= .002, ηp2 = .117). On the other hand, the students did not perceive the two genres as imposing
significantly different levels of complexity and difficulty (complexity: F(1, 75) = 0.45, p = .51,
ηp2 = .006; difficulty: F(1, 75) = 0.81, p = .37, ηp2 = .011). These perception-based findings
lend some support to the use of idea support as a cognitive complexity
variable in written discourse and, more interestingly, challenge the general assumption that narrative
writing is cognitively less demanding and less difficult for ESL students than
argumentative writing.

Table 6.
Descriptive Statistics for ESL Students’ and Teachers’ Perceptions of Writing Tasks
Item | Group | Arg/-Support M (SD) [95% CI] | Arg/+Support M (SD) [95% CI] | Nar/-Support M (SD) [95% CI] | Nar/+Support M (SD) [95% CI]
Complexity | Student | 5.55 (1.77) [5.15, 5.96] | 5.07 (1.73) [4.67, 5.46] | 5.59 (1.67) [5.21, 5.97] | 5.25 (1.65) [4.87, 5.63]
Complexity | Teacher | 6.23 (1.17) [5.80, 6.67] | 5.23 (2.24) [4.40, 6.07] | 4.47 (1.43) [3.93, 5.00] | 5.30 (1.82) [4.62, 5.98]
Difficulty | Student | 5.14 (1.76) [4.74, 5.55] | 4.50 (1.66) [4.12, 4.88] | 5.17 (1.84) [4.75, 5.59] | 4.82 (1.96) [4.37, 5.26]
Difficulty | Teacher | 5.63 (1.38) [5.12, 6.15] | 5.27 (2.18) [4.45, 6.08] | 4.00 (1.44) [3.46, 4.54] | 4.90 (1.77) [4.24, 5.56]
Anxiety | Student | 4.95 (2.00) [4.49, 5.40] | 4.64 (2.04) [4.18, 5.11] | 4.97 (2.14) [4.49, 5.46] | 4.62 (1.83) [4.20, 5.04]
Anxiety | Teacher | 5.37 (1.79) [4.70, 6.04] | 4.67 (1.81) [3.99, 5.34] | 4.10 (1.40) [3.58, 4.62] | 4.87 (1.78) [4.20, 5.53]
Confidence | Student | 4.84 (1.86) [4.42, 5.27] | 5.20 (1.74) [4.80, 5.59] | 5.39 (1.76) [4.99, 5.80] | 5.03 (1.80) [4.61, 5.44]
Confidence | Teacher | 5.70 (1.68) [5.07, 6.33] | 5.90 (1.79) [5.23, 6.57] | 6.57 (1.33) [6.07, 7.06] | 5.90 (1.63) [5.29, 6.51]
Interest | Student | 4.59 (1.90) [4.16, 5.03] | 5.20 (1.74) [4.80, 5.59] | 5.09 (1.67) [4.71, 5.47] | 5.53 (1.94) [5.08, 5.97]
Interest | Teacher | 5.73 (1.72) [5.09, 6.38] | 5.87 (1.63) [5.26, 6.48] | 6.40 (1.77) [5.74, 7.06] | 6.27 (1.66) [5.65, 6.89]
Motivation | Student | 4.86 (2.00) [4.40, 5.31] | 5.18 (2.09) [4.71, 5.66] | 5.37 (1.87) [4.94, 5.80] | 5.47 (1.83) [5.06, 5.89]
Motivation | Teacher | 5.03 (1.94) [4.31, 5.76] | 5.33 (1.71) [4.70, 5.97] | 6.13 (1.85) [5.44, 6.82] | 5.87 (1.48) [5.31, 6.42]
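The 95% confidence intervals in Table 6 are consistent with the usual t-based interval, M ± t × SD / √n, with n = 76 students and n = 30 teachers. The sketch below recomputes two of them; the critical t values are hard-coded, rounded approximations (an assumption made to avoid a SciPy dependency), and small last-digit differences from the table are expected because the table was presumably computed from unrounded means and standard deviations.

```python
import math

# Two-tailed critical t values for a 95% CI, hard-coded (rounded) to avoid a
# SciPy dependency: df = 75 for students (n = 76), df = 29 for teachers (n = 30).
T_CRIT_95 = {75: 1.992, 29: 2.045}


def mean_ci(mean, sd, n):
    """95% confidence interval for a mean: M +/- t(df = n - 1) * SD / sqrt(n)."""
    half_width = T_CRIT_95[n - 1] * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


student = mean_ci(5.55, 1.77, 76)  # Complexity, Arg/-Support, students
teacher = mean_ci(6.23, 1.17, 30)  # Complexity, Arg/-Support, teachers
print([round(v, 2) for v in student], [round(v, 2) for v in teacher])
```

The teacher intervals are noticeably wider than the student intervals at similar standard deviations, which reflects the smaller teacher sample rather than greater disagreement.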

Table 7.
Interaction Effects of Genre, Idea Support, and Group on Task Perceptions
Item

Genre × Idea support × Group
p

ηp2

Complexity

.003*

Difficulty

Genre × Group

p

ηp2

.003*

Observed
power
.080
.847

.284

.381

.001*

.095

.904

.043

.576

.141

.021

.823

.001

.056

.520

Interest

.892

.001

.052

Motivation

.639

.002

.075

.082

Observed
power
.854

.097

.026

Anxiety

.032

Confidence

Genre × Idea support
p

ηp2

.011

Observed
power
.187

.001*

.109

Observed
power
.942

.015

.055

.687

.009

.064

.751

.312

.315

.010

.170

.046

.038

.517

.004

.098

.435

.006

.121

.014

.056

.696

.727

.001

.064

.132

.022

.324

.534

.004

.095

.187

.017

.260

.530

.004

.096

.281

.011

.189

p

ηp2

Idea support × Group

Note. *p values are significant with the Bonferroni correction (alpha = .05/6 or .0083).

Table 8.
Post-hoc Analysis Results of Genre and Idea Support Effects for Each Group’s Perceptions
Item | Group | Genre: p, ηp2, observed power | Idea support: p, ηp2, observed power | Genre × Idea support: p, ηp2, observed power
Complexity | Student | .506, .006, .101 | .009*, .084, .737 | .653, .003, .073
Complexity | Teacher | .005*, .240, .833 | .771, .003, .059 | < .001*, .492, .999
Difficulty | Student | .370, .011, .145 | .002*, .117, .876 | .379, .010, .141
Difficulty | Teacher | .002*, .279, .899 | .363, .029, .145 | .005*, .238, .829
Note. *p values are significant with the Bonferroni correction (p < .05/2 or .025).

Figure 1. Students’ and teachers’ perceptions of task complexity and difficulty across genre
conditions.
Results from the teachers showed an entirely different pattern. There were significant
interactions between genre and idea support for their perceived complexity and difficulty, both
with large effect sizes (complexity: F(1, 29) = 28.07, p < .001, ηp2 = .49; difficulty: F(1, 29) =
9.07, p = .005, ηp2 = .24). Specifically, the teachers predicted that the provision of supporting
ideas would mitigate the complexity and difficulty of argumentative writing, but that the same
manipulation in the narrative genre would increase task complexity and difficulty
(see Figure 2). Furthermore, contrary to the students’ results, the teachers expected ESL
students to experience different levels of task complexity and difficulty across the two genres
(i.e., argumentative tasks imposing greater complexity and difficulty on ESL students than
narratives; complexity: F(1, 29) = 9.17, p = .005, ηp2 = .24; difficulty: F(1, 29) = 11.23, p = .002,
ηp2 = .28). The wide gap between the students and teachers in their perceptions of task
manipulation effects will be discussed in more detail in the next chapter.

Figure 2. Interaction plots for perceived complexity and difficulty showing an interaction
between genre and idea support only for teacher perceptions.
In terms of the main effect of each variable (see Table 9), there were significant main effects
of group on task confidence and interest (confidence: F(1, 104) = 13.63, p < .001, ηp2 = .12;
interest: F(1, 104) = 15.76, p < .001, ηp2 = .13). Specifically, as Figure 3 shows, the teachers’
expectations of ESL students’ confidence and interest in the given tasks were consistently higher
than the confidence and interest levels actually reported by the students. Moreover, there were
significant main effects of genre on task interest and motivation (interest: F(1, 104) = 7.80,
p = .006, ηp2 = .07; motivation: F(1, 104) = 15.17, p < .001, ηp2 = .13). In other words, both
students and teachers viewed narrative writing as more interesting than argumentative writing,
and motivation was rated higher for the narrative tasks than for the argumentative tasks. Task
anxiety was the only category with no significant interaction or main effects.

Figure 3. Students’ and teachers’ perceptions of task confidence, interest, and motivation across
genre conditions.

Table 9.
Main Effects of Genre, Idea Support, and Group on Task Perceptions
Item | Genre: p, ηp2, observed power | Idea support: p, ηp2, observed power | Group: p, ηp2, observed power
Complexity | .023, .049, .629 | .109, .025, .361 | .827, .001, .055
Difficulty | .022, .050, .638 | .454, .005, .116 | .875, .001, .053
Anxiety | .141, .021, .312 | .412, .006, .129 | .869, .001, .053
Confidence | .099, .026, .377 | .409, .007, .130 | < .001*, .116, .955
Interest | .006*, .070, .790 | .132, .022, .324 | < .001*, .132, .976
Motivation | < .001*, .127, .971 | .464, .005, .113 | .204, .016, .245
Note. *p values are significant with the Bonferroni correction (p < .05/6 or .0083).

As a next step, I computed Pearson correlations to explore the relationships among the
perception statements answered by the students. As Table 10 shows, task complexity and
task difficulty were positively related (rs = .34 to .62), and the level of stress (anxiety) caused by
each writing task was positively related to both task complexity (rs = .30 to .59) and difficulty
(rs = .54 to .64). Additionally, there were positive relationships among task confidence, interest,
and motivation (rs = .22 to .69). In particular, the correlations between task interest and
motivation were fairly strong (rs = .53 to .69). These findings generally conform to our
understanding of how the various dimensions of task perceptions work.
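Whether a Pearson correlation reaches significance at the .01 level in Table 10 can be checked by converting r to a t statistic with N - 2 degrees of freedom. In the sketch below, the critical value of about 2.64 for df = 74 is an approximation rather than a figure from the study, the helper names are hypothetical, and the two coefficients are the magnitudes of one flagged and one unflagged value from the Nar/-Support rows of Table 10.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic (df = n - 2) for testing whether a correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Approximate two-tailed critical t at alpha = .01 for df = 74 (N = 76).
T_CRIT_01 = 2.64

for r in (0.299, 0.278):
    t = r_to_t(r, 76)
    print(f"r = {r:.3f}: t(74) = {t:.2f}, significant at .01: {abs(t) > T_CRIT_01}")
```

With N = 76, the significance threshold at the .01 level sits at roughly |r| = .29, which is why .299 is flagged in Table 10 while .278 is not.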
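Table 10.
Correlations between Perception Items by Task Type
Students (N = 76)
Arg/-Support
  Complexity: Difficulty .497*, Anxiety .389*, Confidence -.168, Interest .207, Motivation .113
  Difficulty: Anxiety .594*, Confidence -.206, Interest .182, Motivation .055
  Anxiety: Confidence -.204, Interest .026, Motivation -.092
  Confidence: Interest .394*, Motivation .440*
  Interest: Motivation .687*
Arg/+Support
  Complexity: Difficulty .582*, Anxiety .587*, Confidence -.089, Interest .213, Motivation .222
  Difficulty: Anxiety .638*, Confidence -.405*, Interest -.062, Motivation -.061
  Anxiety: Confidence -.412*, Interest -.070, Motivation .001
  Confidence: Interest .217, Motivation .225
  Interest: Motivation .567*
Nar/-Support
  Complexity: Difficulty .340*, Anxiety .299*, Confidence .141, Interest .052, Motivation .027
  Difficulty: Anxiety .634*, Confidence -.236, Interest .082, Motivation -.256
  Anxiety: Confidence -.278, Interest -.245, Motivation -.425*
  Confidence: Interest .327*, Motivation .369*
  Interest: Motivation .530*
Nar/+Support
  Complexity: Difficulty .619*, Anxiety .543*, Confidence -.141, Interest .029, Motivation .044
  Difficulty: Anxiety .536*, Confidence -.300*, Interest -.135, Motivation -.176
  Anxiety: Confidence -.251, Interest -.138, Motivation -.116
  Confidence: Interest .339*, Motivation .364*
  Interest: Motivation .683*
Note. *correlations are significant at the alpha level of .01.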

Nevertheless, this analysis also revealed some unexpected patterns. An increase in task
complexity did not necessarily coincide with lower task interest or motivation, as evidenced by
the non-significant, but mostly positive, correlations of these dimensions with task complexity.
Similarly, although task difficulty was negatively related to task confidence, task complexity did
not consistently correlate negatively with task confidence, suggesting that task complexity and
difficulty tap different constructs and that a moderate increase in task complexity may not harm
learners’ affective states. The correlation results tentatively suggest that an increased level of
task complexity, when suitable for learners’ developmental stage, may allow the learners to
become more interested in a task.
To complement the results of the task questionnaires, I examined the teachers’ responses to
open-ended questions and found that, contrary to the general trend in the statistical analyses
of the questionnaire data, two teachers were aware of the potential effects of ESL students’
prior learning experience on their knowledge and performance, such as the construction of
unbalanced genre schemas and greater difficulty with narrative writing:
In terms of task difficulty, I don't think the two sets of prompts would be much different
(although A2 [Arg/+Support] seems to be much easier than A1 [Arg/-Support] since it
gives answers). Although narratives are often considered easier than argumentation,
many ESL students often have a lot of experience of writing argumentative essays like A1
and A2 for their test prep. Depending on the extent to which they have been exposed to
each genre of writing, for some students A1 and A2 can be easier than N1 and N2 [Nar/-Support and Nar/+Support]. (Participant ID: T115)

For the N1 and N2 tasks [Nar/-Support and Nar/+Support], I'd say that some students
will be intimidated by the genre of the task, depending on whether or not they've had
experience with this kind of writing. (Participant ID: T120)
Some teachers also noted that the provision of idea support should be performed with
caution. Two relevant excerpts are as follows:
I thought the writing prompts were relevant to students’ lives for the most part, but
although the suggestions in N2 [Nar/+Support] and A2 [Arg/+Support] could help
students by providing a starting point and/or some specific examples to draw on, they
could also be frustrating if students hadn't encountered those specific situations.
(Participant ID: T103)
Both prompts [Nar/+Support and Arg/+Support] provided too much scaffolding, making
it both too easy to do well at the task and too hard, since sometimes it is easier to come
up with support for your own ideas than for another's. (Participant ID: T128)
The excerpts above indicate ESL teachers’ concerns about the adverse effects of supporting ideas,
such as the possibility that they restrict what students can think and write, particularly when
the supporting ideas included in a prompt do not reflect students’ life experiences.
Additionally, teachers cautioned that overly specific outlines would deprive
students of the opportunity to generate their own ideas in writing, which is an important
part of the writing skills generally targeted in L2 learning contexts. The excerpts below are
examples of such concerns:
In general, I do not like prompts that lead the students too much (like N2 and A2)
[Nar/+Support and Arg/+Support]. These prompts provide a road map for the supporting
points that makes a whole classroom of essays very repetitive to read. Is that intentional?
(Participant ID: T107)
Tasks N1 and N2 [two narrative prompts] were relatively simple and because they are
narrative and personal, ESL students will perform well, I believe. Tasks A1 and A2 [two
argumentative prompts] were more academic, but A2 [Arg/+Support] provided a brief
outline of arguments which would be helpful to the test-takers. As an instructor, though, I
think it would be better NOT to provide the outlines in A2 because part of the goal of the
task is to see how well writers can generate and organize their own ideas. (Participant
ID: T119)
Last, as shown in the excerpts below, some teachers suggested that students might be more
motivated by narrative writing, potentially because of its less formulaic and more
personalized characteristics:
The first one [Nar/-Support] was open enough that better writers may be able to do
something interesting and step outside of formulaic 5-paragraph essay writing.
(Participant ID: T127)
Motivation goes up when the student is in a “can-do” situation and is encouraged to
communicate a message that they are personally invested in. (Participant ID: T111)
Examining these excerpts from the teachers’ open-ended responses provided a more in-depth
understanding of the teachers’ perceptions of the tasks. All of the quantitative findings related to
task perceptions are summarized in Table 11 and will be discussed together with the text feature
results in the next chapter.

Table 11.
Summary of Task Perception Results
Complexity
  Students
  • Similar level of complexity for narrative and argument
  • Lower complexity for the tasks with idea support
  Teachers
  • Higher complexity for argument than narrative
  • Idea support leads to higher complexity for narrative, but lower complexity for argument
  Correlation patterns
  • Positive correlations with task difficulty and stress
Difficulty
  Students
  • Similar level of difficulty for narrative and argument
  • Lower difficulty for the tasks with idea support
  Teachers
  • Higher difficulty for argument than narrative
  • Idea support leads to higher difficulty for narrative, but lower difficulty for argument
  Correlation patterns
  • Positive correlations with task complexity
  • Negative correlations with task confidence
Anxiety
  Students
  • Similar level of anxiety for all tasks
  Teachers
  • Similar level of anxiety for all tasks
  Correlation patterns
  • Positive correlations with task complexity and difficulty
  • Negative correlations with task confidence
Confidence
  Students
  • Similar level of confidence for all tasks
  Teachers
  • Similar level of confidence for all tasks
  • Higher task confidence from teachers than students
  Correlation patterns
  • Positive correlations with task interest and motivation
  • Negative correlations with task difficulty and stress
Interest
  Students
  • Higher interest in narrative than argument
  Teachers
  • Higher interest in narrative than argument
  • Higher interest from teachers than students
  Correlation patterns
  • Positive correlations with task confidence and motivation
Motivation
  Students
  • Higher motivation for narrative than argument
  Teachers
  • Higher motivation for narrative than argument
  Correlation patterns
  • Positive correlations with task confidence and interest

Textual Feature Changes across Task Types
The second research question addresses the effects of genre and idea support on various
text features in ESL student writing (i.e., syntactic, lexical, discourse, and metadiscourse
features). It is an attempt to (1) reveal how task manipulations lead students to use different
features of language, (2) associate such linguistic changes with the communicative functions of each
genre, and (3) ultimately provide a comprehensive picture of how genre-specific functions and
learners’ task perceptions influence their language use, jointly and/or separately. To attain these
aims, I examined changes in 21 text features across task types (see Table 12 for
descriptive results). In brief, ten measures tapped the
construct of syntactic complexity:
• Unit-length measures: mean length of sentence (MLS) and mean length of clause (MLC)
• Subordination measures: clauses per T-unit (C/T), nominal clauses per 1,000 words (NOMC), adverbial clauses per 1,000 words (ADVC), and adjective clauses per 1,000 words (ADJC)
• Phrasal-level measures: coordinate phrases per clause (CP/C), complex nominals per clause (CN/C), average number of words before the main verb (left embedded), and average number of modifiers per noun phrase (modifiers/NP)

Two lexical measures were additionally targeted:
• Lexical diversity based on the vocd-D formula (D) and lexical sophistication based on average word frequency extracted from the CELEX corpus (WF; here, lower WF indicates greater lexical sophistication)

I examined five discourse measures:
• Two lexical cohesion measures: argument overlap between adjacent sentences (coreference cohesion) and semantic overlap between adjacent sentences (conceptual cohesion)
• Two connective density measures: causal connectives per 1,000 words (causal connective density) and temporal connectives per 1,000 words (temporal connective density)
• Nominalizations per 1,000 words (nominalization density)

Finally, I included four metadiscourse measures:
• Number of hedges per 1,000 words (hedge density), number of boosters per 1,000 words (booster density), number of self-mentions per 1,000 words (self-mention density), and number of reader pronouns per 1,000 words (reader pronoun density)
Table 13 presents the results of two-way ANOVAs regarding how the genre and idea support
manipulations elicited different textual features. First, the results showed significant interactions
between genre and idea support for various aspects of noun phrase complexification (CN/C: F(1,
75) = 22.02, p < .001, ηp2 = .23; Modifiers/NP: F(1, 75) = 21.14, p < .001, ηp2 = .22;
Nominalization: F(1, 75) = 15.16, p < .001, ηp2 = .17). As presented in Figure 4, the results of
post-hoc analyses (paired-samples t-tests) suggested that the provision of idea support in
argumentative writing tended to increase noun phrase complexity, significantly for modifiers per
noun phrase and nominalization density but not for complex nominals per clause (CN/C:
t(75) = -1.46, p = .150, d = -0.14; Modifiers/NP: t(75) = -3.14, p = .002, d = -0.31;
Nominalization: t(75) = -4.43, p < .001, d = -0.96), whereas the same manipulation in narrative
writing tended to decrease nominal complexity, significantly for complex nominals per clause and
modifiers per noun phrase but not for nominalization density (CN/C: t(75) = 5.22, p < .001,
d = 0.66; Modifiers/NP: t(75) = 2.95, p = .004, d = 0.33; Nominalization: t(75) = 0.05, p = .96,
d = 0.01).
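The post-hoc comparisons above are paired-samples t-tests, with negative values indicating higher scores in the +Support condition. The section does not specify which variant of Cohen's d was used for paired data, so the sketch below shows one common choice (the mean difference divided by the average of the two standard deviations) alongside the paired t statistic; the data and function names are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(condition_a, condition_b):
    """Paired-samples t statistic (df = n - 1) for the differences a - b."""
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def cohens_d_av(condition_a, condition_b):
    """Cohen's d for paired data using the average of the two SDs (d_av variant)."""
    average_sd = (stdev(condition_a) + stdev(condition_b)) / 2
    return (mean(condition_a) - mean(condition_b)) / average_sd


# Hypothetical scores for five writers: -Support (a) versus +Support (b).
no_support = [1, 2, 3, 4, 5]
support = [2, 3, 4, 5, 7]
print(round(paired_t(no_support, support), 2), round(cohens_d_av(no_support, support), 2))
```

Because the paired t statistic divides by the standard deviation of the differences while d_av divides by the raw-score standard deviations, a large t can accompany a modest d when writers change consistently across conditions.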

Figure 4. Interaction plots for complex nominals per clause, modifiers per noun phrase, and
nominalization density showing an interaction between genre and idea support conditions. It
should be noted that the y-axes of the plots have different scales.

Table 12.
Descriptive Statistics for Target Text Features by Task Type

Measure                Arg/-Support                  Arg/+Support                  Nar/-Support                  Nar/+Support
                       M (SD) [95% CI]               M (SD) [95% CI]               M (SD) [95% CI]               M (SD) [95% CI]
Length of unit
  MLS                  17.17 (4.03) [16.24, 18.09]   18.16 (4.52) [17.13, 19.19]   16.20 (4.09) [15.27, 17.14]   16.00 (4.54) [14.96, 17.04]
  MLC                  9.79 (1.67) [9.41, 10.17]     9.74 (1.38) [9.42, 10.06]     8.76 (1.57) [8.40, 9.12]      8.20 (1.34) [7.90, 8.51]
Subordination
  C/T                  1.58 (0.28) [1.51, 1.64]      1.67 (0.34) [1.59, 1.74]      1.59 (0.30) [1.52, 1.65]      1.67 (0.34) [1.59, 1.75]
  NOMC                 8.55 (6.52) [7.06, 10.04]     7.06 (5.92) [5.71, 8.42]      10.27 (6.11) [8.88, 11.67]    9.60 (5.90) [8.25, 10.95]
  ADVC                 16.68 (8.99) [14.62, 18.73]   16.36 (7.31) [14.69, 18.04]   13.79 (6.60) [12.28, 15.30]   12.65 (6.77) [11.10, 14.20]
  ADJC                 10.32 (7.08) [8.70, 11.93]    12.02 (6.52) [10.53, 13.51]   8.31 (6.02) [6.94, 9.69]      8.64 (6.21) [7.23, 10.06]
Phrasal complexity
  CP/C                 0.20 (0.14) [0.17, 0.23]      0.19 (0.11) [0.17, 0.22]      0.17 (0.12) [0.14, 0.20]      0.15 (0.09) [0.13, 0.17]
  CN/C                 1.28 (0.34) [1.20, 1.35]      1.33 (0.33) [1.26, 1.41]      0.95 (0.32) [0.88, 1.02]      0.76 (0.20) [0.72, 0.81]
  Left embedded        5.05 (1.82) [4.63, 5.47]      4.94 (1.77) [4.53, 5.35]      4.15 (1.12) [3.90, 4.41]      3.99 (1.27) [3.70, 4.28]
  Modifiers/NP         0.76 (0.16) [0.72, 0.79]      0.82 (0.13) [0.79, 0.85]      0.63 (0.12) [0.60, 0.66]      0.59 (0.11) [0.56, 0.61]
Lexical features
  D                    75.36 (13.84) [72.20, 78.52]  76.57 (16.35) [72.83, 80.30]  78.14 (17.07) [74.24, 82.04]  82.57 (15.11) [79.12, 86.02]
  WF                   3.04 (0.09) [3.02, 3.06]      3.08 (0.08) [3.06, 3.09]      3.09 (0.07) [3.08, 3.11]      3.11 (0.08) [3.09, 3.12]
Discourse
  Argument overlap     0.64 (0.20) [0.60, 0.69]      0.64 (0.17) [0.60, 0.68]      0.67 (0.16) [0.63, 0.71]      0.63 (0.16) [0.60, 0.67]
  Semantic overlap     0.24 (0.08) [0.22, 0.26]      0.24 (0.07) [0.23, 0.26]      0.21 (0.07) [0.19, 0.23]      0.18 (0.05) [0.17, 0.19]
  Causal connective    37.76 (12.67) [34.86, 40.65]  33.37 (10.86) [30.89, 35.85]  34.16 (11.78) [31.47, 36.85]  34.64 (11.95) [31.91, 37.37]
  Temporal connective  14.84 (9.22) [12.73, 16.95]   15.08 (7.67) [13.33, 16.83]   20.31 (9.07) [18.25, 22.37]   26.12 (10.13) [23.80, 28.43]
  Nominalization       21.83 (12.70) [18.93, 24.73]  31.71 (15.27) [28.22, 35.20]  11.06 (8.13) [9.20, 12.91]    10.99 (11.27) [8.41, 13.56]
Metadiscourse
  Hedge                11.86 (7.42) [10.16, 13.55]   17.57 (11.15) [15.02, 20.11]  15.75 (8.90) [13.72, 17.79]   17.92 (10.13) [15.60, 20.23]
  Booster              23.35 (13.69) [20.23, 26.48]  21.88 (10.95) [19.37, 24.38]  24.73 (9.62) [22.53, 26.93]   24.86 (9.90) [22.60, 27.12]
  Self-mention         19.24 (19.12) [14.88, 23.61]  18.03 (17.12) [14.11, 21.94]  71.90 (29.61) [65.13, 78.66]  68.93 (24.97) [63.22, 74.63]
  Reader pronoun       28.18 (20.70) [23.45, 32.91]  29.58 (24.03) [24.09, 35.07]  32.72 (22.08) [27.78, 37.77]  34.82 (22.29) [29.73, 39.92]
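The 95% CIs in Table 12 appear consistent with the usual t-based interval for a mean, M ± t(.975, n − 1) × SD/√n, with n = 76. The sketch below assumes that formula and a critical value of about 1.992 for df = 75 taken from a t table (the constant is an assumption, not a value reported in the text); it can be used to spot-check individual cells. Small differences in the second decimal are expected if the table was computed from unrounded means.

```python
import math

# Two-tailed critical t for alpha = .05 and df = 75 (n = 76), from a t table.
T_CRIT_DF75 = 1.992


def ci95_from_summary(mean, sd, n=76, t_crit=T_CRIT_DF75):
    """Return (lower, upper) of a t-based 95% CI from summary statistics.

    t_crit must match df = n - 1; the default assumes n = 76.
    """
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin


# MLS in the Arg/-Support condition: 17.17 (4.03)
lower, upper = ci95_from_summary(17.17, 4.03)
print(f"[{lower:.2f}, {upper:.2f}]")
```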

Table 13.
Inferential Statistics for Genre and Idea Support Effects on Textual Features

                        Genre                    Idea support             Genre × Idea support
Measure                 p        ηp2    Power    p        ηp2    Power    p        ηp2    Power
Length of unit
  MLS                   < .001*  .330   1.000    .260     .017   .202     .016     .075   .684
  MLC                   < .001*  .515   1.000    .038     .056   .548     .048     .051   .509
Subordination
  C/T                   .785     .001   .058     .003     .109   .847     .933     .001   .051
  NOMC                  .002*    .120   .883     .134     .030   .321     .483     .007   .107
  ADVC                  < .001*  .161   .963     .431     .008   .123     .581     .004   .085
  ADJC                  < .001*  .215   .994     .107     .034   .364     .328     .013   .163
Phrasal complexity
  CP/C                  .008     .090   .766     .550     .005   .091     .442     .008   .119
  CN/C                  < .001*  .767   1.000    .020     .070   .652     < .001*  .227   .996
  Left embedded         < .001*  .369   1.000    .313     .014   .171     .714     .002   .065
  Modifiers/NP          < .001*  .759   1.000    .368     .011   .146     < .001*  .220   .995
Lexical features
  D                     .009     .088   .758     .040     .055   .541     .240     .018   .216
  WF                    < .001*  .302   1.000    < .001*  .162   .964     .063     .045   .462
Discourse
  Argument overlap      .489     .006   .106     .333     .012   .161     .396     .010   .134
  Semantic overlap      < .001*  .407   1.000    .090     .038   .397     .006     .097   .801
  Causal connective     .334     .012   .160     .104     .035   .369     .089     .038   .398
  Temporal connective   < .001*  .447   1.000    .001*    .144   .938     .003     .114   .867
  Nominalization        < .001*  .639   1.000    .001*    .139   .930     < .001*  .168   .970
Metadiscourse
  Hedge                 .064     .045   .459     < .001*  .202   .991     .056     .048   .483
  Booster               .090     .038   .396     .576     .004   .086     .447     .008   .117
  Self-mention          < .001*  .824   1.000    .571     .004   .087     .822     .001   .056
  Reader pronoun        .037     .056   .552     .445     .008   .118     .857     .001   .054
Note. *p values are significant with the Bonferroni correction (alpha = .05/21 or .0024). Power = observed power.

Main effects of genre were prevalent across the textual measures, with medium to large effect
sizes (ηp2 from .12 to .82), whereas main effects of idea support were found for only a few
measures. With regard to genre effects, the argumentative essays elicited significantly higher
values of unit length (MLS: F(1, 75) = 36.92, p < .001, ηp2 = .33; MLC: F(1, 75) = 79.71, p
< .001, ηp2 = .52), phrasal complexity (CN/C: F(1, 75) = 247.46, p < .001, ηp2 = .77; left
embedded: F(1, 75) = 43.88, p < .001, ηp2 = .37; modifiers/NP: F(1, 75) = 236.71, p < .001, ηp2
= .76), and discourse measures (semantic overlap: F(1, 75) = 51.55, p < .001, ηp2 = .41;
nominalization: F(1, 75) = 132.62, p < .001, ηp2 = .64) than the narrative essays. Of these
measures with significant changes, the density of complex nominals (CN/C) was found to have
the largest effect size (ηp2 = .77). The significant main effects of CN/C and modifiers/NP are
illustrated in Figure 5.
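For an effect with df_effect = 1, each partial eta squared reported above can be recovered from its F statistic with the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The small sketch below (the function name is illustrative) offers a quick consistency check between the F values and effect sizes in this paragraph.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Genre effects reported above, all with F(1, 75)
for label, f_value in [("MLS", 36.92), ("MLC", 79.71), ("CN/C", 247.46)]:
    print(label, round(partial_eta_squared(f_value, 1, 75), 2))
```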

Figure 5. Complex nominals per clause and modifiers per noun phrase across genre and idea
support conditions.
On the other hand, as displayed in Figure 6, the narratives showed significantly higher
values in temporal connective density (F(1, 75) = 60.54, p < .001, ηp2 = .45) and self-mention
density (F(1, 75) = 350.46, p < .001, ηp2 = .82) than the argumentative essays. The greater use
of temporal connectives and self-mentions in narrative writing is unsurprising, because these are
important linguistic resources for narrating a personal story (Biber & Conrad, 2009).
Figure 6. Temporal connective density and self-mention density across genre and idea support
conditions.
The clauses per T-unit (C/T) measure, which has been widely adopted as a typical measure of
clausal subordination, did not differ across the two genres (F(1, 75) = 0.08, p = .79, ηp2 = .001),
in line with the findings of previous research (e.g., Lu, 2011; Yoon & Polio, 2017). However,
using more fine-grained measures of clausal subordination (i.e., nominal, adverbial, and
adjectival clause density), I found that narrative writing was characterized by increased nominal
clause density (F(1, 75) = 10.18, p = .002, ηp2 = .12) and argumentative writing by increased
density of adverbial clauses (F(1, 75) = 14.38, p < .001, ηp2 = .16) and adjectival clauses
(F(1, 75) = 20.49, p < .001, ηp2 = .22). This result is notable because previous studies have
mostly attended to phrasal measures and, accordingly, have rejected a link between clausal
subordination and genre variation (with the exception of Frear & Bitchener, 2015). The present
result indicates that more specific measures can detect how different genres elicit different types
of clausal subordination in L2 writing (see Figure 7), a pattern that has gone unnoticed in most
previous research because of its reliance on a general subordination measure (Lambert &
Kormos, 2014; Wolfe-Quintero et al., 1998).
There were some text features that varied significantly with the provision of idea support
(WF: F(1, 75) = 14.53, p < .001, ηp2 = .16; nominalization: F(1, 75) = 12.12, p = .001, ηp2 = .14;
temporal connective density: F(1, 75) = 12.57, p = .001, ηp2 = .14; hedge density: F(1, 75) =
19.03, p < .001, ηp2 = .20). For these measures, the provision of idea support generally led to
significantly higher values. For example, the tasks with idea support elicited significantly more
temporal connectives in learner writing than those without idea support. The provision of idea
support also elicited more frequent lexical items (i.e., lower lexical sophistication); possible
explanations for this pattern are presented in the Discussion section.

Figure 7. Nominal clause density, adverbial clause density, and adjectival clause density across
genre and idea support conditions.

Interplay of L2 Proficiency and Task Manipulations Influencing Textual Features
Next, I explored how the effects of genre and idea support on textual features vary with L2
proficiency, in order to give insight into how genre and task manipulations should be aligned
with proficiency levels. For example, if a significant effect of genre exists only for the
high-proficiency group’s language (i.e., a significant interaction between genre and proficiency),
we can assume that low-proficiency students may not be fully capable of producing the different
language needed for different genres. Likewise, if idea support has significant effects on the
low-proficiency group’s language but not on that of the high-proficiency group (i.e., a significant
interaction between idea support and proficiency), we can assume that the manipulation of idea
support may be particularly effective for the low-proficiency group because the cognitive
complexity of the target tasks aligns well with their developmental stage. To test these
hypotheses, I used the high- and low-proficiency groups formed on the basis of cloze test
performance, as introduced in the Method section (high-proficiency students: cloze test scores
equal to or higher than 31, n = 29; low-proficiency students: cloze test scores equal to or lower
than 25, n = 28), in three-way mixed ANOVAs (between-subjects variable: L2 proficiency;
within-subjects variables: genre and idea support).
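The grouping rule can be written as a short function. This is a sketch assuming one cloze score per student: students scoring between the two cutoffs belong to neither group, which is consistent with the group sizes (29 + 28 = 57 of the 76 students) and with the error df of 55 in the F tests of this section. The function name and the example scores are hypothetical.

```python
from collections import Counter


def proficiency_group(cloze_score, high_cutoff=31, low_cutoff=25):
    """Assign a proficiency group from a cloze score.

    Returns "high" for scores >= high_cutoff, "low" for scores <= low_cutoff,
    and None for scores in between (excluded from the group comparison).
    """
    if cloze_score >= high_cutoff:
        return "high"
    if cloze_score <= low_cutoff:
        return "low"
    return None


# Hypothetical cloze scores, not the study's data
scores = [34, 22, 28, 31, 25, 30, 19]
groups = Counter(g for g in map(proficiency_group, scores) if g is not None)
print(groups)
```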
As shown in Table 14, the ANOVA results indicated that L2 proficiency exerted no significant
main effect on any of the textual features. Additionally, there was no significant interaction
involving L2 proficiency, suggesting that the high- and low-proficiency groups constructed their
essays with very similar linguistic resources. Although two text measures showed notable
three-way interactions (NOMC: F(1, 55) = 7.64, p = .008, ηp2 = .12; WF: F(1, 55) = 4.47,
p = .039, ηp2 = .08) and one measure showed an interaction between idea support and
proficiency (nominalization: F(1, 55) = 6.58, p = .013, ηp2 = .11), none of these effects was
statistically significant after the Bonferroni correction.
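The Bonferroni adjustment used in these analyses divides the familywise alpha by the number of tests in a family: .05/21 ≈ .0024 for the 21 textual features and .05/6 ≈ .0083 for the six essay score categories. A minimal sketch, assuming exact p values are available; the dictionary uses the proficiency-related p values reported in the preceding paragraph, and the helper names are illustrative.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def flag_significant(p_values, n_tests, alpha=0.05):
    """Map each label to True if its p value falls below the adjusted alpha."""
    threshold = bonferroni_alpha(alpha, n_tests)
    return {label: p < threshold for label, p in p_values.items()}


print(round(bonferroni_alpha(0.05, 21), 4))  # 0.0024 (textual features)
print(round(bonferroni_alpha(0.05, 6), 4))   # 0.0083 (essay score categories)

# Proficiency-related effects reported above, tested against .05/21
reported = {"NOMC (3-way)": 0.008, "WF (3-way)": 0.039, "Nominalization (I x L)": 0.013}
print(flag_significant(reported, n_tests=21))
```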
Table 15 summarizes the statistical analyses for the second research question (i.e., the task
types and L2 proficiency levels that led to significantly higher values of textual features). To
probe the motivation for language changes across task types, I compared the text feature
changes (Table 13) with the students’ task perceptions (the Students column of Table 11). An
interesting finding from this comparison is that most of the text feature changes across task
types had little to do with how the students judged the writing tasks in terms of task complexity
or difficulty, which challenges a widely held assumption in task-based writing research.
Specifically, in many previous studies, the validity of task manipulations (e.g., whether the
addition of cognitive demands in a writing prompt actually increases the cognitive burden
associated with writing production) has been tested through significant changes in linguistic
measures, mostly those tapping linguistic complexity or accuracy. In this study, however,
although the addition of idea support, which was intended to lower students’ cognitive pressure,
did lead to a significant decrease in students’ perceived task complexity and difficulty, this
effective manipulation of task complexity did not generally push the students to complete the
tasks with different linguistic resources.

Table 14.
Interaction and Main Effects of L2 Proficiency on Textual Features

                        Genre × Idea support × Level  Genre × Level                 Idea support × Level          Level
Item                    p      ηp2    Power           p      ηp2    Power           p      ηp2    Power           p      ηp2    Power
Length of unit
  MLS                   .460   .010   .113            .276   .022   .191            .875   .001   .053            .545   .007   .092
  MLC                   .545   .007   .092            .052   .067   .496            .257   .023   .203            .172   .034   .275
Subordination
  C/T                   .819   .001   .056            .134   .040   .321            .745   .002   .062            .962   .001   .050
  NOMC                  .008   .122   .774            .472   .009   .110            .351   .016   .152            .845   .001   .054
  ADVC                  .788   .001   .058            .942   .001   .051            .404   .013   .131            .422   .012   .125
  ADJC                  .521   .008   .097            .426   .012   .124            .224   .027   .227            .327   .017   .163
Phrasal complexity
  CP/C                  .976   .001   .050            .301   .019   .176            .092   .051   .392            .776   .001   .059
  CN/C                  .721   .002   .064            .428   .011   .123            .826   .001   .055            .297   .020   .179
  Left embedded         .646   .004   .074            .174   .033   .273            .967   .001   .050            .634   .004   .076
  Modifiers/NP          .958   .001   .050            .121   .043   .340            .976   .001   .050            .303   .019   .176
Lexical features
  D                     .186   .032   .261            .101   .048   .374            .913   .001   .051            .206   .029   .242
  WF                    .039   .075   .547            .931   .001   .051            .642   .004   .074            .716   .002   .065
Discourse
  Argument overlap      .119   .044   .344            .736   .002   .063            .915   .001   .051            .933   .001   .051
  Semantic overlap      .912   .001   .051            .326   .018   .164            .463   .010   .112            .536   .007   .094
  Causal connective     .080   .055   .417            .090   .051   .396            .858   .001   .054            .793   .001   .058
  Temporal connective   .750   .002   .061            .057   .064   .481            .905   .001   .052            .689   .003   .068
  Nominalization        .407   .013   .130            .679   .003   .069            .013   .107   .712            .594   .005   .082
Metadiscourse
  Hedge                 .794   .001   .058            .557   .006   .089            .269   .022   .196            .944   .001   .051
  Booster               .776   .001   .059            .293   .020   .181            .646   .004   .074            .973   .001   .050
  Self-mention          .729   .002   .064            .497   .008   .103            .820   .001   .056            .512   .008   .099
  Reader pronoun        .852   .001   .054            .710   .003   .066            .505   .008   .101            .703   .003   .066
Note. Level = L2 proficiency level. Power = observed power.

Table 15.
Summary of Task Manipulation and L2 Proficiency Conditions with Significantly Higher Values
of Textual Features

Construct                 Genre       Idea support   L2 proficiency
Length of production      Argument    -              -
Nominal clause            Narrative   -              -
Adverbial clause          Argument    -              -
Adjectival clause         Argument    -              -
Noun phrase complexity    Argument    -              -
Lexical sophistication    Argument    No support     -
Conceptual cohesion       Argument    -              -
Connectives               Narrative   With support   -
Metadiscourse             Narrative   With support   -

Conversely, genre variation, which had little influence on students’ perceptions of task
complexity and difficulty, led the students to use widely different language in writing. This
finding suggests the need to disentangle the effects of task manipulation on students’
perceptions from those on their language production, because different levels of cognitive
burden elicited by writing tasks do not necessarily result in different linguistic constructions,
potentially owing to the characteristics of the written mode, which allows for repeated planning
and revising (Hayes, 1996; Hayes & Flower, 1980). We also need to recognize that writers
modify their language to fulfill different rhetorical functions in different genres (e.g., Gilquin &
Paquot, 2008; Ravid, 2005; Yasuda, 2011), which points to the need to distinguish between task
complexity and linguistic complexity in writing. In this respect, the extensive genre effects on L2
learners’ language found in this study can be explained as the outcome of learners’ attempts to
accomplish genre-specific functions.

To further test the relationship between task complexity and linguistic complexity (or the
influence of task complexity on linguistic complexity), for each task type, I computed Pearson
correlations of perceived task complexity with various text features that tap linguistic complexity
dimensions. As shown in Table 16, the result of this analysis indicated very limited relationships
between ESL writers’ perceptions of task complexity and their linguistic performance.
Table 16.
Correlations of Perceived Task Complexity with Linguistic Complexity Features

Linguistic features     Arg/-Support   Arg/+Support   Nar/-Support   Nar/+Support
                        r              r              r              r
Length of unit
  MLS                   .217           .090           -.205          -.103
  MLC                   .146           .177           -.221          -.046
Subordination
  C/T                   .072           .009           -.099          -.062
  NOMC                  -.081          -.051          -.096          .113
  ADVC                  .120           -.064          -.174          .057
  ADJC                  .081           -.126          -.098          -.079
Phrasal complexity
  CP/C                  .064           .030           -.162          -.077
  CN/C                  .208           .107           -.189          .013
  Left embedded         .123           .093           -.269*         .217
  Modifiers/NP          .193           .149           -.167          .016
Lexical features
  D                     -.212          -.018          -.168          .077
  WF                    .063           -.176          .106           -.171
Note. *correlations are significant at the alpha level of .05.
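Whether a correlation in Table 16 reaches significance depends on the number of writers. Assuming all 76 students contributed to each correlation (df = n − 2 = 74) and a two-tailed critical t of about 1.993 taken from a t table (an assumed constant, not a value reported in the text), the critical r follows from r = t / √(t² + df). The sketch below suggests that |r| must reach roughly .226, which is consistent with only the -.269 value being flagged while .217 is not.

```python
import math

# Two-tailed critical t for alpha = .05 and df = 74, from a t table (assumed).
T_CRIT_DF74 = 1.993


def critical_r(n, t_crit=T_CRIT_DF74):
    """Smallest |r| that is significant for a Pearson correlation with n pairs.

    t_crit must match df = n - 2; the default assumes n = 76.
    """
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


r_crit = critical_r(76)
for label, r in [("Left embedded, Nar/-Support", -0.269),
                 ("Left embedded, Nar/+Support", 0.217)]:
    print(label, r, abs(r) >= r_crit)
```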
Having found little support for the assumption of a close link between perceived task
complexity and linguistic complexity, I found it necessary to offer a detailed functional
interpretation of genre-specific linguistic features to make the argument more convincing. To
this end, I conducted a qualitative analysis of some textual features that showed clear genre
variation. Of the many syntactic complexity measures, nominal complexity (complex nominals
per clause and modifiers per noun phrase) changed to the largest extent across the two genres.
These notable within-subjects changes can be interpreted in terms of how ESL students select
linguistic resources in written discourse to fulfill different communicative functions. The
example excerpts below are extracted from two essays composed by the same writer (full essays
in Appendix D). The underlined parts of the excerpts indicate complex nominals, identified with
the scheme used for the validation of automated processing tools (Polio & Yoon, in
preparation).
The chief reason to support my idea is that an adequate foreign language is beneficial to
enlarge social network. It's very common for student who study on abroad that the living
level depends on the language level. In this society, the social network is very important
for having a successful life. Taking my own example, I have good level of English. So I
can find many internships in MSU, which are very useful for me to know many brilliant
students and to enlarge social network. Hence, that can lay a fundament for my future
career. (Arg/+Support, Participant ID: S4)
About two month ago, I was in the airplane from Beijing to Detroit. A waitress came to
me and said “Sir, would you want something to drink.” I was so happy, because at this
time I was extremely thirsty. And I replied that “Sure, I want orange juice. Please add
some ice.” Then, I found the waitress was very unhappy. She said “Sir, if you want ass,
please add your own ass.” Eventually, I realized that my pronunciation was wrong. That I
pronounced a wrong vowel sound led the waitress to misunderstand my meaning. I
immediately apologized to this waitress and explained my real meaning. To be honest, I
felt really embarrassed in that situation. But at least I corrected a wrong pronunciation.
(Nar/+Support, Participant ID: S4)
The first excerpt is from an argumentative essay, and the second is from a narrative essay. As the
excerpts show, this ESL writer’s use of complex nominals varied greatly across the two genres,
indicating that the use of complex noun phrases is not solely a matter of language development
but also reflects the selection of appropriate linguistic resources for different rhetorical
situations.
Additionally, narrative essays were characterized by increased use of temporal
connectives and personal pronouns that are necessary for the coherent organization of a personal
story. It has been widely acknowledged that the extensive use of first person pronouns allows
writers to clearly denote their position as a main character in their personal story, and the use of
temporal connectives helps link events in chronological order. The following excerpts illustrate
these points (see the D2 part of Appendix D for the full essays).
Besides understanding culture, speaking a foreign language has lots of other benefits, for
instance, you will be provided a greater job opportunities related to international
business. This opportunity is valuable since there are huge markets in other countries.
Those who can speak many languages have earned a lot of money from international
business. Moreover, by having a good command of a foreign language, you gain more fun
from various activities such as traveling or watch foreign TV programs. You can enjoy
different kind of view and broaden your horizons. This is a very cool experience that
definitely worth a try. (Arg/+Support, Participant ID: S45)

One month ago, I started my new life in America. Everything went well at first, and I was
quite satisfied with my new circumstance here. The air was clean and fresh, and the sky
was pure blue. I can seldom enjoy this kind of environment in my hometown. I was in
good mood, and well-prepared to start my study life here, until that day I went to my first
Mathematics class. I found my classroom easily and took a seat there. I was nervous
since I was unfamiliar with the American teaching style, but I was confident too because
my mathematics had always been very good in China. When the professor started
talking, I was astonished that he spoke too fast for me to follow. (Nar/+Support,
Participant ID: S45)
The patterns in these excerpts are representative of the essays as a whole (e.g., the entire
argumentative essay contains only one first person pronoun and two temporal connectives).
Below are two example excerpts that show how various types of dependent clauses appear
differently in argumentative and narrative essays (nominal clauses in bold, adverbial clauses
double underlined, and adjectival clauses underlined). In interpreting these excerpts, I focus only
on nominal and adjectival clauses, whose functions contrast most clearly.
With the globalization in Asia, a increasingly amount of countries are seeking the
opportunities of cooperating with China, so the people who have the ability to speak
other languages have more chances to participate in international events. In the
meantime, the rise of international companies gives people more job opportunities, and
most of the jobs they provide a relatively high income... On the other hand, you travel
experience can be fantastic if you can understand the language that the country use.
(Arg/+Support, Participant ID: S47)

I remember that I tried to ask somebody for the right path by using English, because my
friend said it’s okay to say English to them, they’ll understand. But soon I found out that
my biggest issue is not speaking correct English to them, but I can’t understand what
they reply in English. Then I had to read their gesture, and a nice lady even used
electronic dictionary in her phone to translate her word into English. Fortunately, most
of them can understand what you said in English. All I have to do is that to get used to
their Korean-style English, and I did it. (Nar/+Support, Participant ID: S47)
Examining the occurrences of nominal clauses, I observed that narrative writing tended to
include many mental verb + nominal clause constructions (with verbs such as find out, remember,
and understand), whereas the excerpt from argumentative writing does not contain any nominal
clause (there is only one in the entire essay; see the D3 part of Appendix D). Given that a major
function of mental verbs is to describe states and actions experienced by humans (Biber,
Johansson, Leech, Conrad, & Finegan, 1999), the increased use of nominal clauses in narrative
writing can be interpreted as high-level ESL writers’ attempt to describe their experiences
accurately.
On the other hand, as illustrated by the excerpts above, argumentative essays tend to contain
more adjectival clauses (e.g., people who have the ability to speak other languages and jobs they
provide). This pattern of increased postmodifying adjectival clauses and complex noun phrases is
known to make the meaning of an academic text more compressed and dense, thus making
knowledge transfer and argumentation more effective (Biber & Gray, 2011; Halliday, 1993;
Parkinson & Musgrave, 2014). The higher density of adjectival clauses in argumentation can
therefore be explained as ESL students’ effort to convey complex meanings through condensed
nominal expressions, making their arguments more convincing.

Essay Score Changes across Task Types
The third research question concerned how ESL students’ writing scores vary across genres and
idea support conditions. Two expert raters scored all essays using the revised analytic rubric
introduced in the Method section, and their averaged scores were used. For essays with seriously
discrepant scores (subscale scores differing by 3 or more), a third rater assigned new scores, and
the average of the two closest scores was used. Table 17 presents descriptive statistics for the
essay scores analyzed in this study. Each rubric category had a full score of 20, except for
mechanics, whose full score was 10. The total score in Table 17 is the sum of the five rubric
categories (full score = 90).
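The score-resolution procedure can be sketched for a single subscale as follows. This is an illustration under stated assumptions: scores are numeric, a difference of 3 or more between the first two raters triggers the third rating, and, if two pairs of scores are equally close, the first pair found is used (the text does not say how ties were handled). The function name is hypothetical.

```python
from itertools import combinations


def resolve_subscale(rater1, rater2, rater3=None, threshold=3):
    """Final subscale score under a two-rater, third-rater-on-discrepancy rule."""
    if abs(rater1 - rater2) < threshold:
        return (rater1 + rater2) / 2
    if rater3 is None:
        raise ValueError("scores differ by the threshold or more; a third rating is needed")
    # Average the two closest of the three scores (ties: first pair wins).
    closest = min(combinations((rater1, rater2, rater3), 2),
                  key=lambda pair: abs(pair[0] - pair[1]))
    return sum(closest) / 2


print(resolve_subscale(14, 15))      # 14.5: raters agree closely
print(resolve_subscale(12, 16, 15))  # 15.5: discrepant pair resolved by a third rater
```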
Table 18 presents the results of two-way ANOVAs with genre and idea support as within-subjects
variables. As shown in Figure 8, there were significant interaction effects between genre and
idea support on content (F(1, 75) = 11.16, p = .001, ηp2 = .13), organization (F(1, 75) = 7.82,
p = .007, ηp2 = .09), and language use scores (F(1, 75) = 7.51, p = .008, ηp2 = .09), which was
also reflected in a significant interaction between genre and idea support on the essays’ total
scores (F(1, 75) = 9.47, p = .003, ηp2 = .11). Specifically, the three rubric categories (content,
organization, and language use) received higher scores, though not always significantly so, for
the argumentative prompt with idea support (content: t(75) = -2.82, p = .006, d = -0.32;
organization: t(75) = -2.81, p = .006, d = -0.33; language use: t(75) = -1.60, p = .11, d = -0.19)
and for the narrative prompt without idea support (content: t(75) = 2.09, p = .04, d = 0.24;
organization: t(75) = 1.70, p = .09, d = 0.20; language use: t(75) = 2.36, p = .02, d = 0.27). That
is, idea support had a positive impact on the quality of argumentative writing but a negative
impact on the quality of narrative writing, particularly with regard to idea development (the
content category showed the largest interaction effect, ηp2 = .13).
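The crossover pattern can be made concrete with a difference of differences computed from the cell means in Table 17: the idea-support effect within argumentative writing minus the idea-support effect within narrative writing. This is a descriptive illustration only, not the ANOVA test itself, and the dictionary layout and function name are assumptions made for the sketch.

```python
# Cell means from Table 17: (Arg/-Support, Arg/+Support, Nar/-Support, Nar/+Support)
CELL_MEANS = {
    "Content": (12.63, 13.23, 14.15, 13.53),
    "Organization": (12.65, 13.26, 14.36, 13.88),
    "Language use": (13.53, 13.84, 13.88, 13.39),
}


def idea_support_contrast(arg_no, arg_yes, nar_no, nar_yes):
    """Return idea-support effects within each genre and their difference."""
    arg_effect = arg_yes - arg_no  # positive if support raised argumentative scores
    nar_effect = nar_yes - nar_no  # positive if support raised narrative scores
    return arg_effect, nar_effect, arg_effect - nar_effect


for category, means in CELL_MEANS.items():
    arg_effect, nar_effect, contrast = idea_support_contrast(*means)
    print(f"{category}: Arg {arg_effect:+.2f}, Nar {nar_effect:+.2f}, "
          f"interaction contrast {contrast:+.2f}")
```

Opposite signs for the two within-genre effects are what a crossover interaction looks like in the cell means.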

Table 17.
Descriptive Statistics for Essay Scores by Genre and Idea Support

Category (full score)  Arg/-Support                 Arg/+Support                 Nar/-Support                 Nar/+Support
                       M (SD) [95% CI]              M (SD) [95% CI]              M (SD) [95% CI]              M (SD) [95% CI]
Content (20)           12.63 (2.21) [12.12, 13.13]  13.23 (2.22) [12.72, 13.74]  14.15 (2.11) [13.67, 14.63]  13.53 (2.24) [13.01, 14.04]
Organization (20)      12.65 (2.28) [12.12, 13.17]  13.26 (2.01) [12.80, 13.72]  14.36 (1.92) [13.92, 14.79]  13.88 (2.01) [13.42, 14.34]
Vocabulary (20)        13.53 (1.54) [13.17, 13.88]  13.70 (1.38) [13.38, 14.01]  14.11 (1.39) [13.79, 14.42]  13.63 (1.50) [13.28, 13.97]
Language use (20)      13.53 (1.76) [13.13, 13.93]  13.84 (1.45) [13.51, 14.17]  13.88 (1.69) [13.50, 14.27]  13.39 (1.51) [13.04, 13.73]
Mechanics (10)         7.43 (1.26) [7.14, 7.72]     7.34 (1.19) [7.06, 7.36]     7.44 (1.25) [7.15, 7.73]     7.33 (0.96) [7.11, 7.54]
Total score (90)       59.75 (7.53) [58.03, 61.47]  61.36 (6.89) [59.78, 62.93]  63.93 (6.51) [62.45, 65.42]  61.75 (7.04) [60.14, 63.35]

Table 18.
Inferential Statistics for Genre and Idea Support Effects on Essay Scores

                       Genre                    Idea support             Genre × Idea support
Category               p        ηp2    Power    p        ηp2    Power    p        ηp2    Power
Content                < .001*  .182   .981     .957     .001   .050     .001*    .130   .909
Organization           < .001*  .303   1.000    .662     .003   .072     .007*    .094   .788
Vocabulary             .079     .041   .420     .210     .021   .239     .027     .064   .608
Language use           .756     .001   .061     .507     .006   .101     .008*    .091   .772
Mechanics              .999     .001   .050     .300     .014   .178     .925     .001   .051
Total score            < .001*  .133   .918     .584     .004   .084     .003*    .112   .859
Note. *p values are significant with the Bonferroni correction (alpha = .05/6 or .0083). Power = observed power.

Figure 8. Interaction plots for content, organization, and language use scores showing an
interaction between genre and idea support conditions.
This result can be seen as evidence for beneficial effects of supporting ideas on the perceptions
and production of argumentative writing, because the provided ideas would allow the students to
focus on developing those ideas in more detail and organizing them coherently. In the personal
narrative genre, on the other hand, the provision of supporting ideas may restrict L2 learners to
the limited range of storylines suggested in the prompt rather than help them develop full stories,
resulting in lower scores on the narrative essays composed with supporting ideas.
Additionally, the students obtained significantly higher content and organization scores on their
narratives than on their argumentative essays (content: F(1, 75) = 16.65, p < .001, ηp2 = .18;
organization: F(1, 75) = 32.65, p < .001, ηp2 = .30), with a particularly large effect on
organization scores (see Figure 9). This genre effect on discourse-level writing scores can be
interpreted either as the outcome of an actual difference in essay quality across the two genres
or as the outcome of the difficulty of assigning comparable scores on discourse-level categories,
given raters’ different levels of strictness across genres (Hamp-Lyons & Mathias, 1994).
However, genre had no significant influence on any of the sentence-level writing scores
(vocabulary, language use, and mechanics), which contrasts with the syntactic complexity
findings, where genre effects were prevalent.

Figure 9. Content and organization scores across genre and idea support conditions.
None of the rubric categories showed a significant main effect of idea support, indicating that,
although idea support clearly relieved L2 learners’ perceived cognitive burden, the presence of
supporting ideas in the prompts did not necessarily lead to different essay scores. Taking the
essay score results and the learner perception results together, I suggest that the students’
subjective judgments of writing tasks do not necessarily correspond to the quality of their essays
as assessed by expert raters.
Interplay of L2 Proficiency and Task Manipulations Influencing Essay Scores
Thus far, I have demonstrated that genre and idea support had interaction effects on
various dimensions of essay quality (content, organization, and language use). Additionally, I
showed that genre exerted a significant effect on discourse-level essay quality (content and
organization), whereas idea support had no significant effect on any of the rubric categories. To
explore the potential interplay of L2 proficiency and task manipulations, I conducted three-way
mixed ANOVAs with L2 proficiency as a between-subjects variable and genre and idea
support as within-subjects variables (Table 19 for descriptive statistics). The alpha level was set
with the Bonferroni adjustment (alpha = .05/6 or .0083). As shown in Table 20, the result
indicated that the high-proficiency students received significantly higher scores on sentence-level
rubric categories than the low-proficiency students (vocabulary: F(1, 55) = 9.75, p = .003, ηp2
= .15; language use: F(1, 55) = 8.78, p = .004, ηp2 = .14), while the effect of L2 proficiency on
the content and organization categories approached statistical significance (content: F(1, 55) =
6.49, p = .014, ηp2 = .11; organization: F(1, 55) = 6.49, p = .014, ηp2 = .11). Figure 10 illustrates
specific patterns of L2 proficiency effects on vocabulary and language use scores across task
types.

Table 19.
Descriptive Statistics for Essay Scores by L2 Proficiency, Genre, and Idea Support

Category (full score)  Level  Arg/-Support                 Arg/+Support                 Nar/-Support                 Nar/+Support
                              M (SD) [95% CI]              M (SD) [95% CI]              M (SD) [95% CI]              M (SD) [95% CI]
Content (20)           High   13.22 (1.93) [12.49, 13.96]  13.95 (1.88) [13.23, 14.66]  14.40 (2.21) [13.56, 15.24]  14.22 (1.77) [13.55, 14.90]
                       Low    12.09 (2.77) [11.02, 13.16]  12.86 (2.75) [11.79, 13.92]  13.80 (1.93) [13.06, 14.55]  12.63 (2.47) [11.67, 13.58]
Organization (20)      High   13.19 (2.01) [12.43, 13.95]  13.74 (1.79) [13.06, 14.42]  14.62 (1.89) [13.90, 15.34]  14.64 (1.61) [14.03, 15.25]
                       Low    11.95 (2.87) [10.84, 13.06]  13.04 (2.43) [12.09, 13.98]  14.02 (1.98) [13.25, 14.79]  12.95 (2.29) [12.06, 13.84]
Vocabulary (20)        High   14.07 (1.27) [13.59, 14.55]  14.03 (1.16) [13.59, 14.50]  14.59 (1.42) [14.05, 15.13]  14.17 (1.27) [13.69, 14.66]
                       Low    13.07 (1.80) [12.37, 13.77]  13.61 (1.83) [12.90, 14.32]  13.75 (1.17) [13.30, 14.21]  12.96 (1.70) [12.31, 13.62]
Language use (20)      High   14.21 (1.64) [13.58, 14.83]  14.26 (1.45) [13.71, 14.81]  14.41 (1.28) [13.93, 14.90]  13.74 (1.37) [13.22, 14.26]
                       Low    12.79 (1.80) [12.09, 13.48]  13.54 (1.62) [12.91, 14.16]  13.68 (1.71) [13.02, 14.34]  12.98 (1.75) [12.30, 13.66]
Mechanics (10)         High   7.58 (1.38) [7.05, 8.10]     7.51 (1.31) [7.00, 8.01]     7.52 (1.23) [7.05, 7.98]     7.71 (0.90) [7.36, 8.05]
                       Low    7.21 (1.30) [6.70, 7.71]     7.20 (1.16) [6.75, 7.65]     7.52 (1.24) [7.04, 8.00]     7.09 (0.97) [6.72, 7.46]
Total score (90)       High   62.27 (7.05) [59.59, 64.95]  63.49 (6.03) [61.20, 65.78]  65.53 (6.28) [63.15, 67.92]  64.48 (5.51) [62.39, 66.58]
                       Low    57.10 (8.80) [53.69, 60.51]  60.23 (8.69) [56.86, 63.60]  62.77 (6.23) [60.35, 65.18]  58.61 (8.20) [55.43, 61.79]

Table 20.
Interaction and Main Effects of L2 Proficiency on Essay Scores
Category | Genre × Idea support × Level: p, ηp², observed power | Genre × Level: p, ηp², observed power | Idea support × Level: p, ηp², observed power | Level: p, ηp², observed power
Content | .182, .032, .264 | .976, .001, .050 | .292, .020, .182 | .014, .106, .707
Organization | .065, .061, .456 | .740, .002, .062 | .455, .010, .115 | .014, .106, .707
Vocabulary | .135, .040, .320 | .361, .015, .148 | .744, .002, .062 | .003*, .151, .866
Language use | .274, .022, .192 | .341, .017, .157 | .308, .019, .173 | .004*, .138, .829
Mechanics | .133, .041, .323 | .890, .001, .052 | .241, .025, .214 | .189, .031, .257
Total score | .063, .062, .463 | .949, .001, .050 | .654, .004, .073 | .004*, .139, .833

Note. Level = L2 proficiency level; *p values are significant with the Bonferroni correction (alpha = .05/6 or .0083).
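The asterisks in Table 20 follow directly from the Bonferroni rule stated in the note. The Python sketch below assumes that the divisor of 6 corresponds to the six scored categories in the table; it divides the familywise alpha by the number of tests and flags the Level p values that fall below the adjusted threshold. It reproduces the three starred cells (vocabulary, language use, and total score) and shows why content and organization (p = .014) do not survive the correction.

```python
# Bonferroni adjustment: divide the familywise alpha by the number of tests.
# ASSUMPTION: the divisor 6 reflects the six scored categories in Table 20.
FAMILYWISE_ALPHA = 0.05
N_TESTS = 6
adjusted_alpha = FAMILYWISE_ALPHA / N_TESTS  # about .0083

# p values for the main effect of L2 proficiency (Level), copied from Table 20.
level_p_values = {
    "content": 0.014,
    "organization": 0.014,
    "vocabulary": 0.003,
    "language use": 0.004,
    "mechanics": 0.189,
    "total score": 0.004,
}

# A category is flagged only if its p value falls below the adjusted alpha.
significant = [name for name, p in level_p_values.items() if p < adjusted_alpha]

print(f"adjusted alpha = {adjusted_alpha:.4f}")
print("significant after correction:", significant)
```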

Figure 10. Vocabulary and language use scores across task types and L2 proficiency.

Of particular note here is that the high- and low-proficiency groups had statistically
different essay scores (particularly in the sentence-level rubric categories), whereas the two
groups did not differ in their use of linguistic resources in writing. This finding potentially
indicates that the quality of language use and vocabulary involves qualitative dimensions that
cannot be fully captured by quantity-based textual features. That is, although there was no group
difference in the use of textual features, the high-proficiency group may still have had better
command of the target language in fulfilling the goal of a writing task. In addition, the results
showed no significant interaction involving L2 proficiency (see Table 20), suggesting that the
impact of task manipulations on essay quality is likely to be consistent regardless of L2
proficiency (or at least within the proficiency range targeted in this study).

CHAPTER 5.
DISCUSSION
ESL Students’ and Teachers’ Perceptions of Writing Tasks
This study aimed to add to the limited amount of research into the perceptions and
production of various L2 writing tasks. The results of the questionnaires indicated that there is a
gap between students’ and teachers’ perceptions of the writing tasks adopted in this study. As
shown in Table 8, the most notable difference between the two groups involved the cognitive
complexity and difficulty imposed by each of the two genres. Specifically, although the teachers
predicted that ESL students would have greater cognitive pressure and difficulty in composing
the argumentative genre than the narrative, the students found that both genres caused a similar level
of complexity and difficulty. In fact, the teachers’ expectations of genre-specific cognitive
demands imposed by argumentative and narrative tasks reflect how L2 researchers have
explained their findings involving multiple genres (e.g., higher cognitive demands of non-narrative writing than narrative; Ruiz-Funes, 2014, 2015; Yang, 2014), which merits further
discussion.
It has been widely believed that L2 students find the argumentative genre
more cognitively demanding than the narrative because the former requires higher-order reasoning and interpretation that go beyond knowledge telling (Bereiter & Scardamalia,
1987). It may be true that the reasoning skills needed to fulfill argumentative tasks are harder
to acquire than those needed for narrative tasks and that such argumentation skills place greater
conceptual demands on writers. This may be why young writers, who have not yet developed a
mature cognitive system, have greater difficulty completing argumentative or expository tasks
than narratives, as shown in much L1 writing research (e.g., Berman, 2008; Engelhard et al.,
1992; Ravid, 2005). However, this prediction does not seem to be in line with ESL students’
actual perceptions of argumentative and narrative tasks, potentially because of their extensive
experience with argumentation as a primary genre in academic settings (Christie, 1997; Johns,
1995; Mei, 2006).
More specifically, I argue that the same genre-cognition prediction should not be extended to
adult L2 learners who have extensive academic writing experience and a fully developed
cognitive system. Cognitive models of writing processes (Hayes, 1996; Hayes & Flower, 1980)
emphasize the mediating effects of genre schemas, task schemas, and other long-term memory
factors (e.g., topic awareness) on working memory pressures during writing. Therefore, one
potential explanation is that adult L2 writers, most of whom have had considerable experience
preparing for standardized L2 writing tests, are likely to possess well-established genre schemas
for argumentation, and the use of these schemas probably reduces the processing burden during
argumentative writing despite the inherently higher cognitive load of this particular genre.
Unlike the lack of genre effects on the students’ perceptions of task complexity and
difficulty, the findings of this study showed that the idea support condition led to a significant
change in the level of perceived complexity and difficulty for both genres. In line with Révész et
al.’s (in press) results using argumentative tasks, the current finding from the argumentative and
narrative genres can be seen as additional evidence of idea support as a valid task manipulation
in written discourse. With this finding as a starting point, future studies would be able to explore
how to maximize the intended impact of idea support manipulations in various writing tasks and
test the applicability of other task variables to the written modality (e.g., exploring the function
of the number of elements in writing based on the Triadic Componential Framework; Robinson,
2001b, 2007).
Regarding the role of idea support for different genres, in their open-ended responses, the
teachers expressed concerns about the potentially adverse effect of idea support on narrative
writing performance. This point was also confirmed by the teachers’ responses to the task
perception questionnaire, which indicated a significant interaction between genre and idea support
on task complexity and difficulty (i.e., teachers’ expectations that the provision of idea support in
argumentative writing would decrease its cognitive complexity and difficulty, whereas the same
manipulation would increase the complexity and difficulty of narrative writing). That is,
considering the nature of personal narratives, requiring students to draw on specific storylines
provided by a task developer can be detrimental to their writing performance because
the given stories can be largely irrelevant to students’ experience (Hinkel, 2002; Lo & F. Hyland,
2007). Therefore, considering the present result that showed a negative effect of supporting ideas
on narrative writing scores (RQ 3) as well as the previous findings that showed the elicitation of
increased syntactic complexity and better performance from a topic more closely related to
students’ lives (Hinkel, 2002; Yoon, 2017b), I argue that all information constituting a writing
prompt (e.g., topic, task, and supporting ideas) should be relevant to writers’ experience in order
to elicit their best performance.
An additional point to discuss from the perception result is the level of task interest and
motivation across the two genres. The results showed that both students and teachers found that the
narrative genre involved more interest-sparking features than the argumentative. In this regard,
Zhang (2013) stated that “many ESL learners’ personal written narratives are embodiments of
their dreams and aspirations” (p. 447), which implies that personal narrative writing is a medium
that enables students to communicate their experience in written discourse. Also, because
narrative writing is full of culture- and language-specific characteristics (Berman & Slobin,
1994; Kang, 2005), an instructional focus on narrative writing will allow ESL students to learn
how to use their linguistic and cultural resources in organizing their personal thoughts.
In terms of relationships between task perception items (see Table 10), the results showed
that, although significantly correlated, task complexity and difficulty operate as two different
constructs (Révész et al., 2016): task complexity was positively related to task interest and
motivation, whereas task difficulty was not in most cases. I view this finding as evidence of the
importance of developing tasks that are appropriately challenging for the target student
population. For example, if a writing task is too simple for students, they may not be fully
engaged in it and may have lower motivation for completing it successfully. Likewise, Xu
(2003) suggested the use of moderately challenging tasks as one
of the ways to increase L2 learning motivation. Melendy (2008) also showed that approximately
50% of the undergraduate student participants selected the most challenging writing task when
asked to select one out of the three task options to complete for assessment purposes. Given
these findings, going beyond the well-known sequencing of simple-to-complex tasks in a
language curriculum (Robinson, 2010), our next step is to build a framework for designing
appropriately challenging tasks for students at various proficiency levels (and for those with
different educational backgrounds).
Effects of Task Type on Textual Features
The second goal of this study was to explore various textual features with a focus on how
they vary across task types. I first examined the effect of genre and idea support on the language
use of all student participants and, then, further analyzed how such task type effects interact with
the students’ L2 proficiency. The major finding of these analyses is that the language produced
by ESL students differed widely across the two genres, while their language differed to a limited
extent across the idea support conditions. This confirms some of the previous findings and, at the
same time, refutes several assumptions that have existed in the field of task-based writing
research.
First, supporting the findings of previous research (e.g., Lu, 2011; Qin & Uccelli, 2016;
Way et al., 2000; Yoon & Polio, 2017), I argue that genre indeed functions as a task variable that
elicits different linguistic features from L2 learners. Specifically, it was confirmed that the
argumentative genre leads students to produce syntactically more complex language, while the
narrative allows them to produce more temporal connectives and first-person pronouns. In this
regard, I presented argumentative and narrative excerpts that were composed by the same writer but
were characterized by notably different linguistic structures, providing evidence of the writer’s
understanding of register flexibility and ability to communicate different meanings across
the two genres. We can infer from this finding that, for example, temporal connectives and
personal pronouns need to be targeted as linguistic resources for coherent narrative writing.
In addition, using the fine-grained measures of subordination (nominal, adverbial, and
adjectival clause density), I found that the argumentative essays indicated greater adverbial and
adjectival clause density, while the narratives showed greater nominal clause density. This
finding, which contrasts with previous findings of genre effects on clausal syntactic complexity
(e.g., Lu, 2011; Yoon & Polio, 2017), points to the importance of adopting more specific
measures when exploring genre effects (or task type effects generally) on clausal subordination
(see Frear & Bitchener, 2015). Also, as we observed from the examination of the essay excerpts,
researchers need to interpret genre-specific language structures with regard to their
communicative functions necessary or useful for that particular genre (or task) because one of
the important functions of language tasks is to elicit task-natural, task-useful, and task-essential structures
from L2 learners (Loschky & Bley-Vroman, 1993). For example, L2 learners with adequate
competence in grammar will be prompted to use more nominal clauses and temporal connectives
in the narrative task, while using more adverbial and adjectival clause structures in the
argumentative task, because different language structures are useful for the completion of
different genres.
The findings of the present study have indicated that ESL students at high intermediate or
low advanced proficiency seem to have sufficient genre awareness and understand the need to
write differently in different contexts. In particular, I have shown how the rhetorical functions
associated with each genre lead to a range of genre-specific linguistic features, demonstrating
the importance of focusing on what meaning writers attempt to communicate in their writing
rather than on how the different cognitive demands of writing tasks lead to changes in language
use. That is, as Berman and Slobin (1994) suggested, “the development of grammar cannot be
profitably considered without attention to the psycholinguistic and communicative demands of
the production of connective discourse” (p. 2). This argument for the connection between
rhetorical functions and linguistic features is further strengthened by the result that showed no
genre effects on perceived task complexity and difficulty.
As I discussed above, unlike the pervasive effects of genre, the provision of idea support
influenced learners’ language use to a limited extent. Specifically, the result showed a significant
increase in a few textual features in the idea support condition (e.g., temporal connective,
nominalization, and hedge density), while lexical sophistication was significantly lower (i.e.,
higher word frequency) in the essays composed with supporting ideas. There are several
interpretations of this finding, each of which is discussed here in terms of its viability. The first
explanation involves priming effects on language use, which have been investigated extensively
with a focus on oral language development (see McDonough & Trofimovich, 2011). For
example, L2 learners with a limited lexical repertoire are likely to borrow words from the
prompt, so a prompt that includes many low-frequency words could lead them to compose an
essay that contains more low-frequency words. When checked against the current prompts,
however, this explanation did not hold because there was no consistent pattern of higher or
lower word frequency between the +Support and -Support prompts (average word frequency of
all words used in each prompt: Arg/-Support: 2.85; Arg/+Support: 2.81; Nar/-Support: 3.00; Nar/+Support: 3.00).
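The prompt check described above amounts to averaging a frequency value over the words of each prompt. The source does not specify which frequency norm or transformation produced the values 2.85 to 3.00, so the Python sketch below rests on two assumptions: a log10 transformation of corpus counts, and a small, entirely hypothetical frequency list (`HYPOTHETICAL_COUNTS`) applied to a hypothetical prompt. Under this convention, a higher mean indicates more frequent, and therefore less sophisticated, vocabulary.

```python
import math
import re

# HYPOTHETICAL raw corpus counts, invented for illustration only. A real
# analysis would use a published frequency norm; the source does not say
# which norm or transformation produced its prompt values.
HYPOTHETICAL_COUNTS = {
    "some": 120_000, "people": 95_000, "think": 60_000, "that": 900_000,
    "students": 20_000, "should": 70_000, "work": 50_000, "part": 45_000,
    "time": 110_000, "jobs": 9_000,
}


def mean_log_frequency(text: str, counts: dict[str, int]) -> float:
    """Average log10 frequency of the words in `text` found in `counts`.

    ASSUMPTION: log10 transformation; words missing from the list are skipped.
    A higher mean indicates more frequent (less sophisticated) vocabulary.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    logs = [math.log10(counts[t]) for t in tokens if t in counts]
    if not logs:
        raise ValueError("none of the words were found in the frequency list")
    return sum(logs) / len(logs)


# Hypothetical prompt, not one of the prompts used in the study.
prompt = "Some people think that students should work part-time jobs."
print(f"mean log10 frequency: {mean_log_frequency(prompt, HYPOTHETICAL_COUNTS):.2f}")
```

Skipping words that are absent from the frequency list is a simplification: because absent words tend to be rare, excluding them can push the mean upward, so a real analysis would need an explicit policy for them.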
Another possibility is the influence of essay length on lexical sophistication. It has been
argued that various dimensions of linguistic complexity, accuracy, and fluency tend to be in
competition due to limited cognitive resources (Skehan, 1998, 2009; Skehan & Foster, 2001).
Although other researchers have argued for a positive relationship between linguistic complexity
and accuracy (Robinson, 2001a, 2005, 2007), it is conceivable that an essay full of sophisticated
words would be shorter than one full of simple words when composed under the same time
constraint. Therefore, one potential scenario is that, if the idea support condition in fact
encouraged students to write more within the given time, their greater attention to fluency (i.e.,
completing a longer essay) might have led to lower lexical sophistication. I tested this
hypothesis by examining text length (total word count) for each task type as well as the
relationship between text length and word frequency. This analysis showed inconsistent patterns
of change in text length across the idea support conditions (text length in words: Arg/-Support: 289.09; Arg/+Support: 302.80; Nar/-Support: 310.67; Nar/+Support: 301.39), offering
no evidence for this hypothesis. Similarly, the correlation analysis showed no meaningful relationship
between text length and word frequency (Arg/-Support: r = .11; Arg/+Support: r = .03; Nar/-Support: r = .04; Nar/+Support: r = .08), which argues against this explanation.
Having ruled out these two interpretations, I suggest that the decrease in lexical
sophistication (i.e., higher average word frequency) in the idea support condition (i.e., the less
complex tasks, as indicated by the student perception results) might be evidence of a significant
impact of task complexity on lexical complexity. That is, of the various dimensions of linguistic
complexity, lexical sophistication is probably the only area that gives reliable support to the
connection between cognitive complexity and linguistic complexity. This explanation is in line
with most previous task-based studies that explored the effect of idea support (e.g., Kormos,
2011; Ong & Zhang, 2010; Révész et al., in press). For example, Ong and Zhang found that the
increase in cognitive complexity through idea support led to greater lexical complexity but little
change in fluency. Additionally, Révész et al. and Kormos indicated significant effects of idea
support on lexical complexity but not on the majority of other complexity or accuracy measures.
Given such consistent findings of the association between task complexity and lexical
complexity in written discourse (Kormos, 2011; Ong & Zhang, 2010; Révész et al., in press), I
tentatively argue that the major area on which the cognitive burden of a writing task exerts an
influence is the extent to which ESL students use sophisticated lexical items. Specifically, when
given a more cognitively demanding task in which students need to come up with more specific
and relevant ideas, they would direct a greater amount of their attentional resources to using
more sophisticated words. In contrast, the majority of syntactic complexity dimensions that are
exploited to fulfill various communicative functions would not be greatly influenced by the
cognitive demand of a task in the written mode.
Last, regarding the significant interaction between genre and idea support on nominal
features, I infer that, with more cognitive resources made available by idea support, L2
learners might have been able to pack more information into their argumentative writing by
increasing the use of complex noun phrases and nominalizations, which contributes to making
convincing arguments. In narrative writing, by contrast, the additional cognitive resources, which
could be used for more engaging narration, might have led L2 learners to focus even less on
nominal features, because complex noun phrases and nominalizations make a text more
informational and, accordingly, less interpersonal (Halliday & Matthiessen, 1999).
Effects of Task Type on Essay Quality
Regarding the effect of task type on essay scores, this study showed several important
findings. First, while the provision of supporting ideas resulted in higher argumentative essay
scores, the same task manipulation led to lower narrative writing scores. This interaction effect
between genre and idea support on quality scores existed for three rubric categories (content,
organization, and language use), with the largest effect on content scores. If we equate an essay
score assigned by expert raters with the quality of an essay, this finding can be interpreted as
reflecting the different roles that idea support plays in assisting L2 writers across genres.
One possible scenario is that, given some ideas to use as supporting points, ESL students
might have had a lower cognitive burden for completing the argumentative task, which enabled
them to allocate their greater cognitive and attentional resources to other writing areas related to
language construction and essay structure. However, when given several possible plots that
needed to be incorporated for narrative writing, students might have felt forced to use them
rather than narrate their own stories, potentially leading to the narration of a less relevant story
and, consequently, to lower essay scores. This point was expressed in the teachers’ responses to
open-ended survey questions; specifically, as shown in some excerpts above, several teachers
indicated that the incorporation of supporting ideas that are not relevant to ESL students’
experience could be challenging for them.
This finding can also be explained as the outcome of varying areas that ESL students find
challenging in composing different genres. For example, if ESL students can improve the quality
of their argumentative writing with some supporting ideas provided in a prompt, it can be
inferred that the area of students’ difficulty involves coming up with logical, convincing ideas in
the argumentative genre and, accordingly, that idea development needs greater pedagogical
attention when teaching argumentation. In contrast, adult students may already have sufficient
ideas and experiences to use as storyline resources for the personal narrative task. Accordingly,
the support that the students probably need for narrative writing is register-specific linguistic
expressions that they can rely on when turning their experience at the conceptual level into the
language needed to complete the narration.
I suggested in the Literature Review section that, considering the emphasis of
standardized L2 writing tests on argumentation (Qin & Karabacak, 2010), adult ESL students are
likely to have experienced narrative tasks much less than many researchers and teachers have
expected. Because the students received higher scores on the narrative tasks than on the
argumentative tasks, the findings of this study do not fully support this reasoning, which points to
the need for more instructional focus on L2 narrative writing. However, the perception results
indicated a wide gap between the students and teachers in how they view different
genres, and the students did not see the argumentative tasks as more cognitively demanding than
the narrative, despite the greater reasoning that argumentation potentially requires.
Based on these findings, I suggest that writing instruction intended to teach narrative-related linguistic resources (e.g., particle phrasal verbs, locative elements, and temporal
connectives) would contribute to expanding ESL students’ command of genre conventions and
improving their general L2 writing proficiency. One potential reason that ESL students have
difficulty with L2 narrative writing is the need to use many particle verbs to express the
path and manner of motion in English narratives, English being a satellite-framed language (see
Berman & Slobin, 1994). In particular, ESL students whose L1s tend to express path in the main
verb (i.e., verb-framed languages such as Hebrew, Japanese, Korean, Spanish, and Turkish,
although this classification is still under debate) can find it very challenging to use various types
of particle verbs appropriately (Slobin, 2004; Talmy, 1985, 2000). This argument, however, is somewhat
speculative; thus, exploring the effect of such instruction on ESL students’ perceptions and
production of narrative writing will advance our understanding of the development of narrative
writing skills.
Another major finding related to essay quality is that ESL students received significantly
higher scores on the narratives than the argumentative essays, offering confirmatory evidence
against the generalizability of writing scores across different genres (e.g., Bouwer et al., 2015;
Way et al., 2000). Particularly, the results indicated significantly higher content and organization
scores on the narrative genre than the argumentative. One of the possible explanations for this
finding is that students were expected to follow more rigid top-down organization rules for
argumentative writing (Berman, 2008) and, as a result, argumentative essays that did not meet
such organizational expectations were likely to receive lower scores. Narrative writing typically
involves a linear structure, which is less salient than argumentative writing’s hierarchical, top-down structure (i.e., main ideas first, followed by supporting information) (Van Dijk & Kintsch,
1983). From the perspective of rater effects, these genre-specific expectations might encourage
raters to be more lenient toward the narrative genre with regard to content and organization,
suggesting the need to have raters better understand such potential genre effects on their rating
behavior and to train them to evaluate different writing tasks more reliably.
Much L1 and L2 research has interpreted findings of higher scores for narrative writing
than for non-narrative genres as the outcome of higher cognitive demands of non-narrative
genres (e.g., Bouwer et al., 2015; Crowhurst, 1980; Engelhard et al., 1992; Kegley, 1986; Way et
al., 2000). However, in this study, I do not attribute a significant genre difference in text quality
scores to the cognitive demands required by different genres because task perception results
showed no difference in task complexity or difficulty between argumentative and narrative tasks.
Instead, considering other dimensions of task perceptions, I suggest the potential role of students’
interest and motivation in eliciting different scores between the two genres. The perception results
from the student participants indicated significantly higher interest and motivation for narrative
writing than for argumentative, which might have led them to devote more attention to the
narrative genre. It has been extensively documented that task interest and attitudes exert a
significant impact on writing performance (e.g., Graham, Berninger, & Fan, 2007; Knudson,
1995; Lo & F. Hyland, 2007; Zimmerman & Bandura, 1994), and the finding of this study can be
seen as empirical evidence partly supporting this claim, although this interpretation still needs to
be tested with data that better control for genre and rater effects.
The last point to discuss in this chapter is the interaction of L2 proficiency and task type
on essay scores. Previously, I showed that none of the 21 textual features was significantly influenced
by L2 proficiency, which was interpreted as the consequence of either a narrow proficiency
range or the misalignment of writing tasks with target learner characteristics. However, the
findings for essay scores showed significant main effects of L2 proficiency on vocabulary and
language use scores. That is, ESL students’ sentence-level writing scores (vocabulary and
language use categories) better reflected their L2 proficiency than discourse-level scores (content
and organization categories) did. This finding can be interpreted mainly in two ways. First, this
finding can be seen as the effect of rating behaviors. In this regard, Rezaei and Lovorn (2010)
revealed raters’ greater sensitivity to syntactic and mechanical features than to content or
rhetorical features, meaning, for example, that a subtle difference in the quality of sentence-level
features between essays can lead to changes in their scores. Accordingly, despite a narrow range
of L2 proficiency levels among the student participants, they still received significantly different
scores for their use of syntactic structures and lexical items.
Another possible interpretation is that L2 proficiency as approximated by cloze test scores
might primarily tap sentence-level writing skills and thus align more closely with sentence-level
rubric scores. There has been much debate on whether cloze tests are capable of assessing both
sentence- and discourse-level competence or can assess only sentence-level competence
(see Tremblay, 2011); thus, I acknowledge the possibility that different patterns might have been
obtained with different measures of L2 proficiency. This issue can be addressed by using a more
objective, standardized method to assess L2 proficiency, or by replicating this study with ESL
students at a much lower proficiency level. I expect the latter in particular to allow us to
obtain a more comprehensive picture of proficiency effects on the performance of different
writing tasks. In the following section, I will discuss implications of the present study and
directions for future research.

CHAPTER 6.
CONCLUSION
Theoretical and Research Implications
This study offers several important implications for L2 writing research. First, as the
perception results showed, some interpretations based on long-standing beliefs may not
accurately depict the mechanisms behind what has been empirically observed. The
presumption that I intended to explore and challenge involved the genre-cognition connection in
L2 writing research. Thus far, many L2 researchers have explained their findings of cross-genre
language and score differences as arising from differences in cognitive pressure between
genres (e.g., argumentative tasks as cognitively more complex than narrative tasks). This
practice has been widely accepted in the field because many believe that linguistic features
depend on cognitive processes, given humans’ limited cognitive resources, and because the
majority of previous research has produced very consistent findings of higher linguistic
complexity and lower essay scores in the argumentative genre than in the narrative. However, as
evidenced by the findings of the present study, L2 learners’ perceptions of the complexity and
difficulty of writing tasks seem to have little to do with the linguistic features or quality scores of
their essays. Specifically, the majority of textual features appeared to be a manifestation of the
set of communicative functions demanded by each genre, while lexical sophistication was one of
the few areas that differed according to the cognitive complexity of writing tasks.
Therefore, task-based writing researchers should not set out to investigate their research
questions with the presumption of task-specific challenges and task manipulation effects
because, for example, their prediction of task manipulation effects would not always match
students’ actual perceptions of different tasks. A possible way of addressing this issue is to
conduct task-based research in two separate stages: (1) testing students’ perceptions of task
manipulation effects for various dimensions and (2) investigating the effect of confirmed task
manipulations on students’ language use. In doing so, researchers will be able to better
understand how to gain intended task manipulation effects and interpret their findings in more
flexible and accurate ways.
Additionally, in the field of TBLT research, there has been a tendency to explore changes
in traditional linguistic complexity and accuracy measures in an attempt to infer the cognitive
demands of different tasks. This trend in task-based studies might have come from their focus on
the validation of competing cognition hypotheses (Robinson, 2005, 2007; Skehan, 1998, 2009).
However, by exploring a comprehensive range of linguistic features at different levels, this study
identified some findings that had gone unnoticed in previous research. For example, I found how
hedging expressions differ across the idea support conditions, and how various cohesion markers
and connectives vary across the two genres. More interestingly, the present findings indicated
how important it is to employ more fine-grained dependent clause measures, instead
of traditional subordination ratio measures, to identify more specific patterns across task
types. Therefore, I recommend that future research into task manipulation effects on linguistic
features explore linguistic features at various levels to obtain a more comprehensive
picture of how some task features elicit different language use and promote development.
While this study provided confirmatory evidence for the function of supporting ideas in
reducing cognitive pressure, the question remains of how specific such supporting ideas should
be in a prompt. On this point, Huot (1990) argued that a moderate level of specificity that clearly
informs audience and purpose would greatly benefit students’ writing production, while Brossell
(1983) and Smith et al. (1985) suggested that writing prompts with overly specific information
may adversely affect students’ writing. Similarly, some teacher participants in the present study
cautioned that too many supporting ideas could deprive students of the opportunity to develop
their own ideas. As a next step, future research can explore how different levels of specificity
and amounts of supporting ideas affect students’ perceptions and language production.
Pedagogical and Assessment Implications
This study offers implications for L2 writing pedagogy that generally involve how
teachers need to understand and implement different genres in L2 writing classes. Considering
the wide gap revealed in the present study between teachers’ and students’ task perceptions, I
suggest that it is important for teachers to be more aware of potential genre effects on
students’ task perceptions and language production. For this purpose, teachers may need some
training to increase their knowledge of how different task features lead to different
outcomes. Such training would help them design and select writing tasks appropriate for
their students. When choosing target tasks, while treating students’ L2 proficiency as a
primary factor, teachers should also take into account students’ task interest or
motivation because such motivational variables have been found to influence students’ writing
performance (e.g., Graham et al., 2007; Zimmerman & Bandura, 1994). One way of achieving
this goal is to conduct task-based needs analysis at the beginning of a semester (Long, 2015) and
then select a range of target tasks that will be covered over the course of the semester.
Furthermore, due to a widespread belief that argumentative writing, a cognitively
challenging task, is most suitable for testing purposes, L2 writing teachers tend to focus on
developing students’ argumentative writing skills; accordingly, they have paid relatively less
attention to other genres such as narrative or descriptive writing. Similarly, because they
conceive of narrative as a simpler task than argumentative writing, teachers are likely to assume
that no further narrative writing instruction is needed once their students show sufficient
argumentative writing skills. However, based on the findings of this study, I argue that
teachers should not make a priori decisions about how tasks will work and what to include in a
curriculum. In fact, several of the present findings point to the need to give greater
instructional emphasis to narrative writing. First, ESL students tended to see the narrative
genre as more interesting and motivating than the argumentative genre. Second, aside from the
cultural or affective benefits of narrative tasks (e.g., Berman & Slobin, 1994; Kang, 2005;
Zhang, 2013), this study suggested an additional justification for including narrative writing
in the ESL classroom, namely, students’ lack of schemas for effective narrative writing.
One of the unexpected findings of this study was a significant interaction between genre
and idea support on discourse-level writing scores. This result in fact arose from students’
significantly lower narrative scores when they were given the prompt with supporting ideas (see
Table 17). As discussed above, the most probable explanation for this finding is that the
supporting ideas unexpectedly restricted the scope of the personal stories students could draw on
to construct an interesting narrative. An important implication of this finding is that supporting
ideas should not be provided for a personal narrative task, so that students have opportunities
to learn how to turn their experience into a well-organized narrative essay. For lower-level
students who need additional support, teachers can instead provide a list of relevant particle
verbs for narrative writing, while offering some idea support for argumentative writing.
Based on their previous test-taking experience, ESL and EFL students are likely to expect
that they will be given argumentative tasks in the context of standardized writing assessment. In
fact, despite some attempts to implement multiple writing tasks in one language test, there is still
a tendency to rely on a single task of argumentation in various proficiency and placement test
settings, mostly for practical reasons. However, the differences in linguistic features and task
performance across genres found in this study suggest that test developers need to include more
than one genre to obtain a more comprehensive picture of test-takers’ writing proficiency.
Similarly, calling for the necessity of targeting multiple genres (or modes of discourse, in her
terms), Kegley (1986) argued that “the practitioner should be cognizant of the
limitations of using a single mode of discourse for making decisions about overall student
writing competency for either groups of, or individual, students” (p. 154).
While following this suggestion may raise concerns about time and cost (e.g., more time for
test administration and increased rating costs), test developers can take advantage of automated
language processing technologies, which have advanced considerably over the past decade.
Although the scores produced by such automated systems may not fully reflect the complexity of
writing proficiency, they can be used alongside scores from human raters. In incorporating
computational techniques into essay scoring, it will be extremely important for researchers and
test developers to have a clear and ever-evolving understanding of the variations in linguistic,
discourse, and metadiscourse features across written genres (and even across sub-genres) to
obtain valid and reliable scores.
Limitations and Future Research
In this study, I systematically explored genre and task complexity effects on students’
perceptions and language production and provided meaningful suggestions on how researchers
and teachers need to understand genre and idea support as distinct task variables. Nevertheless,
several limitations need to be addressed in future research in order to further advance this
line of research. First, this study was limited to independent writing tasks with a 30-minute
time limit. Although this restriction reflects the strictly controlled design of the present
study, given the increasing trend of integrating materials from other skills into writing
assessment (Plakans, 2010; Plakans & Gebril, 2013), exploring L2 learners’ performance across
genres in integrated writing tasks will offer valuable information on more authentic writing
skills.
Additionally, the student participants of this study were ESL students enrolled in
high-level courses in an English language program. L2 learners at this proficiency level may
already have acquired sufficient genre awareness, which could have led to the clear genre effects
on language production. Although I attempted to examine the relationship between L2
proficiency and task type effects, I acknowledge that dividing the student participants into two
groups based on their cloze test scores might have reduced statistical power (Plonsky &
Oswald, in press) and that the gap between the two proficiency groups was not large enough to
confirm that the findings generalize to lower proficiency students. Therefore, future
research is needed to test how beginning-level students’ perceptions and production of the
writing tasks differ from the current findings for high-intermediate students, offering a more
complete picture of task type effects in written discourse.
Finally, in an attempt to control for topic effects, I used the shared topic of foreign
language use for all writing tasks in this study. While it was reasonable to design and use such
an approachable topic, this topic may also be quite familiar to many L2 learners, and some of the
participants might have encountered a similar writing prompt before. By contrast, in their
various projects on L1 genre differences, Berman and her colleagues used interpersonal conflict
as a shared topic (e.g., Berman, 2008; Berman & Katzenberger, 2004; Berman & Nir-Sagiv, 2004,
2007), which can be considered less common than foreign language use for many adult L2
learners; using such a topic might have elicited somewhat different patterns. Therefore,
exploring similar research questions using a different (less common and more challenging) topic
will show whether the present findings generalize to uncommon or complex topics.


APPENDICES


Appendix A.
Writing Prompts
Argumentative 1 (Arg/-Support)
Situation:
You attended a seminar and the main theme was that using a foreign language fluently has
become necessary in this globalized era.
Writing task:
Write an essay about whether you agree or disagree with the statement about the necessity of
foreign language abilities. Support your position with reasons. Be sure to fully develop your
essay by including clear explanations and logical supporting ideas.

Argumentative 2 (Arg/+Support)
Situation:
You attended a seminar and the main theme was that the ability to speak a foreign language
raises the possibility of having a successful life.
Writing task:
Write an essay about whether you agree or disagree with the statement about the relationship
between foreign language abilities and success. Support your position using the reasons provided
below. Be sure to fully develop your essay by including clear explanations and logical supporting
ideas.
Agree/Support to argue for the position
• Better understanding of cultural differences and other ethnic groups
• Greater job opportunities related to international business
• Possibilities for fun activities such as traveling or watching foreign TV programs

Disagree/Support to argue against the position
• Other qualities (such as self-confidence) more important than foreign language skills
• Foreign language skills not necessary for many great jobs
• A huge investment of time and effort for language learning that could be used for other skill development


Narrative 1 (Nar/-Support)
Situation:
Your friend has plans to learn a foreign language but is afraid it might be useless to spend the
time learning a language. You have successfully learned a foreign language and use it often. You
want to show your friend that language learning and use can be interesting by telling him/her
about your positive experience.
Writing task:
Tell a story about ONE of your positive experiences related to foreign language use. Be sure to
fully develop your story by including specific details.

Narrative 2 (Nar/+Support)
Situation:
Your friend is planning a trip to a foreign country. While excited about this trip, your friend is
worried about how to communicate with people using a foreign language. You have greater
foreign language experience, so your friend wants to know some of the possible difficulties she
may have while interacting with foreigners.
Writing task:
Tell a story about ONE of your difficult experiences related to interactions using a foreign
language. When developing your ideas, you can refer to the storylines below and use any of them
to facilitate your writing. Be sure to fully develop your story by including specific details.
Example storylines
• You visited a public place in a foreign country. When you were talking to a foreigner, he/she corrected your language constantly, making you feel offended. Then…
• You were talking to a foreigner. While interacting with him/her, you experienced some cultural differences that made you feel uncomfortable. Then…
• You had to fix a problem or sign a contract using a foreign language. For such purposes, you expressed your ideas to a native speaker of the language, but it caused a misunderstanding, leading to a serious accident. Then…


Appendix B.
Revised Analytic Scoring Rubric


Appendix C.
Cloze Test
Name: ______________________________

Class: ______________________________

DIRECTIONS:
1. Read the passage quickly to get the general meaning.
2. Write only one word in each blank in the column to the right. Contractions (e.g., can’t) are
considered one word.
3. Check your answers.
NOTE: Spelling will not count against you as long as the scorer can read the word.
EXAMPLE:
I met my friend who took a final exam yesterday.
He told me that he is satisfied __________ his performance.

Answer: with

You have 30 minutes to complete the cloze test.
MAN AND HIS PROGRESS
Man is the only living creature that can make and use tools. He is the most teachable of
living beings, earning the name of Homo sapiens. ____1____ ever restless brain has used the
____2____ and the wisdom of his ancestors ____3____ improve his way of life. Since
____4____ is able to walk and run ____5____ his feet, his hands have always ____6____ free to
carry and to use ____7____. Man’s hands have served him well ____8____ his life on earth. His
development, _____9_____ can be divided into three major ____10____, is marked by several
different ways ____11____ life.
Up to 10,000 years ago, ____12____ human beings lived by hunting and ____13____.
They also picked berries and fruits, ____14____ dug for various edible roots. Most ____15____,
the men were the hunters, and ____16____ women acted as food gatherers. Since ____17____
women were busy with the children, ____18____ men handled the tools. In a ____19____ hand,
a dead branch became a ____20____ to knock down fruit or to ____21____ for tasty roots.
Sometimes, an animal ____22____ served as a club, and a ____23____ piece of stone, fitting
comfortably into ____24____ hand, could be used to break ____25____ or to throw at an animal.
____26____ stone was chipped against another until ____27____ had a sharp edge. The
primitive ____28____ who first thought of putting a ____29____ stone at the end of a
____30____ made a brilliant discovery: he ____31____ joined two things to make a ____32____
useful tool, the spear. Flint, found ____33____ many rocks, became a common cutting
____34____ in the Paleolithic period of man’s ____35____. Since no wood or bone tools
____36____ survived, we know of this man ____37____ his stone implements, with which he
____38____ kill animals, cut up the meat, ____39____ scrape the skins, as well as ____40____
pictures on the walls of the ____41____ where he lived during the winter.
____42____ the warmer seasons, man wandered on ____43____ steppes of Europe
without a fixed ____44____, always foraging for food. Perhaps the ____45____ carried nuts and
berries in shells ____46____ skins or even in light, woven ____47____. Wherever they camped,
the primitive people ____48____ fires by striking flint for sparks ____49____ using dried seeds,
moss, and rotten ____50____ for tinder. With fires that he kindled himself, man could keep wild
animals away and could cook those that he killed, as well as provide warmth and light for
himself.

Cloze Test Answers
Item | Exact answer | Acceptable answers
1 | his | man’s, our, the
2 | knowledge | accomplishments, culture, cunning, examples, experience(s), hands, ideas, information, ingenuity, instinct, intelligence, mistakes, nature, power, skill(s), talent, teaching, technique, thought, will, wit, words, work
3 | to |
4 | man | he
5 | on | upon, using, with
6 | been | felt, hung, remained
7 | tools | adequately, carefully, conventionally, creatively, diligently, efficiently, freely, implements, objects, productively, readily, them, things, weapons
8 | during | all, for, improving, in, through, throughout, with
9 | which | also, basically, conveniently, easily, historically, however, often, since, that, thus
10 | periods | areas, categories, divisions, eras, facets, groups, parts, phrases, sections, stages, steps, topics, trends
11 | of | for, in, through, towards
12 | all | early, hungry, many, most, only, primitive, the, these
13 | fishing | farming, foraging, gathering, killing, scavenging, scrounging, sleeping, trapping
14 | and | or, often, some, they
15 | often | always, emphatically, important, nights, normally, of, times, tribes
16 | the | all, house, many, most, older, their, younger
17 | the | all, many, married, most, often, older, primate, these
18 | the | all, constructive, many, most, older, primate, tough, younger
19 | man’s | able, big, closed, coordinated, creative, deft, empty, free, human(’s), hunter’s, learned, needed, needy, person’s, right, single, skilled, skillful, small, strong, trained
20 | tool | club, device, instrument, pole, rod, spear, stick, weapon
21 | dig | burrow, excavate, probe, search, test
22 | bone | arm, easily, foot, head, hide, horn, leg, skull, tail, tusk
23 | sharp | big, chipped, fashioned, flat, hard, heavy, large, rough, round, shaped, sizeable, small, smooth, soft, solid, strong, thin
24 | the | a, his, man’s, one(’s)
25 | nuts | apart, bark, bones, branches, coconuts, down, firewood, food, fruit, heads, ice, items, meat, objects, open, rocks, shells, sticks, stone, things, tinder, wood
26 | one | a, each, flat, flint, glass, hard, obsidian, shale, softer, some, the, then, this
27 | it | each, one, they
28 | man | being, creature, human, hunter, men, owner, people, person
29 | sharp | glass, hard, jagged, large, lime, pointed, sharpened, small
30 | stick | bone, branch, club, log, pole, rod, shaft
31 | had | accidentally, cleverly, clumsily, conveniently, creatively, dexterously, double, easily, first, ingeniously, securely, simply, soon, suddenly, tastefully, then, tightly
32 | very | bad, extremely, good, hunter’s, incredibly, intelligent, long, modern, most, necessarily, new, portentously, quite, really, tremendously
33 | in | all, among, amongst, by, inside, on, that, using, within
34 | tool | device, edge, implement, instrument, item, material, method, object, piece, practice, stone, utensil
35 | development | age, ancestry, discoveries, era, evolution, existence, exploration, history, life, time
36 | have | actually, apparently, ever
37 | by | and, for, from, had, made, through, used, using
38 | could | did, would
39 | and | carefully, help, or, skillfully, then, would
40 | draw | carve, create, drawing, engrave, hang, paint, painting, place, sketch, some, the
41 | cave(s) | animals, place(s), room
42 | in | and, during, with
43 | the | across, aimless, all, barren, dry, flat, high, in, long, many, plain, stone, through, to, toward, unknown, various
44 | home | appetite, camp, course, destination, destiny, diet, direction, domain, foundation, habitat, income, knowledge, location, lunch, map, meal, path, pattern, place, plan, route, supplement, supply, time, weapon
45 | women |
46 | or | and, animal, animal’s, covered, in, like, of, on, their, using, with
47 | baskets | bags, blankets, chests, cloth(s), clothes, fabric, garments, hides, material, nets, pouches, sacks
48 | made | began, built, lighted, lit, produced, set, started, used
49 | and | also, by, occasionally, or, then, together, while
50 | wood | bark, branches, dung, forage, grass, leaves, lumber, roots, skin, timber, tree(s)
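The answer key separates exact answers from acceptable alternatives. This matches two of the cloze scoring methods compared by Brown (1980): exact-answer scoring and acceptable-answer scoring.

The Python sketch below shows how both scores could be computed for one test-taker. It illustrates the two methods under stated assumptions and is not presented as the scoring procedure used in this study. The assumptions are as follows:
• `ANSWER_KEY` holds only a hypothetical four-item excerpt of the key above.
• Responses are assumed to be keyed by item number.
• The helper names `normalize` and `score_cloze` are introduced here for illustration only.

```python
"""Exact-answer vs. acceptable-answer cloze scoring (illustrative sketch only)."""

# Hypothetical excerpt of the answer key: item -> (exact answer, acceptable alternatives).
# Only four items are included for brevity; the full key has 50 items.
ANSWER_KEY: dict[int, tuple[str, set[str]]] = {
    1: ("his", {"man's", "our", "the"}),
    3: ("to", set()),
    4: ("man", {"he"}),
    21: ("dig", {"burrow", "excavate", "probe", "search", "test"}),
}


def normalize(word: str) -> str:
    """Lower-case, trim whitespace, and map curly apostrophes to straight ones."""
    return word.strip().lower().replace("\u2019", "'")


def score_cloze(responses: dict[int, str]) -> tuple[int, int]:
    """Return (exact-answer score, acceptable-answer score) for one test-taker.

    Exact-answer scoring credits only the deleted word; acceptable-answer
    scoring also credits any listed alternative. Blank or missing items
    receive no credit under either method.
    """
    exact = acceptable = 0
    for item, (target, alternatives) in ANSWER_KEY.items():
        answer = normalize(responses.get(item, ""))
        if not answer:
            continue  # blank or missing response
        if answer == normalize(target):
            exact += 1
            acceptable += 1  # an exact answer is also an acceptable answer
        elif answer in {normalize(alt) for alt in alternatives}:
            acceptable += 1
    return exact, acceptable


if __name__ == "__main__":
    sample = {1: "our", 3: "to", 4: "Man", 21: "search"}
    print(score_cloze(sample))  # (2, 4)
```

The directions state that spelling does not count against test-takers as long as the scorer can read the word. Misspelled responses therefore still call for human judgment; the sketch only ignores case, surrounding whitespace, and curly versus straight apostrophes. Key entries written with optional endings, such as "cave(s)" or "human(’s)", would also need to be expanded into their variants before being added to the dictionary.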


Appendix D.
Example Essays
For greater clarity, I corrected all spelling errors contained in the example essays and included an
indentation at the beginning of each paragraph. I did not change any grammatical or lexical
errors.

D1. Two essays for the analysis of complex nominals
For the following two essays (composed by S4), I underlined complex nominals based on the
scheme used in Polio and Yoon (in preparation), which covers, for example, noun phrases with
multiple premodifiers, noun phrases with postmodifiers, noun clauses, and infinitives and gerunds
in the subject position.
• Arg/+Support (Participant ID: S4)
Recent years, we had witnessed the rapid globalization in our world. Naturally, we have
increasing chance to use a foreign language. And the ability to speak a foreign language has
increasingly close relation with the possibility of having a successful life, because an adequate
skills of a foreign language is beneficial to enlarge our social network and get more good
chances for future career.
Admittedly, some people who never need to go abroad think the ability to speak a foreign
has nothing to do with the possibility of having a successful life. And getting the skills of a
foreign language would waste a lot of time. However, it is viewed from another angle, getting a
foreign language represents a general trend. If you want avoid to lose chances in the future, you
need to handle a foreign language.
The chief reason to support my idea is that an adequate foreign language is beneficial to
enlarge social network. It's very common for student who study on abroad that the living level
depends on the language level. In this society, the social network is very important for having a
successful life. Taking my own example, I have good level of English. So I can find many
internships in MSU, which are very useful for me to know many brilliant students and to enlarge
social network. Hence, that can lay a fundament for my future career.
The another reason that should be take into account is that it is good for getting more
good chances in your future career. For instant, my sister, who graduated from UCLA, is a senior
manager in IBM. She can get this opportunity because of her good English skill.


• Nar/+Support (Participant ID: S4)
It is very common for foreign language user to have some difficult experiences related to
interactions. Of course, I also have a very embarrassing experience about making
communication with English speaker. This experience is still vivid. And I never forget it in my
life. In the following, I would like to share my embarrassing story to you and also want you not
to worry too much.
About two month ago, I was in the airplane from Beijing to Detroit. A waitress came to
me and said “Sir, would you want something to drink.” I was so happy, because at this time I
was extremely thirsty. And I replied that “Sure, I want orange juice. Please add some ice.” Then,
I found the waitress was very unhappy. She said “Sir, if you want ass, please add your own ass.”
Eventually, I realized that my pronunciation was wrong. That I pronounced a wrong vowel sound
led the waitress to misunderstand my meaning. I immediately apologized to this waitress and
explained my real meaning. To be honest, I felt really embarrassed in that situation. But at least I
corrected a wrong pronunciation.
All in all, even though you will face a lot challenge to use foreign language, you are
supposed to be brave. Practicing a lot can make you adequate.

D2. Two essays for the analysis of temporal connectives and first person pronouns
For the following two essays (composed by S45), I put temporal connectives in bold and
underlined first person pronouns.
• Arg/+Support (Participant ID: S45)
Many people are in favor of the idea that speaking a foreign language raises the
possibility of being successful. As far as I am concerned, this statement is very reasonable, since
ability of speaking a foreign language is a huge advantage and it is significant in many ways.
First of all, speaking a foreign language indicates a better understanding of cultural
differences and other ethnic groups. Language is like a key to the gate of communication, once
you have the ability of communicating, you can chat with people and understand their thoughts.
It’s easy to live with local people and get used to their culture and lifestyle with the ability of
speaking their language.


Besides understanding culture, speaking a foreign language has lots of other benefits, for
instance, you will be provided a greater job opportunities related to international business. This
opportunity is valuable since there are huge markets in other countries. Those who can speak
many languages have earned a lot of money from international business.
Moreover, by having a good command of a foreign language, you gain more fun from
various activities such as traveling or watch foreign TV programs. You can enjoy different kind of
view and broaden your horizons. This is a very cool experience that definitely worth a try.
Speaking a foreign language is so beneficial that it is almost necessary if you want to be
successful in this globalized era. We should attach significance to learning a foreign language
and enjoy the great benefits brought by that.
• Nar/+Support (Participant ID: S45)
Studying overseas is a wonderful experience. I can see and feel different culture and
make foreign friend. However, there can be as many difficulties as the benefits as well. I had
many difficulties when I first came to America, and I had to confront them. It was really a tough
experience for me.
One month ago, I started my new life in America. Everything went well at first, and I was
quite satisfied with my new circumstance here. The air was clean and fresh, and the sky was pure
blue. I can seldom enjoy this kind of environment in my hometown. I was in good mood, and
well-prepared to start my study life here, until that day I went to my first Mathematics class. I
found my classroom easily and took a seat there. I was nervous since I was unfamiliar with the
American teaching style, but I was confident too because my mathematics had always been very
good in China. When the professor started talking, I was astonished that he spoke too fast for me
to follow. I couldn’t even understand what the homework assignments were. All my confidence
were destroyed and I felt self-abashed. The professor was nice and humorous, but I just couldn’t
understand the jokes. I was worried about my future here and I was really stressed.
After spending a month studying here, I finally get used to the speed that my professor
talks. It was a tough time at first, but once you make up your mind to confront it to overcome it,
nothing will stop you, and you will be fine at last. So don’t be nervous and afraid, my friend,
there will be difficulties, but that’s not a big deal. It’s better to get prepared for the vocabulary
before you go abroad. That will make you feel more comfortable.


D3. Two essays for the analysis of dependent clauses
For the following two essays (composed by S47), I put nominal clauses in bold, double
underlined adverbial clauses, and underlined adjective clauses.
• Arg/+Support (Participant ID: S47)
Mastering another language can make people successful. I agree with the statement that
being capable of another language can make people successful.
With the globalization in Asia, a increasingly amount of countries are seeking the
opportunities of cooperating with China, so the people who have the ability to speak other
languages have more chances to participate in international events. In the meantime, the rise of
international companies gives people more job opportunities, and most of the jobs they provide a
relatively high income.
Maybe money is the common standard of a successful life. Being able to speak other
languages, however, can give people more benefits than just material life. Learning another
language let people know what is like in another side of the globe, they can also learn more
about cultural differences. In the earlier period of Qing dynasty, China refuse to communicate
with other countries, and that led to a severe consequence, which is he left over in education,
technology and so on.
Therefore, language is the key to another world. On one hand, learning language can
give you multiple ways to perceive. It can also help you have a better understand of other
countries’ culture. On the other hand, you travel experience can be fantastic if you can
understand the language that the country use.
In conclusion, mastering other languages can give people much more amazing
experiences than they ever have, the tendency of globalization make it like a requirement if you
want to be successful.
• Nar/+Support (Participant ID: S47)
Two years ago, I have gone to the South Korea to visit a friend, but the interesting thing
is that I can’t say a single Korean word. Also my phone couldn’t work in there. So this trip is
more like an adventure and a really amazing one.

I remember that I tried to ask somebody for the right path by using English, because my
friend said it’s okay to say English to them, they’ll understand. But soon I found out that my
biggest issue is not speaking correct English to them, but I can’t understand what they reply in
English. Then I had to read their gesture, and a nice lady even used electronic dictionary in her
phone to translate her word into English. Fortunately, most of them can understand what you
said in English. All I have to do is that to get used to their Korean-style English, and I did it.
Since you’re going to have a trip to another country, I personally think what kind of
language they can speak is a very important thing you need to know, such as Dubai, Korea, or
most countries in Europe, most of them can understand English and even can talk to you in
English. In this case, you don’t have to worry too much about language. In the mean time, you
have to know people may speak English, but they have accent, like Japan or India, so make sure
you prepare for this. You even can install an app in your cellphone for translating.
Another thing is that respect their cultures, there’s different manners in different
countries, you can search that online to make sure you won’t be too rude.
Finally, I hope you enjoy it, travelling to another country is really a great experience.


REFERENCES


Aull, L. L., & Lancaster, Z. (2014). Linguistic markers of stance in early and advanced academic
writing: A corpus-based comparison. Written Communication, 31, 151–183.
Barkaoui, K. (2016). What and when second-language learners revise when responding to timed
writing tasks on the computer: The roles of task type, second language proficiency, and
keyboarding skills. The Modern Language Journal, 100, 320–340.
Beauvais, C., Olive, T., & Passerault, J. M. (2011). Why are some texts good and others not?
Relationship between text quality and management of the writing processes. Journal of
Educational Psychology, 103, 415–428.
Beers, S., & Nagy, W. (2009). Syntactic complexity as a predictor of adolescent writing quality:
Which measures? Which genre? Reading and Writing, 22, 185–200.
Beers, S., & Nagy, W. (2011). Writing development in four genres from grades three to seven:
Syntactic complexity and genre differentiation. Reading and Writing, 24, 183–202.
Bereiter, C., & Scardamalia, M. (1987). The psychology of written composition. Hillsdale, NJ:
Lawrence Erlbaum.
Berman, R. A. (2008). The psycholinguistics of developing text construction. Journal of Child
Language, 35, 735–771.
Berman, R. A., & Katzenberger, I. (2004). Form and function in introducing narrative and
expository texts: A developmental perspective. Discourse Processes, 38, 57–94.
Berman, R. A., & Nir-Sagiv, B. (2004). Linguistic indicators of inter-genre differentiation in
later language development. Journal of Child Language, 31, 339–380.
Berman, R. A., & Nir-Sagiv, B. (2007). Comparing narrative and expository text construction
across adolescence: A developmental paradox. Discourse Processes, 43, 79–120.
Berman, R. A., & Slobin, D. I. (1994). Relating events in narrative: A crosslinguistic
developmental study. Hillsdale, NJ: Lawrence Erlbaum.
Biber, D. (1988). Variation across speech and writing. Cambridge, UK: Cambridge University
Press.
Biber, D. (2006a). A corpus-based study of spoken and written registers. Amsterdam: John
Benjamins.
Biber, D. (2006b). Stance in spoken and written university registers. Journal of English for
Academic Purposes, 5, 97–116.
Biber, D., & Conrad, S. (2009). Register, genre, and style. Cambridge, UK: Cambridge
University Press.
Biber, D., & Gray, B. (2010). Challenging stereotypes about academic writing: Complexity,
elaboration, explicitness. Journal of English for Academic Purposes, 9, 2–20.
Biber, D., Gray, B., & Poonpon, K. (2011). Should we use characteristics of conversation to
measure grammatical complexity in L2 writing development? TESOL Quarterly, 45, 5–
35.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of
spoken and written English. Harlow, UK: Longman.
Bouwer, R., Béguin, A., Sanders, T., & van den Bergh, H. (2015). Effect of genre on the
generalizability of writing scores. Language Testing, 32, 83–100.
Brossell, G. (1983). Rhetorical specification in essay topics. College English, 45, 165–173.
Brown, J. D. (1978). Correlational study of four methods for scoring cloze tests. MA Thesis,
University of California at Los Angeles.
Brown, J. D. (1980). Relative merits of four methods for scoring cloze tests. The Modern
Language Journal, 64, 311–317.
Brown, J. D., Hilgers, T., & Marsella, J. (1991). Essay prompts and topics: Minimizing the effect
of mean differences. Written Communication, 8, 533–556.
Brünken, R., Seufert, T., & Paas, F. (2010). Measuring cognitive load. In J. L. Plass, R. Moreno,
& R. Brünken (Eds.), Cognitive load theory (pp. 181–202). Cambridge: Cambridge
University Press.
Bulté, B., & Housen, A. (2014). Conceptualizing and measuring short-term changes in L2
writing complexity. Journal of Second Language Writing, 26, 42–65.
Butler, Y. G., & Iino, M. (2005). Current Japanese reforms in English language education: The
2003 ‘Action Plan’. Language Policy, 4, 25–45.
Byun, K., Chu, H., Kim, M., Park, I., Kim, S., & Jung, J. (2011). English-medium teaching in
Korean higher education: Policy debates and reality. Higher Education, 62, 431–449.
Chafe, W. L. (1982). Integration and involvement in speaking, writing, and oral literature. In D.
Tannen (Ed.), Spoken and written language: Exploring orality and literacy (pp. 35–54).
Norwood, NJ: Ablex.
Cheng, L. (2008). The key to success: English language testing in China. Language Testing, 25,
15–37.
Chenoweth, N. A., & Hayes, J. R. (2001). Fluency in writing: Generating text in L1 and L2.
Written Communication, 18, 80–98.
Chow, A. W., & Mok-Cheung, A. H. (2004). English language teaching in Hong Kong SAR:
Tradition, translation and transformation. In W. K. Ho & R. Y. L. Wong (Eds.), English
language teaching in East Asia today (pp. 150–177). Singapore: Eastern Universities
Press.
Christie, F. (1997). Curriculum macrogenres as forms of initiation into a culture. In F. Christie &
J. R. Martin (Eds.), Genre and institutions: Social processes in the workplace and school
(pp. 134–160). New York, NY: Continuum.
Common Core State Standards (CCSS). (2017). English language arts standards. Retrieved from
http://www.corestandards.org/ELA-Literacy/
Connor-Linton, J., & Polio, C. (2014). Comparing perspectives on L2 writing: Multiple analyses
of a common corpus. Journal of Second Language Writing, 26, 1–9.
Crossley, S. A., Cobb, T., & McNamara, D. S. (2013). Comparing count-based and band-based
indices of word frequency: Implications for active vocabulary research and pedagogical
applications. System, 41, 965–981.
Crossley, S. A., Kyle, C., & McNamara, D. S. (in press). The development and use of cohesive
devices in L2 writing and their relations to judgments of essay quality. Journal of Second
Language Writing.
Crossley, S. A., & McNamara, D. S. (2012). Predicting second language writing proficiency:
The roles of cohesion and linguistic sophistication. Journal of Research in Reading, 35,
115–135.
Crossley, S. A., & McNamara, D. S. (2014). Does writing development equal writing quality? A
computational investigation of syntactic complexity in L2 learners. Journal of Second
Language Writing, 26, 66–79.
Crossley, S. A., Salsbury, T., McNamara, D. S., & Jarvis, S. (2010). Predicting lexical proficiency
in language learner texts using computational indices. Language Testing, 28(4), 561–580.
Crossley, S. A., Yang, H. S., & McNamara, D. S. (2014). What’s so simple about simplified
texts? A computational and psycholinguistic investigation of text comprehension and text
processing. Reading in a Foreign Language, 26, 92–113.
Crowhurst, M. (1980). Syntactic complexity and teachers’ quality ratings of narrations and
arguments. Research in the Teaching of English, 14, 223–231.
Ellis, R., & Yuan, F. (2004). The effects of planning on fluency, complexity, and accuracy in
second language narrative writing. Studies in Second Language Acquisition, 26, 59–84.
Engelhard, G., Gordon, B., & Gabrielson, S. (1992). The influences of mode of discourse,
experiential demand, and gender on the quality of student writing. Research in the
Teaching of English, 26, 315–336.
Foltz, P. W. (2007). Discourse coherence and LSA. In T. K. Landauer, D. S. McNamara, S.
Dennis, & W. Kintsch (Eds.), Handbook of latent semantic analysis (pp. 167–184).
Mahwah, NJ: Lawrence Erlbaum.
Fotos, S. S. (1991). The cloze test as an integrative measure of EFL proficiency: A substitute for
essays on college entrance examinations? Language Learning, 41, 313–336.
Frear, M. W., & Bitchener, J. (2015). The effects of cognitive task complexity on writing
complexity. Journal of Second Language Writing, 30, 45–57.
Gernsbacher, M. A. (1990). Language comprehension as structure building. Hillsdale, NJ:
Lawrence Erlbaum. 
Gilabert, R. (2007). Effects of manipulating task complexity on self-repairs during L2 oral
production. International Review of Applied Linguistics in Language Teaching, 45, 215–
240.
Ginsburg, H. P., & Opper, S. (1988). Piaget’s theory of intellectual development (3rd ed.).
Englewood Cliffs, NJ: Prentice-Hall.
Grabe, W., & Kaplan, R. B. (1996). Theory and practice of writing. New York: Longman.
Graham, S., Berninger, V. W., & Fan, W. (2007). The structural relationship between writing
attitude and writing achievement in first and third grade students. Contemporary
Educational Psychology, 32, 516–536.
Halliday, M. A. K. (1993). Some grammatical problems in scientific English. In M. A. K.
Halliday & J. R. Martin (Eds.), Writing science (pp. 2–21). London: The Falmer Press.
Halliday, M. A. K., & Matthiessen, C. (1999). Construing experience through meaning: A
language-based approach to cognition. London: Cassell.
Haswell, R. H. (2000). Documenting improvement in college writing: A longitudinal approach.
Written Communication, 17, 307–352.
Hayes, J. R. (1996). A new framework for understanding cognition and affect in writing. In C.
M. Levy & S. Ransdell (Eds.), The science of writing: Theories, methods, individual
differences, and applications (pp. 1–27). Mahwah, NJ: Erlbaum.
Hayes, J. R., & Chenoweth, N. A. (2006). Is working memory involved in the transcribing and
editing of texts? Written Communication, 23, 135–149.
Hayes, J. R., & Flower, L. S. (1980). Identifying the organization of writing processes. In L. W.
Gregg & E. R. Steinberg (Eds.), Cognitive processes in writing (pp. 3–30). Hillsdale, NJ:
Erlbaum.
Hickmann, M. (2003). Children’s discourse: Person, space, and time across languages.
Cambridge: Cambridge University Press.
Hinkel, E. (2002). Second language writers’ text: Linguistic and rhetorical features. Mahwah,
NJ: Lawrence Erlbaum.
Hinofotis, F. B. (1980). Cloze as an alternative method of ESL placement and proficiency
testing. In J. W. Oller, Jr., & K. Perkins (Eds.), Research in language testing (pp. 121–
128). Rowley, MA: Newbury House.
Hong, H., & Cao, F. (2014). Interactional metadiscourse in young EFL learner writing: A
corpus-based study. International Journal of Corpus Linguistics, 19, 201–224.
Housen, A., Kuiken, F., & Vedder, I. (2012). Complexity, accuracy and fluency: Definitions,
measurement and research. In A. Housen, F. Kuiken, & I. Vedder (Eds.), Dimensions of
L2 performance and proficiency: Investigating complexity, accuracy and fluency in SLA
(pp. 21–46). Amsterdam/Philadelphia: John Benjamins.
Huot, B. (1990). Literature of direct writing assessment: Major concerns and prevailing trends.
Review of Educational Research, 60, 237–263.
Hyland, K. (2005). Stance and engagement: A model of interaction in academic discourse.
Discourse Studies, 7, 173–192.
Hyland, K. (2008). Disciplinary voices: Interactions in research writing. English Text
Construction, 1, 5–22.
Institute of International Education (IIE). (2016). Open doors 2016. Retrieved from
http://www.iie.org/Research-and-Publications/Open-Doors/Data/FastFacts#.WKDq0rYrJo4
Ishikawa, T. (2007). The effect of manipulating task complexity along the [+/– Here-and-Now]
dimension of L2 written narrative discourse. In M. P. García Mayo (Ed.), Investigating
tasks in formal language learning (pp. 136–156). Clevedon, UK: Multilingual Matters.
Jackson, D. O., & Suethanapornkul, S. (2013). The cognition hypothesis: A synthesis and
meta-analysis of research on second language task complexity. Language Learning, 63,
330–367.
Jarvis, S., Grant, L., Bikowski, D., & Ferris, D. (2003). Exploring multiple profiles of highly
rated learner compositions. Journal of Second Language Writing, 12, 377–403.
Jeffery, J. V. (2009). Constructs of writing proficiency in U.S. state and national writing
assessments: Exploring variability. Assessing Writing, 14, 3–24.
Jeon, M. (2009). Globalization and native English speakers in English programme in Korea
(EPIK). Language, Culture and Curriculum, 22, 231–243.
Jeong, H. (2017). Narrative and expository genre effects on students, raters, and performance
criteria. Assessing Writing, 31, 113–125.
Johansson, R., Wengelin, Å., Johansson, V., & Holmqvist, K. (2010). Looking at the keyboard or
the monitor: Relationship with text production processes. Reading and Writing, 23, 835–
851.
Johns, A. M. (1995). Teaching classroom and authentic genres: Initiating students into academic
cultures and discourses. In D. Belcher & G. Braine (Eds.), Academic writing in a second
language: Essays on research and pedagogy (pp. 277–293). Norwood, NJ: Ablex.
Johnson, M. D., Mercado, L., & Acevedo, A. (2012). The effect of planning sub-processes on L2
writing fluency, grammatical complexity, and lexical complexity. Journal of Second
Language Writing, 21, 264–282.
Kang, J. Y. (2005). Written narratives as an index of L2 competence in Korean EFL learners.
Journal of Second Language Writing, 14, 259–279.
Kegley, P. H. (1986). The effect of mode of discourse on student writing performance: Implications
for policy. Educational Evaluation and Policy Analysis, 8, 147–154.
Kellogg, R. T. (1996). A model of working memory in writing. In C. M. Levy & S. Ransdell
(Eds.), The science of writing: Theories, methods, individual differences and applications
(pp. 57–72). Mahwah, NJ: Lawrence Erlbaum.
Kikuchi, K. (2006). Perspectives: Revisiting English entrance examinations at Japanese
universities after a decade. JALT Journal, 27, 77–96.
Klein, D., & Manning, C. D. (2003). Accurate unlexicalized parsing. Proceedings of the 41st
Meeting of the Association for Computational Linguistics, 423–430.
Knudson, R. E. (1995). Writing experiences, attitudes, and achievement of first to sixth graders.
Journal of Educational Research, 89, 90–97.
Kormos, J. (2011). Task complexity and linguistic and discourse features of narrative writing
performance. Journal of Second Language Writing, 20, 148–161.
Kormos, J. (2014). Differences across modalities of performance: An investigation of linguistic
and discourse complexity in narrative tasks. In H. Byrnes & R. M. Manchón (Eds.),
Task-based language learning: Insights from and for L2 writing (pp. 193–216). Amsterdam:
John Benjamins.
Kormos, J., & Trebits, A. (2012). The role of task complexity, modality, and aptitude in narrative
task performance. Language Learning, 62, 439–472.
Kuiken, F., Mos, M., & Vedder, I. (2005). Cognitive task complexity and second language writing
performance. In S. Foster-Cohen, M. P. García Mayo, & J. Cenoz (Eds.), EUROSLA
Yearbook (Vol. 5, pp. 195–222). Amsterdam: John Benjamins.
Kuiken, F., & Vedder, I. (2007). Task complexity and measures of linguistic performance in L2
writing. International Review of Applied Linguistics, 45, 261–284. 
Kuiken, F., & Vedder, I. (2008). Cognitive task complexity and written output in Italian and
French as a foreign language. Journal of Second Language Writing, 17, 48–60.
Landauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to latent semantic analysis.
Discourse Processes, 25, 259–284. 
Laufer, B., & Nation, P. (1995). Vocabulary size and use: Lexical richness in L2 written
production. Applied Linguistics, 16, 307–332.
Leijten, M., & Van Waes, L. (2013). Keystroke logging in writing research: Using Inputlog to
analyze and visualize writing processes. Written Communication, 30, 358–392.
Lo, J., & Hyland, F. (2007). Enhancing students’ engagement and motivation in writing: The case
of primary students in Hong Kong. Journal of Second Language Writing, 16, 219–237.
Long, M. (2015). Second language acquisition and task-based language teaching. Oxford, UK:
John Wiley & Sons.
Loschky, L., & Bley-Vroman, R. (1993). Grammar and task-based methodology. In G. Crookes
& S. Gass (Eds.), Tasks and language learning: Integrating theory and practice (pp. 123–
167). Philadelphia: Multilingual Matters.
Lu, X. (2010). Automatic measurement of syntactic complexity in child language acquisition.
International Journal of Corpus Linguistics, 14, 3–28.
Lu, X. (2011). A corpus-based evaluation of syntactic complexity measures as indices of
college-level ESL writers’ language development. TESOL Quarterly, 45, 36–62.
Malicka, A., & Levkina, M. (2012). Measuring task complexity: Does L2 proficiency matter? In
A. Shehadeh & C. Coombe (Eds.), Task-based language teaching in foreign language
contexts: Research and implementation (pp. 43–66). Amsterdam: John Benjamins.
Malvern, D. D., Richards, B., Chipere, N., & Durán, P. (2004). Lexical diversity and language
development: Quantification and assessment. Basingstoke, UK: Palgrave Macmillan.
Manchón, R. M., & Roca de Larios, J. (2007). On the temporal nature of planning in L1 and L2
composing: A study of foreign language writers. Language Learning, 57, 549–593.
Matsuda, P. K. (2015). Identity in written discourse. Annual Review of Applied Linguistics, 35,
140–159.
Mazgutova, D., & Kormos, J. (2015). Syntactic and lexical development in an intensive English
for Academic Purposes programme. Journal of Second Language Writing, 29, 3–15.
McCarthy, P. M., & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of
sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42,
381–392.
McCutchen, D., & Perfetti, C. A. (1982). Coherence and connectedness in the development of
discourse production. Text-Interdisciplinary Journal for the Study of Discourse, 2, 113–
140.
McDonough, K., & Trofimovich, P. (2011). Using priming methods in second language research.
New York, NY: Routledge.
McNamara, D. S., Crossley, S. A., & McCarthy, P. M. (2010). Linguistic features of writing
quality. Written Communication, 27, 57–86.
McNamara, D. S., Graesser, A. C., McCarthy, P., & Cai, Z. (2014). Automated evaluation of text
and discourse with Coh-Metrix. Cambridge: Cambridge University Press.
Mei, W. S. (2006). Creating a contrastive rhetorical stance: Investigating the strategy of
problematization in students’ argumentation. RELC Journal, 37, 329–353.
Melendy, G. A. (2008). Motivating writers: The power of choice. The Asian EFL Journal, 10,
187–198.
Norris, J., & Ortega, L. (2009). Towards an organic approach to investigating CAF in instructed
SLA: The case of complexity. Applied Linguistics, 30(4), 555–578.
Ojima, M. (2006). Concept mapping as pre-task planning: A case study of three Japanese ESL
writers. System, 34, 566–585.
Olinghouse, N. G., & Graham, S. (2009). The relationship between the discourse knowledge and
the writing performance of elementary-grade students. Journal of Educational
Psychology, 101, 37–50.
Oller, J. W., & Conrad, C. A. (1971). The cloze technique and ESL proficiency. Language
Learning, 21, 185–195.
Ong, J. (2013). Discovery of ideas in second language writing task environment. System, 41,
529–542.
Ong, J. (2014). How do planning time and task conditions affect metacognitive processes of L2
writers? Journal of Second Language Writing, 23, 17–30.
Ong, J., & Zhang, L. J. (2010). Effects of task complexity on the fluency and lexical complexity
in EFL students’ argumentative writing. Journal of Second Language Writing, 19, 218–
233.
Ortega, L. (2003). Syntactic complexity measures and their relationship to L2 proficiency: A
research synthesis of college-level L2 writing. Applied Linguistics, 24, 492–518.
Parkinson, J., & Musgrave, J. (2014). Development of noun phrase complexity in the writing of
English for Academic Purposes students. Journal of English for Academic Purposes, 14,
48–59.
Peterson, C., & McCabe, A. (1983). Developmental psycholinguistics: Three ways of looking at
a child’s narrative. New York: Plenum.
Plakans, L. (2010). Independent vs. integrated writing tasks: A comparison of task
representation. TESOL Quarterly, 44, 185–194.
Plakans, L., & Gebril, A. (2013). Using multiple texts in an integrated writing assessment:
Source text use as a predictor of score. Journal of Second Language Writing, 22, 217–
230.
Plonsky, L., & Kim, Y. (2016). Task-based learner production: A substantive and
methodological review. Annual Review of Applied Linguistics, 36, 73–97.
Plonsky, L., & Oswald, F. L. (in press). Multiple regression as a flexible alternative to ANOVA in
L2 research. Studies in Second Language Acquisition.
Polio, C. (2013). Revising a writing rubric based on raters' comments: Does it result in a more
reliable and valid assessment? Midwest Association of Language Testers, East Lansing,
MI.
Polio, C., & Yoon, H. (2016). Task and genre differences in L2 writing research. Invited
colloquium (Colloquium title: Researching written task complexity in diverse contexts
organized by Lawrence Zhang) presented at American Association for Applied
Linguistics (AAAL) 2016, Orlando, FL.
Polio, C., & Yoon, H. (under review). The use of two automated tools to examine ESL learners’
syntactic complexity across two genres. International Journal of Applied Linguistics
[Special issue: Perspectives and challenges for research on grammatical complexity in
SLA: The case of variation].
Qin, J., & Karabacak, E. (2010). The analysis of Toulmin elements in Chinese EFL university
argumentative writing. System, 38, 444–456.
Qin, W., & Uccelli, P. (2016). Same language, different functions: A cross-genre analysis of
Chinese EFL learners’ writing performance. Journal of Second Language Writing, 33, 3–
17.
Quinlan, T., Loncke, M., Leijten, M., & Van Waes, L. (2012). Coordinating the cognitive
processes of writing: The role of the Monitor. Written Communication, 29, 345–368.
Ravid, D. (2005). Emergence of linguistic complexity in later language development: Evidence
from expository text construction. In D. Ravid & H. B. Shyldkrot (Eds.), Perspectives on
language and language development: Essays in honor of Ruth A. Berman (pp. 337–356).
London: Kluwer Academic.
Révész, A. (2009). Task complexity, focus on form, and second language development. Studies
in Second Language Acquisition, 31, 437–470.
Révész, A. (2014). Towards a fuller assessment of cognitive models of task-based learning:
Investigating task-generated cognitive demands and processes. Applied Linguistics, 35,
87–92.
Révész, A., Kourtali, N., & Mazgutova, D. (in press). Effects of task complexity on L2 writing
behaviors and linguistic complexity. Language Learning.
Révész, A., Michel, M., & Gilabert, R. (2016). Measuring cognitive task demands using
dual-task methodology, subjective self-ratings, and expert judgments: A validation study.
Studies in Second Language Acquisition, 38, 703–737.
Révész, A., Sachs, R., & Hama, M. (2014). The effects of task complexity and input frequency
on the acquisition of the past counterfactual construction through recasts. Language
Learning, 64, 615–650.
Rezaei, A. R., & Lovorn, M. (2010). Reliability and validity of rubrics for assessment through
writing. Assessing Writing, 15, 18–39.
Robinson, P. (2001a). Task complexity, task difficulty, and task production: Exploring
interactions in a componential framework. Applied Linguistics, 22, 27–57.
Robinson, P. (2001b). Task complexity, cognitive resources, and syllabus design: A triadic
framework for examining task influences on SLA. In P. Robinson (Ed.), Cognition and
second language instruction (pp. 287–318). Cambridge, UK: Cambridge University
Press.
Robinson, P. (2003). The cognition hypothesis of adult, task-based language learning. Second
Language Studies, 21, 45–107.
Robinson, P. (2005). Cognitive complexity and task sequencing: Studies in a componential
framework for second language task design. International Review of Applied Linguistics,
43, 1–32.
Robinson, P. (2007). Task complexity, theory of mind, and intentional reasoning: Effects on L2
speech production, interaction, uptake and perceptions of task difficulty. International
Review of Applied Linguistics, 45, 193–213.
Robinson, P. (2010). Situating and distributing cognition across task demands: The SSARC
model of pedagogic task sequencing. In M. Pütz & L. Sicola (Eds.), Cognitive processing
in second language acquisition: Inside the learner’s mind (pp. 243–268). Amsterdam,
The Netherlands: John Benjamins.
Robinson, P. (2011). Task-based language learning: A review of issues. Language Learning, 61
(Suppl. 1), 1–36.
Ruiz-Funes, M. (2014). Task complexity and linguistic performance in advanced college-level
foreign language writing. In H. Byrnes & R. M. Manchón (Eds.), Task-based language
learning: Insights from and for L2 writing (pp. 163–192). Amsterdam: John Benjamins.
Ruiz-Funes, M. (2015). Exploring the potential of second/foreign language writing for language
learning: The effects of task factors and learner variables. Journal of Second Language
Writing, 28, 1–19.
Sakamoto, M. (2012). Moving towards effective English language teaching in Japan: Issues and
challenges. Journal of Multilingual and Multicultural Development, 33, 409–420.
Shim, R. J., & Baik, M. J. (2004). English education in South Korea. In W. K. Ho & R. Y. L.
Wong (Eds.), English language teaching in East Asia today (pp. 241–261). Singapore:
Eastern Universities Press.
Skehan, P. (1998). A cognitive approach to language learning. Oxford, UK: Oxford University
Press.
Skehan, P. (2009). Modelling second language performance: Integrating complexity, accuracy,
fluency and lexis. Applied Linguistics, 30, 510–532.
Skehan, P., & Foster, P. (1997). The influence of planning and post-task activities on accuracy
and complexity in task based learning. Language Teaching Research, 1, 185–211.
Skehan, P., & Foster, P. (2001). Cognition and tasks. In P. Robinson (Ed.), Cognition and second
language instruction (pp. 183–205). Cambridge: Cambridge University Press.
Slobin, D. (2004). The many ways to search for a frog: Linguistic typology and the expression of
motion events. In S. Strömqvist & L. Verhoeven (Eds.), Relating events in narrative,
volume 2: Typological and contextual perspectives (pp. 219–257). Mahwah, NJ:
Lawrence Erlbaum.
Smith, W. L., Hull, G. A., Land, R. E., Moore, M. T., Ball, C., Dunham, D. E., Hickey, L. S., &
Ruzich, C. W. (1985). Some effects of varying the structure of a topic on college students’
writing. Written Communication, 2, 73–89.
Talmy, L. (1985). Lexicalization patterns: Semantic structure in lexical forms. In T. Shopen
(Ed.), Language typology and lexical description, volume 3: Grammatical categories and
the lexicon (pp. 36–149). Cambridge: Cambridge University Press.
Talmy, L. (2000). Toward a cognitive semantics, volume 1: Concept structuring systems.
Cambridge, MA: MIT Press.
Tavakoli, P. (2014). Storyline complexity and syntactic complexity in writing and speaking tasks.
In H. Byrnes & R. M. Manchón (Eds.), Task-based language learning: Insights from and
for L2 writing (pp. 217–236). Amsterdam: John Benjamins.
Tedick, D. J. (1990). ESL writing assessment: Subject-matter knowledge and its impact on
performance. English for Specific Purposes, 9, 123–143.
Toutanova, K., Klein, D., Manning, C., & Singer, Y. (2003). Feature-rich part-of-speech
tagging with a cyclic dependency network. Proceedings of HLT-NAACL 2003, 252–
259.
Tremblay, A. (2011). Proficiency assessment standards in second language acquisition research:
“Clozing” the gap. Studies in Second Language Acquisition, 33, 339–372.
van Dijk, T. A., & Kintsch, W. (1983). Strategies of discourse comprehension. New York:
Academic Press.
Watanabe, Y. (1996). Does grammar translation come from the entrance examination?
Preliminary findings from classroom-based research. Language Testing, 13, 318–333.
Way, P., Joiner, E. G., & Seaman, M. (2000). Writing in the secondary foreign language
classroom: The effects of prompts and tasks on novice learners of French. The Modern
Language Journal, 84, 171–184.
Wengelin, Å., Torrance, M., Holmqvist, K., Simpson, S., Galbraith, D., Johansson, V., &
Johansson, R. (2009). Combined eyetracking and keystroke-logging methods for studying
cognitive processes in text production. Behavior Research Methods, 41, 337–351.
Wolfe-Quintero, K., Inagaki, S., & Kim, H. (1998). Second language development in writing:
Measures of fluency, accuracy, and complexity. Second Language Teaching &
Curriculum Center, University of Hawaii at Manoa.
Wu, X. (2003). Intrinsic motivation and young language learners: The impact of the classroom
environment. System, 31, 501–517.
Yang, W. (2014). Mapping the relationships among the cognitive complexity of independent
writing tasks, L2 writing quality, and complexity, accuracy and fluency of L2 writing.
Doctoral dissertation. Retrieved from http://scholarworks.gsu.edu/alesl_diss/29
Yang, W., Lu, X., & Weigle, S. C. (2015). Different topics, different discourse: Relationships
among writing topic, measures of syntactic complexity, and judgments of writing quality.
Journal of Second Language Writing, 28, 53–67.
Yang, W., & Sun, Y. (2012). The use of cohesive devices in argumentative writing by Chinese
EFL learners at different proficiency levels. Linguistics and Education, 23, 31–48.
Yoon, H. (2017a). Textual voice elements and voice strength in EFL argumentative writing.
Assessing Writing, 32, 72–84.
Yoon, H. (2017b). Linguistic complexity in L2 writing revisited: Issues of topic, proficiency, and
construct multidimensionality. System, 66, 130–141.
Yoon, H., & Polio, C. (2017). ESL students’ linguistic development in two written genres.
TESOL Quarterly.
Yuan, F., & Ellis, R. (2003). The effects of pre-task planning and on-line planning on fluency,
complexity and accuracy in L2 monologic oral production. Applied Linguistics, 24, 1–27.
Zhang, L. J. (2013). Second language writing as and for second language learning. Journal of
Second Language Writing, 22, 446–447.
Zhao, C. G. (2012). Measuring authorial voice strength in L2 argumentative writing: The
development and validation of an analytic rubric. Language Testing, 30, 201–230.
Zhao, C. G., & Llosa, L. (2008). Voice in high-stakes L1 academic writing assessment:
Implications for L2 writing instruction. Assessing Writing, 13, 153–170.
Zimmerman, B. J., & Bandura, A. (1994). Impact of self-regulatory influences on writing course
attainment. American Educational Research Journal, 31, 845–862.