CORRECTIVE FEEDBACK IN PERSPECTIVE: THE INTERFACE BETWEEN FEEDBACK TYPE, PROFICIENCY, THE CHOICE OF TARGET STRUCTURE, AND LEARNERS’ INDIVIDUAL DIFFERENCES IN WORKING MEMORY AND LANGUAGE ANALYTIC ABILITY

BY SHAOFENG LI

A DISSERTATION Submitted to Michigan State University In partial fulfillment of the requirements For the degree of DOCTOR OF PHILOSOPHY Second Language Studies 2010

ABSTRACT

CORRECTIVE FEEDBACK IN PERSPECTIVE: THE INTERFACE BETWEEN FEEDBACK TYPE, PROFICIENCY, THE CHOICE OF TARGET STRUCTURE, AND LEARNERS’ INDIVIDUAL DIFFERENCES IN WORKING MEMORY AND LANGUAGE ANALYTIC ABILITY

BY SHAOFENG LI

This study investigates the interaction between feedback type, proficiency, the choice of target structure, and learners’ individual differences in working memory and language analytic ability in the learning of Chinese as a foreign language. Seventy-eight L2 Chinese learners from two large U.S. universities participated in the study. The participants were divided into two proficiency levels according to their performance on a standardized proficiency test. At each proficiency level, they were randomly assigned to three conditions: implicit (recasts), explicit (metalinguistic correction), and control. Treatment effects were measured by means of a grammaticality judgment test (GJT) and an elicited imitation (EI) test. Learners’ working memory was measured by means of a listening span test, and the Words in Sentences subtest of the MLAT (Carroll & Sapon, 2002) was used to gauge learners’ language analytic ability.

The study had four sessions. In session 1, the learners took the proficiency test and the GJT pretests; in sessions 2 and 3, the learners took the EI pretest, received implicit or explicit feedback on their nontargetlike use of Chinese classifiers and the perfective –le in dyadic interaction, and at the end took the immediate posttest; in the final session (one week after session 3), the learners took the delayed posttests, the working memory test, and the test of language analytic ability.

Results revealed that implicit feedback had limited impact on low-level learners in the learning of the perfective –le but was effective for high-level learners; implicit feedback was effective for the learning of Chinese classifiers at both proficiency levels. Explicit feedback was more effective than implicit feedback for low-level learners, but the two types of feedback were equally effective for more advanced learners. At the high proficiency level, the effects of implicit feedback were more sustainable than those of explicit feedback in the learning of the perfective –le. It was also found that, in general, the effects for classifiers were larger than the effects for the perfective –le and that EI tests showed larger effects than GJTs. With regard to the interaction between feedback type, the choice of target structure, and the two cognitive factors, language analytic ability correlated with the effects of implicit feedback in the learning of classifiers and with the effects of explicit feedback in the learning of the perfective –le; working memory correlated with the effects of explicit feedback in the learning of classifiers. Interpretations of these results were sought from multiple perspectives and with reference to previous feedback research.

Copyright by Shaofeng Li 2010

To my family

ACKNOWLEDGEMENTS

This dissertation project has benefited from many individuals, to whom I would like to extend my sincere gratitude. Dr.
Susan Gass, chair of my advisory committee, has been unwavering in her support throughout the duration of my study in the SLS program. Her expertise, insights, inspirations, encouragement, and generous help are critical to my academic growth and will have a life-long impact on my professional development. Dr. Shawn Loewen provided invaluable comments on the research instruments as well as other aspects of my dissertation study; I have greatly benefited from his expertise in form-focused instruction and statistics. Dr. Paula Winke is a versatile and talented scholar, and the way she conducts research and helps students has made her a role model for me in terms of professorship. Dr. Patti Spinner’s expertise in theoretical linguistics has allowed me to avoid pitfalls and stay on the right track in my search for the ideal target structures for my study. Dr. Xiaoshi Li has always encouraged me and provided emotional support whenever I experienced setbacks and difficulties. Her knowledge about Chinese linguistics is indispensible to the successful completion of this project. In addition to my advisory committee, I have received selfless assistance from Dr. Debra Friedman, Dr. Debra Hardison, and Dr. Charlene Polio. Joan Reid, secretary of the SLS program, has been very helpful with regard to the logistic aspects of my study. My gratitude also goes to the Chinese instructors at Michigan State University and the University of Michigan, who encouraged their students to participate in my study. Among them, Shi Liren, Wang Qiongyao, Shi Taiheng, and Teng Chunhong are from Michigan State University; Chen Qinghai, Liu Wei, Tang Le, Yin Haiqing, and Laura Grande are from the University of Michigan. I would also like to thank all the vi participants of my study, without whom the study would have been impossible. Another group of individuals I must thank are my colleagues in the SLS and TESOL programs, who have helped me in various ways. They have provided me with either emotional support or academic assistance. I feel fortunate to be able to have the opportunity to complete my study together with such an excellent group of colleagues; they make my life more enjoyable and my study experience more fruitful. These people include Junkyu Lee, Luke Plonsky, Tomoko Okuno, Grace Lee Amuzie, Jennifer Behney, Soo Hyon Kim, Fei Fei, Kimi Nakatsukasa, Yeon Heo, Allyssa Chamberlain, and Mariah Shafer, to name only a few. Last but by no means least, I am indebted to my wife Hong Wang and my daughter Ye Li. They are always behind me, and their support, help, and understanding have been a constant incentive for my study. My wife was directly involved in this project: She served as a second coder and provided interrater reliability for the data analysis. I would like to thank her for always being there to help. vii TABLE OF CONTENTS LIST OF TABLES ............................................................................................................. x LIST OF FIGURES ......................................................................................................... xii CHAPTER 1 INTRODUCTION ....................................................................................... 1 CHAPTER 2 REVIEW OF THE LITERATURE ............................................................12 Corrective Feedback ................................................................................................. 12 Theoretical Background ................................................................................... 
12 Taxonomy of Feedback .....................................................................................15 Recasts ..............................................................................................................18 Metalinguistic Feedback .................................................................................. 20 The Effectiveness of Corrective Feedback: Toward an Integrated Model ...... 22 Feedback, Proficiency, and the Target Structure ..................................................... 26 Proficiency and the Choice of Target Structure ............................................... 26 Chinese Perfective -le and Chinese Classifiers ................................................ 29 Chinese perfective –le .............................................................................. 29 Chinese classifiers .................................................................................... 38 Chinese perfective –le versus Chinese classifiers .................................... 42 Feedback, Language Analytic Ability, and Working Memory ................................ 49 Language Aptitude .......................................................................................... 49 Aptitude-Treatment Interaction ....................................................................... 51 Language Analytic Ability and Feedback ....................................................... 53 Working Memory and Feedback ..................................................................... 57 Research Questions .................................................................................................. 64 CHAPTER 3 METHOD .................................................................................................. 66 Participants and Grouping ........................................................................................ 66 Feedback Operationalization .................................................................................... 69 Implicit Feedback .......................................................................................…. 69 Explicit Feedback ............................................................................................ 72 Target Structures....................................................................................................... 77 Tasks ........................................................................................................................ 79 Treatment Tasks for Classifiers ….................................................................... 79 Treatment Tasks for the Perfective –le ............................................................. 83 Testing ...................................................................................................................... 86 Proficiency Test ............................................................................................... 88 Tests for Treatment Effects .............................................................................. 90 Tests of implicit and explicit knowledge ................................................. 90 Elicited imitation test ............................................................................... 93 Grammaticality judgment test .................................................................. 95 Validity and reliability of the GJT and EI tests........................................ 98 Test of Language Analytic Ability................................................................... 
98 Test of Working Memory ................................................................................ 99 viii Procedure ................................................................................................................ 101 Scoring and Coding ................................................................................................ 104 GJTs and EI tests ............................................................................................ 104 GJTs ....................................................................................................... 104 EI tests ................................................................................................... 109 Inter-coder reliability ............................................................................. 113 The Working Memory Test ........................................................................... 113 Analysis ................................................................................................................. 114 CHAPTER 4 RESULTS ................................................................................................ 118 Results on the Perfective –le .................................................................................. 118 GJT Results .................................................................................................... 118 EI Test Results ............................................................................................... 124 Summary of the Results on the Perfective –le ............................................... 128 Results on Classifiers ............................................................................................. 129 GJT Results .................................................................................................... 129 EI Test Results ............................................................................................... 131 Summary of the Results on Classifiers .......................................................... 138 The Perfective –le vs. Classifiers ........................................................................... 138 Results on Language Analytic Ability and Working Memory .............................. 140 CHAPTER 5 DISCUSSION .......................................................................................... 145 Implicit Feedback ................................................................................................... 147 The Perfective –le .......................................................................................... 147 Classifiers ....................................................................................................... 151 Explicit-Implicit Comparison ................................................................................ 154 The Effects of Target Structure and Testing .......................................................... 159 Feedback, Linguistic Structure, and Aptitude Components .................................. 165 Language Analytic Ability ............................................................................ 166 Working Memory ........................................................................................... 170 Language Analytic Ability vs. Working Memory ......................................... 174 Aptitude and Testing ...................................................................................... 
177 CHAPTER 6 CONCLUSION ....................................................................................... 181 NOTES .......................................................................................................................... 188 APPENDICES ............................................................................................................... 191 REFERENCES .............................................................................................................. 196 ix LIST OF TABLES Table 1. Schematization of the semantic properties of verb types .................................. 31 Table 2. Schemes on structural difficulty ........................................................................ 48 Table 3. Descriptive statistics for groups ......................................................................... 68 Table 4. Chinese classifiers and the Chinese perfective –le ............................................ 78 Table 5. Measures and descriptive statistics..................................................................... 87 Table 6. An illustration of the HSK test .......................................................................... 89 Table 7. Procedure of the study ..................................................................................... 103 Table 8. Coding and scoring of GJTs ............................................................................ 106 Table 9. Additional criteria regarding GJT data on –le ................................................. 108 Table 10. Scoring of EI Data ......................................................................................... 111 Table 11. Tests of normality........................................................................................... 115 Table 12. Perfective –le: Descriptive statistics on GJT scores ...................................... 119 Table 13. Perfective –le: ANOVA results related to GJT scores .................................. 122 Table 14. Perfective –le: Descriptive statistics on GJT gain scores .............................. 123 Table 15. Perfective –le: Post hoc contrasts related to GJT scores ............................... 123 Table 16. Perfective –le: Descriptive statistics on EI test scores ...................................124 Table 17. Perfective –le: ANOVA results related to EI test scores ............................... 126 Table 18. Perfective –le: Descriptive statistics related to gain scores on EI tests ......... 127 Table 19. Perfective –le: Post hoc contrasts related to EI test scores ............................ 128 Table 20. Classifiers: Descriptive statistics on GJT scores ........................................... 130 Table 21. Classifiers: ANOVA results related to GJT scores ........................................132 Table 22. Classifiers: Descriptive statistics on GJT gain scores ................................... 133 x Table 23. Classifiers: Post hoc contrasts related to GJT scores ..................................... 133 Table 24. Classifiers: Descriptive statistics on EI test scores ........................................ 134 Table 25. Classifiers: ANOVA results related to GJT scores ........................................ 136 Table 26. Classifiers: Descriptive statistics related to gain scores on EI tests .............. 137 Table 27. Classifiers: Post hoc contrasts related to EI test scores ................................. 137 Table 28. 
Effect sizes associated with perfective –le and classifiers .............................140 Table 29. Effects of Feedback Shown on Different Tests ............................................. 140 Table 30. Scores of language analytic ability ................................................................ 141 Table 31. Raw scores of working memory .................................................................... 142 th Table 32. Descriptive statistics for 4 semester learners: Gain scores ..........................142 Table 33. Feedback, aptitude, and the target structure: Correlation results................... 144 Table C-1 Perfective –le: Descriptive statistics related to raw scores ........................... 194 Table D-1 Classifiers: Descriptive statistics related to raw scores ................................ 195 xi LIST OF FIGURES Figure 1. Taxonomy of feedback ......................................................................................17 Figure 2. An integrated model of corrective feedback ..………………...……………... 25 Figure 3. An illustration of the one-le and two-le controversy ........................................ 33 Figure 4. Perfective –le: GJT score changes .................................................................. 119 Figure 5. Perfective –le: EI score changes ..................................................................... 125 Figure 6. Classifiers: GJT score changes ....................................................................... 130 Figure 7. Classifiers: EI score changes .......................................................................... 134 xii CHAPTER 1 INTRODUCTION This study is conducted in response to Ellis and Sheen’s call for investigating variables constraining the effectiveness of corrective feedback (2006) and to the research gaps identified in recent meta-analyses related to the effectiveness of corrective feedback (Li, 2010; Lyster & Saito, 2010; Mackey & Goo, 2007; Russell & Spada, 2006). Ellis and Sheen conducted a comprehensive review on previous research on recasts and pointed out that notwithstanding the abundance of research into the effectiveness of recasts, it remains to be seen how the efficacy of this corrective strategy is affected by variables such as the target structure and learners’ individual differences. Corroborating Ellis and Sheen’s statements, quantitative research syntheses by Li, Lyster and Saito, and Mackey and Goo showed that the effects of corrective feedback are constrained by both learnerinternal and learner-external factors. Li’s analysis also identified some gaps to be filled in feedback research including the need to examine how feedback facilitates the learning of non-Indo-European languages such as Chinese. This study seeks to address such issues and aims to answer the question of whether the effects of implicit and explicit feedback (operationalized as recasts and metalinguistic correction respectively) are mediated by learners’ proficiency, the choice of target structure, and two components of language aptitude—grammatical sensitivity and working memory. An Overview of Feedback Research in SLA Corrective feedback in SLA takes the form of responses to learners’ erroneous utterances. The responses may indicate that an error has been made, provide the correct linguistic form, supply metalinguistic information about the nature of the error, or contain any combination of these moves (Ellis, Loewen, & Erlam, 2006). 
There has been 1 controversy in the field of SLA as to whether corrective feedback plays a facilitative role in developing the learner’s interlanguage. The anti-feedback camp (Krashen, 1981; Schwartz, 1993) takes a nativist approach to SLA and identifies L2 acquisition with L1 acquisition. They contend that children learn their first language through exposure to available input and make linguistic generalizations and computations using an inherently built-in Language Acquisition Device; adults learn a second language in the same manner. The L1-L2 equation leads to the argument that L2 acquisition is realized through mere exposure to input in the form of positive evidence (what is acceptable in the target language). Negative evidence (what is unacceptable in the target language) afforded by corrective feedback does not lead to linguistic competence. Another group of researchers (DeKeyser, 1993; Ellis, 2008; Gass, 1997, 2003; Loewen & Philp, 2006; Long, 1996, 2007; Lyster & Ranta, 1997; Mackey, 2007; Sheen, 2010) voiced their opposition and pointed out that unlike L1 acquisition, adult L2 acquisition requires both positive and negative evidence. Positive evidence is available in the form of a “set of well-formed sentences to which learners are exposed” (Gass, 1997, p. 36). Negative evidence can be provided preemptively through rule-explanation or reactively through feedback to the learner’s erroneous utterance. These scholars went on to argue that an optimal condition for L2 learning is negotiated interaction (between learners or between a learner and a language expert such as a native speaker) where the learner notices the gap between his/her erroneous production and the target form and makes subsequent interlanguage modifications. This argument constitutes the core of the Interaction Hypothesis (Gass, 1997; Long, 1996; Mackey, 2007) and lays the ground for the investigation of the effects of feedback. Interactional feedback affords opportunities 2 for both positive and negative evidence, noticing, and pushed output, all of which are essential to L2 development. To resolve the controversy over the usefulness of corrective feedback, researchers conducted numerous studies. Studies conducted in laboratory as well as classroom settings (Ammar & Spada, 2006; Han, 2002; Li, 2009; Long, Inagaki, & Ortega, 1998; Lyster, 2004; Mackey & Philp, 1998; Sheen, 2008) have shown that corrective feedback is facilitative to L2 acquisition. Recently, several meta-analyses (Li, 2010; Lyster & Saito, 2010; Mackey & Goo, 2007; Norris & Ortega, 2000; Russell & Spada, 2006) have been conducted on the empirical studies examining the effectiveness of corrective feedback and they all showed that feedback, be it oral or written, does benefit L2 learning. These findings undermine the nativist argument against the utility of feedback in L2 instruction. Focuses of This Study Corrective Feedback in L2 Chinese Li’s meta-analysis (2010) shows that the effectiveness of corrective feedback varies across different L2s. It is found that English, Spanish, and French are the most frequently investigated target languages in feedback-related research, and L2 Spanish studies showed larger effects than studies involving other target languages. Among the 33 retrieved primary studies in the meta-analysis, there is only one study (unpublished dissertation; Chen, 1996) that examines L2 Chinese. Despite the fact that some interesting findings were obtained, some methodological issues rendered the results less robust. 
For instance, the study did not include a pretest, and it had a small sample size. After the cut-off date for data collection of the meta-analysis, one study (Li, 2009) was published on how feedback enhanced the learning of Chinese by L1 English speakers. 3 However, due to the small sample size and failure to include a control group, the generalizability of the results is limited. Therefore, further empirical studies on how corrective feedback fares in L2 Chinese learning is warranted; such studies will enrich and complement interaction-driven SLA research and contribute to the understanding of L2 Chinese learning. Feedback and Proficiency Most feedback studies have only examined the effectiveness of one or more feedback types without taking learners’ proficiency level into consideration. Mackey and Philp (1998) and Ammar and Spada (2006) are the only studies that investigated the role of developmental readiness in affecting the effects of corrective feedback. Mackey and Philp found that more advanced learners or learners with better mastery of English question formation benefited more from recasts in learning the target structure. Ammar and Spada found that learners with less previous knowledge about the English possessive determiners his and her benefited more from prompts; for students with more previous knowledge about the structure, prompts and recasts worked equally well. In both studies, developmental readiness refers to learners’ previous knowledge about the target structure. However, it is speculated that learners’ general proficiency of the target language may also impact the effectiveness of feedback. Learners with greater proficiency have more attentional resources at their discretion and might therefore benefit more from corrective feedback (Li, 2009). More importantly, as with Ammar and Spada’s findings about the interaction between feedback types and learners’ previous knowledge about the target structure, different types of feedback may have differential effects on learners at different proficiency levels. Therefore, it would seem misleading 4 and arbitrary to make claims about the effectiveness of certain feedback types per se as each feedback type possesses characteristics that may work for learners at one proficiency level but not another. This study will investigate how the effects of implicit and explicit feedback (recasts and metalinguistic correction) are affected by learners’ general L2 proficiency. Linguistic Structure There has been empirical evidence that corrective feedback worked differently for different linguistic structures (Ellis & Sheen, 2006). For instance, in Havranek and Cesnik’s study (2001), feedback worked better for verb inflections and auxiliary use than for prepositions and tense choice; in Ishida’s study (2004), recasts were more beneficial to the learning of the resultative meaning of the perfective form –te i-(ru) than to the learning of the progressive meaning of this structure. However, most of these studies did not examine the nature of linguistic structure as an independent variable. One study that singled out linguistic structure as an independent variable affecting the effectiveness of corrective feedback is by R. Ellis (2007), who examined the effects of two feedback types (recasts and metalinguistic feedback) on the learning of two structures: the past form –ed and the comparative forms in English. 
He found that recasts did not differentially affect the learning of the two structures, but metalinguistic feedback did: it worked better for the comparative than for the past form. This, according to Ellis, was attributable to the less metalinguistic knowledge the learners had about the comparative than the past form prior to the treatment, which left more room for the increase of metalinguistic knowledge. In light of the lack of research on the interface between feedback types and different 5 linguistic structures, this study includes two target structures: Chinese classifiers and perfective aspect marker -le, to ascertain whether implicit feedback and explicit feedback have differential effects on the learning of the two structures by learners at two proficiency levels. Feedback and Language Aptitude Ellis and Sheen (2006) pointed out that there has also been a paucity of research on how individual difference variables affect the effectiveness of corrective feedback. Among the individual difference variables, aptitude has been shown to be a strong predictor of second language achievements (Dörnyei, 2005). For instance, Oxford (1995) found that among all the individual difference variables she examined, aptitude had the strongest correlation with L2 proficiency. This study will investigate if the effects of implicit and explicit feedback are mediated by learners’ language aptitude. Aptitude-treatment interaction. Early language aptitude testing lays emphasis on its predictive function: It provides information about an individual’s likelihood of success or rate of progress in attaining L2 proficiency. Traditionally, language aptitude is viewed as being fixed and is independent of learning conditions or teaching methods (Carroll, 1973, 1993). However, some researchers (Snow, 1994; Segalowitz, 1997) argue that aptitude should not be considered a static characteristic. Rather, it is situated in complex, dynamic, and communicative learning environments that have different processing demands on the learner’s cognitive abilities. Robinson (2001) pointed out that the information-processing demands of different learning conditions might facilitate or inhibit learners’ cognitive abilities. Similarly, Snow (1987, 1991) advanced the aptitude-treatment interaction hypothesis and suggested a link between aptitude and learning conditions. 6 Following this line of thinking, researchers have investigated how L2 learners’ language aptitude interacted with different input conditions or instruction methods. Wesche (1981) found that learners with high analytic ability achieved more when they were exposed to an analytic teaching approach and that those with good memory and auditory abilities did better under a memory-based functional approach. Robinson (1997) examined the correlation between aptitude as measured by the MLAT (Carroll & Sapon, 1959, 2002) and different learning conditions: incidental, implicit, and explicit. It was found that aptitude correlated with the implicit and explicit conditions but not with the incidental condition. However, in a later study (2002) where he used another set of tests, aptitude correlated with incidental learning. 
Erlam (2005) found that deductive instruction that gave students opportunities for output minimizes the effect of aptitude variation on learning outcome and that learners with higher language analytic ability and greater working memory capacity benefited the most from instruction that focused on input and that did not require them to engage in language production. Based on the above arguments and research findings, there is reason to believe that aptitude would correlate differently with the effects of implicit and explicit feedback, two very different learning conditions that supposedly implicate different cognitive processes. Also, aptitude might affect the learning of different linguistic structures as they may set different processing demands on learners. To date, there has been no research that examines the interface between aptitude, feedback, and the nature of the linguistic structure. Language analytic ability and working memory as aptitude components. L2 aptitude researchers mostly use Carroll and Sapon’s aptitude battery (1959, 2002), the MLAT, 7 which consists of five parts examining four constituent abilities of language aptitude: phonetic coding ability, grammatical sensitivity, rote learning ability, and inductive language learning ability. Accordingly, a composite score based on these four parts of the battery is usually used to gauge learners’ language aptitude. However, it is suggested that separate components of aptitude should be examined as they relate to different stages of SLA and are sensitive to different learning conditions (Robinson, 2002; Skehan, 2002). Dörnyei and Skehan (2003) created a scheme that illustrates the aptitude constructs involved in different stages of SLA. For instance, phonemic coding ability is required in the “noticing” stage, and grammatical sensitivity is involved in the “pattern identification” stage. This study examines the relationship between two feedback types (implicit and explicit) and two aptitude components: language analytic ability and working memory. Language analytic ability, or grammatical sensitivity, is measured by the ‘Words in Sentences’ subtest of the MLAT. In general, it has been found that language analytic ability has a greater role in classroom learning contexts than in naturalistic settings (Reves, 1983), that adult learners benefit more from analytic ability than child learners (DeKeyser, 2000; Harley & Hart, 2002; Rose, Yoshinaga, & Sasaki, 2002), and that it affects learners in explicit conditions more than learners in implicit conditions (Robinson, 1997). There have been two studies that examined the interface between corrective feedback and language analytic ability. DeKeyser (1993) found that students with high language aptitude benefited the most from error correction. Sheen (2007a) questioned whether the effectiveness of recasts and metalinguistic correction in the learning of English articles 8 was mediated by learners’ language analytic ability. She found that learners with higher analytic ability benefited more from metalinguistic feedback, but the performance of the recast group did not correlate with language analytic ability. Among all the aptitude components included in the MLAT battery, the most studied is memory (Skehan, 2002). The MLAT was developed in the context of audiolingual teaching which relies heavily on rote memorization, hence the ‘Paired Associates’ part that measures associative memory. 
However, with the advent of communicative language teaching, which requires the learner to attend to form and meaning simultaneously, the predictive power of the memory part of the MLAT has been called into question. Instead of associative memory, working memory has been claimed to have better construct validity and to be more predictive of language learning outcomes (Robinson, 2002). Working memory “involves the temporary storage and manipulation of information that is assumed to be necessary for a wide range of complex cognitive activities” (Baddeley, 2003, p. 189). Miyake and Friedman (1998) stated that working memory is “one (if not the) central component of language aptitude” (p. 340). Working memory has been measured by means of digit- or word-span tests, where learners are required to repeat a sequence of digits, words, or syllables; it has also been measured through (reading or listening) sentence span tests that tax the processing as well as the storage components (Juffs, 2004). It has been found that results based on sentence span tests are a better indicator of working memory than results based on digit- or word-span tests (Daneman & Carpenter, 1980; Harrington & Sawyer, 1992; Waters & Caplan, 1996).

With regard to working memory in SLA, it has been found that L2 working memory capacity as measured by reading and/or listening span tasks predicts reading and listening comprehension abilities (Osaka & Osaka, 1992; Harrington & Sawyer, 1992; Miyake & Friedman, 1998). Mackey, Philp, Egi, Fujii, and Tatsumi (2002) examined how working memory affected noticing and L2 development as a result of the provision of recasts to learners’ erroneous production of English question formation. They found a positive correlation between working memory and noticing. In terms of L2 development, learners with low working memory capacity showed initial improvement, and those with high working memory scores achieved more on delayed posttests. Mackey, Adams, Stafford, and Winke (2010) found that working memory is a strong predictor of modified output in dyadic L2 interaction.

To sum up, different feedback types, because of their unique characteristics, place different processing demands on learners and involve different cognitive processes. Therefore, the effects of different types of feedback are likely to interact differently with different cognitive factors involved in SLA, such as analytic ability and working memory. However, these factors have been understudied in feedback research and warrant further investigation (Ellis & Sheen, 2006).

In sum, notwithstanding a plethora of research on corrective feedback, it remains to be seen how factors such as proficiency, the target structure, and individual difference variables such as language analytic ability and working memory mediate the effects of feedback. In addition, second language Chinese learning is an understudied area (at least in terms of how feedback impacts the learning of this language), which is out of proportion to the rapid growth in the number of L2 Chinese learners. The need to address the research gaps in feedback research and the lack of L2 Chinese studies necessitates and justifies this study, which probes into how the included variables contribute, jointly and independently, to the effects of feedback in the learning of a language that is typologically distinct from alphabetic languages such as English or Spanish. This dissertation report has the following layout.
The next chapter, Chapter 2, consists of a review of the literature related to the variables included in the study. Chapter 3 reports on the research methodology of the study, including the bio-data of the participants and information on the testing materials, treatment tasks, procedure, data coding, and analyses to be performed. The obtained results appear in Chapter 4, followed by Chapter 5, where the results are discussed and interpretations are sought with reference to previous research and SLA theories. The final chapter, Chapter 6, draws conclusions.

CHAPTER 2 REVIEW OF THE LITERATURE

This chapter provides an overview of previous research on the variables and constructs examined in this study and establishes the rationale underlying the current investigation. The research areas to be reviewed include corrective feedback; the relation of two aptitude components, language analytic ability and working memory, to corrective feedback; and the two target structures included in this study: Chinese classifiers and the Chinese perfective –le.

Corrective Feedback

Theoretical Background

Corrective feedback in SLA refers to the response a learner receives to his/her erroneous utterance in the target language, and the response, whether it is from a native speaker or a nonnative speaker, is intended to correct the nontargetlike use of a particular linguistic structure. A distinction should be made between corrective feedback and feedback: Whereas the former is corrective in nature and is often approached from a pedagogical point of view, the latter is an umbrella term that refers to any response following an erroneous utterance, regardless of whether it is intended to be corrective or not. For instance, in either classroom or naturalistic conversations, it is by no means rare that a response occurs following a flawed utterance as a result of the failure to understand the message, in which case the response is a communication move that is not intended to be corrective despite the possibility that the nonnative speaker may perceive it to be. Therefore, corrective feedback in this study is approached from the interlocutor’s perspective; that is, its purpose is for the nonnative speaker to become aware that (part of) an L2 utterance deviates from the correct form and/or to modify that utterance based on the positive and/or negative evidence contained in the feedback.

Whether corrective feedback is useful for second language development is essentially a question of what type of input is necessary for learning to happen. The learner has access to two types of input (Gass, 1997): positive evidence and negative evidence. Positive evidence refers to what is acceptable in the target language, and negative evidence informs learners of what is unacceptable. Corrective feedback contains negative evidence (although some feedback types might also contain positive evidence). Therefore, to acknowledge the role of corrective feedback is to endorse the value of negative evidence in second language learning. While the importance of input is recognized in all language learning theories, researchers and theorists are divided on whether input in the form of positive evidence is sufficient or whether both positive and negative evidence are necessary. The nativists (Krashen, 1995; Schwartz, 1993) insist that language is acquired as an abstract system of mental representation that is realized through a language acquisition device (Universal Grammar [UG]) that is inherent to human beings.
It is argued that adults learn a second language in the same way as children learn their first language: Exposure to positive evidence is sufficient, and no negative evidence is necessary. Therefore, any attempt to draw the learner’s attention to linguistic forms, by either preemptive rule explanation or corrective feedback following learners’ errors, is futile and should be avoided. As Krashen (1995) pointed out, “a safe procedure is simply to eliminate error correction entirely” (p. 76). An immersion class based on this model would be one where students read and listen to materials in the target language or learn the language through the subject matter and where the instructor does not address linguistic forms or give any feedback on students’ errors.

The interactionists (Gass, 1997; Long, 1996; Pica, 1988) believe that adults learn a second language differently from the way children learn their first language. In their view, both positive evidence and negative evidence are necessary, hence the need to attend to linguistic form. In fact, the term “form-focused instruction” (FFI) was coined in response to, or in contrast with, meaning-based instruction that suppresses any attention to form (Ellis, 2001; Doughty & Williams, 1998; Spada, 1997). FFI refers to any attempt to draw the learner’s attention to linguistic form (Spada, 1997). While there are various options for FFI (Loewen, 2005), one optimal condition in FFI, according to the Interaction Hypothesis (Gass, 2004; Long, 2007), is negotiated interaction, where the learner notices the gap (such as through corrective feedback) between his/her wrong L2 production and the target form and makes subsequent modifications to his/her interlanguage. As Long (1996, p. 414) stated:

It is proposed that environmental contributions to acquisition are mediated by selective attention and the learner’s developing L2 processing capacity, and that these resources are brought together most usefully, although not exclusively, during negotiation for meaning [emphasis original]. Negative feedback [emphasis added] obtained during negotiation work or elsewhere may be facilitative of L2 development, at least for vocabulary, morphology, and language-specific syntax, and essential for learning certain specifiable L1-L2 contrasts.

The usefulness of corrective feedback is also backed up by other SLA theories. According to Schmidt’s Noticing Hypothesis (1990, 2001), unlike first language acquisition, second language acquisition is conscious. Schmidt stated that “subliminal language learning is impossible…[and] noticing is the necessary and sufficient condition for converting input to intake” (p. 129). Corrective feedback contributes to the noticing of linguistic form. Another benefit of corrective feedback is the learner’s response following feedback (referred to as “uptake”) (Loewen, 2004). Learner uptake is one form of output, which, according to Swain (1995, 2005), has three functions: noticing/triggering, hypothesis testing, and metalinguistic reflection. The effect of corrective feedback is also grounded in Socio-Cultural Theory, which holds that corrective feedback serves as a form of regulation in the zone of proximal development that can “be appropriated by learners to modify their interlanguage systems” (Aljaafreh & Lantolf, 1994, p. 480).
Recently, the role of corrective feedback has been associated with skill acquisition theory (DeKeyser, 2007, 2008; Ellis, 2010; Lyster & Izquierdo, 2009), according to which L2 acquisition involves the transition from declarative knowledge to procedural knowledge and ultimately to automatic knowledge; corrective feedback affords practice opportunities that contribute to this transition.

Taxonomy of Feedback

Empirical research on corrective feedback has mushroomed since the 1990s, when the theoretical rationale had been established for its role in SLA. Early feedback research was conducted from the perspective of interaction, that is, how negotiated interaction in which feedback is embedded facilitates second language development (Gass & Varonis, 1994; Mackey, 1999; Polio & Gass, 1998). Though feedback was not teased out as an independent variable in interaction studies, they supplied empirical evidence for the usefulness of feedback and provided impetus for subsequent feedback research, because to a large extent negotiated interaction contributes to L2 learning due to the presence of feedback.

In terms of how feedback types are categorized, there are two schemes. Lyster and his colleagues (Lyster, 1998, 2001; Lyster & Ranta, 1997) conducted extensive research on the occurrence of corrective feedback in French immersion classes in Canada and identified six types of feedback: recasts, elicitation, clarification, metalinguistic comments, repetition, and explicit correction. These six types of feedback are further divided according to whether they encourage learner repair: Elicitation, clarification, metalinguistic comments, and repetition are collectively called prompts; recasts and explicit correction provide the correct form and therefore lead to less learner repair. Sheen (2010) and Ellis (2010) made a similar distinction by pointing out that feedback can be input-providing (recasts and explicit correction) or output-prompting (prompts). In the other scheme, feedback is classified as implicit or explicit, depending on whether a feedback type explicitly draws the learner’s attention to linguistic form (DeKeyser, 1993; Ellis, Loewen, & Erlam, 2006). Following this scheme, recasts, clarification, elicitation, and repetition are implicit; explicit correction and metalinguistic feedback are explicit (Lyster, 1998; Li, 2010).

While both categorization schemes are reasonable in their own right, each has its problems. The implicit-explicit dichotomy is undermined by the fact that some implicit feedback types such as recasts can be explicit (e.g., Doughty & Varela, 1998). The taxonomy of feedback based on how much repair is generated masks the explicitness of feedback. For instance, in the “prompts” category, metalinguistic feedback and elicitation are explicit, but clarification and repetition are implicit. Another problem with prompts is that all four types of feedback are placed under this umbrella category, which makes one question the extent to which it is reasonable to compare multiple corrective moves with a single move such as recasts (e.g., Lyster, 2004; Ammar & Spada, 2006). It is not easy to resolve the controversy or to find a perfect way to categorize feedback types.
Probably the best researchers can do is to maximize the implicit-explicit contrast when implicitness/explicitness is a key variable, and to interpret the differential effects of prompts and recasts based on the different cognitive processes involved as well as the amount of repair generated when these two feedback types are investigated (Yang & Lyster, 2010). The relationship between the two categorization schemes (explicit vs. implicit and input-providing vs. output-prompting) is illustrated in Figure 1 (also see Loewen & Nabei, 2007). However, it is evident that the implicitness or explicitness of feedback lies on a continuum and is contingent upon many factors. Therefore, the position of a certain feedback move as illustrated does not necessarily indicate that it is more or less implicit or explicit than the feedback type next to it. Note that in Figure 1, metalinguistic correction (metalinguistic clue + correct form) is added to the list of feedback types identified by Lyster and Ranta (1997). Metalinguistic correction has been investigated in previous research (Sheen, 2007a), and it is also one type of feedback included in this study.

[Figure 1. Taxonomy of feedback: clarification request, recasts, elicitation, metalinguistic clue, explicit correction, and metalinguistic correction arranged along an implicit-explicit continuum and classified as input-providing or output-prompting]

Recasts

This study investigates the effects of two major feedback types: implicit feedback in the form of recasts and explicit feedback operationalized as metalinguistic feedback. Recasts refer to partial or full reformulation of the learner’s erroneous L2 utterance. Among the various corrective strategies, the recast is the most studied, which is not surprising given its high frequency in the classroom as well as the sound theoretical justification for its usefulness. Classroom descriptive studies (Lyster, 1998, 2001; Lyster & Mori, 2006; Lyster & Ranta, 1997; Sheen, 2004, 2006) showed that the recast was the most frequent feedback type in all instructional settings, including immersion classes and classes of ESL and EFL. Long (1996, 2007) argued that the recast is optimal for form-focused instruction because it addresses linguistic forms when the primary focus is on meaning; it shifts the learner’s attention away from meaning for a brief focus on form and juxtaposes the erroneous form with the target form, allowing for a cognitive comparison and priming the learner to notice the difference between the two. Also, the recast makes both types of input, positive evidence and negative evidence, available to the learner. This makes it possible for the learner to retrieve and rehearse pre-existing linguistic knowledge or to benefit from exposure to a provided language exemplar if the target form is unavailable or fails to be retrieved from the interlanguage repertoire.

Recasts have been shown to be effective in laboratory studies (Carroll & Swain, 1993; Egi, 2007; Ishida, 2004; Long, Inagaki, & Ortega, 1998; Iwashita, 2003; Leeman, 2003; Li, 2009; Mackey & Philp, 1998; Lyster & Izquierdo, 2009; McDonough, 2007; Sagarra, 2007). These studies are typically carried out in dyadic interaction (except for Sagarra’s study, where feedback was provided through the computer) in which learners received intensive recasts on a single structure. Methodological features such as the lab setting, provision of feedback on a one-on-one basis, and targeting one structure might have made recasts relatively salient and therefore benefited L2 development.
One might argue that the generalizability of laboratory findings to classroom contexts is questionable, which is to some degree legitimate. However, in laboratory studies, variables can be easily teased out and better controlled, distractions are minimized, and the obtained results might therefore be more reliable. Quasi-experimental studies where students received feedback as a class or group showed that recasts were less effective than more explicit feedback types such as prompts and metalinguistic feedback (Ammar & Spada, 2006; Ellis, 2007; Ellis et al., 2006; Lyster, 2004; Sheen, 2007; Yang & Lyster, 2010). However, Han (2002) and Doughty and Varela (1998) showed that recasts can be very effective when targeting multiple learners if the feedback was intensive, targeted a single structure, and was made salient. In Han’s study, learners received treatment in 11 sessions on the learning of past tense consistency. In Doughty and Varela’s study, recasts were operationalized as repetition of the learner’s nontargetlike utterance with a rising tone followed by a recast. There has also been research on the level of uptake recasts generate and the characteristics of recasts that affect uptake (Ellis, Basturkmen, & Loewen, 2001; Loewen, 2004; Loewen & Philp, 2006; Lyster, 1998, 2001; Lyster & Mori, 2006; Lyster & Ranta, 1997; Panova & Lyster, 2002; Sheen, 2004, 2006). Uptake refers to the learner’s response following feedback. Taken together, these studies showed that recasts led to more uptake in language programs than in immersion programs, and that uptake and successfulness of uptake related to the characteristics of recasts and the characteristics of 19 the form-focused episode that contains the recast. Also, uptake might relate to the nature of the target structure. For instance, recasting lexical or phonological errors might generate more uptake than recasting morphosyntactic errors. Metalinguistic Feedback The other feedback type this study investigates is meta-linguistic feedback. Metalinguistic feedback takes two forms: It may refer to comments on the well-formedness of the learner’s L2 production (metalinguistic comments/clues) (Carroll & Swain, 1993; Ellis et al., 2006) or to the provision of the correct form followed by metalinguistic comments (Li, 2009; Sheen, 2007). Some researchers (e.g., Ellis et al., 2006) opted for the former operationalization probably because the nature of the target structure is such that the provision of some metalinguistic comments was sufficient for the learner to retrieve and/or internalize the rule and make a correction (as in “You need past tense here”). Other researchers chose the latter operationalization probably because the target structure has some variants and providing some comments alone may not lead to the modification of the wrong form (such as in Li’s study where the target structure was classifiers and a comment such as “You used a wrong classifier” was unlikely to make the learner use the correct classifier if it is not part of their interlanguage). Sheen (2007) justified her decision to combine metalinguistic comments with the provision of the correct form by claiming that it is more effective than supplying metalinguistic comments alone. Further support for Sheen’s operationalization comes from Ellis (2007), who suggested the principle of “bias for best”, that is, operationalizing a feedback type to maximize its potential effect. 
There is also empirical evidence that a brief metalinguistic comment may not lead to any linguistic development (Loewen & Nabei, 2007). 20 Researchers have studied the effectiveness of metalinguistic feedback as compared with recasts and/or other implicit feedback types (Carroll & Swain, 1993; R. Ellis et al., 2006; Loewen & Nabei, 2007, Sheen, 2007). Carroll and Swain (1993) investigated the effects of four feedback types—metalinguistic feedback, explicit hypothesis rejection, explicit utterance rejection, and recasts—on 100 ESL learners in the learning of English dative alternation. They found that metalinguistic feedback worked better than all other included feedback types. Kim and Mathes (2001) replicated Carroll and Swain’s study but only included metalinguistic feedback and recasts in the replication. They failed to find any differences between the two feedback types, which might be attributable to the small sample size of the study (n = 10 in each group vs. n = 20 in each group in Carroll & Swain’s study). Ellis et al. (2006) examined the differential effects of metalinguistic feedback and recasts on the learning of past tense –ed by 34 low-intermediate ESL learners. A superior effect was found for metalinguistic feedback over recasts and overall feedback contributed more to learners’ implicit knowledge than their explicit knowledge. Loewen and Nabei’s study (2007) included three feedback types: metalinguistic feedback, recasts, and clarification. Participants were two intact classes of Japanese EFL learners and the target structure was the English question formation. Results revealed that the learners only showed improvement on the timed grammaticality judgment test (the other measures are untimed grammaticality judgment test and oral production test) and that no differences were found between the experiment groups. The failure to find a superior effect for metalinguistic feedback was attributed to the insufficient metalinguistic information provided to the learner given the complex nature of the target structure. Sheen (2007a) studied the effects of metalinguistic feedback and recasts on the learning 21 of English article use by 80 ESL learners. A significant effect was found for metalinguistic feedback, but not for recasts. Studies following Lyster’s taxonomy of feedback (1998, 2001) classified metalinguistic feedback (provision of metalinguistic information) as a prompt (other prompts include clarification, elicitation, and repetition) and investigated prompts as one type of feedback in comparison with recasts. In general, these studies (Ammar & Spada, 2006; Lyster, 2004; Yang & Lyster, 2010) showed that prompts were more effective than recasts. Since metalinguistic feedback was conflated with other feedback types as prompts in these studies, it is difficult to know the extent to which it contributed to learning as a single corrective move. The Effectiveness of Corrective Feedback: Toward an Integrated Model There has been increasing evidence that the effectiveness of corrective feedback is subject to multiple factors and therefore should not be approached from the perspective of the properties of feedback per se. The accumulation of empirical research has made it possible for several meta-analyses to be conducted on how the role of feedback in SLA is constrained by various learner-internal and learner-external factors. Russell and Spada’s meta-analysis showed that oral feedback was more effective than written feedback (2006). 
Lyster and Saito (2010) meta-analyzed 15 classroom-based studies and found that prompts worked better than recasts and that feedback showed larger effects on oral production tests. The meta-analysts also found that younger learners benefited more from the effects of feedback. Li (2010) included both published studies (n = 22) and Ph.D. dissertations (n = 11) in his meta-analysis and found that explicit feedback showed larger short-term effects but the effects of implicit feedback were better retained. More importantly, the meta-analysis identified multiple factors mediating the effects of feedback. Specifically, studies conducted in foreign language contexts showed larger effects than studies conducted in second language contexts; lab-based studies showed larger effects than classroom-based studies; feedback provided in mechanical drills yielded a larger effect than feedback provided in communicative tasks; and similar to what Lyster and Saito found, feedback showed larger effects on free production tests (such as oral production) than on constrained production tests (such as grammaticality judgment tests). The study also demonstrated a possible effect of interlocutor type (native speaker vs. nonnative speaker), mode of delivery (face-to-face vs. virtual), duration/intensity of treatment (short vs. long), and cross-linguistic differences on the effects of feedback.

Aside from the factors mentioned above, there has been evidence that the effects of corrective feedback are mediated by learners’ proficiency (Mackey & Philp, 1998; Ammar & Spada, 2006), the nature of the target structure (Ellis, 2007; Yang & Lyster, 2010), noticing (Egi, 2007; Philp, 2003), and individual learner differences (DeKeyser, 1993; Mackey, Philp, Egi, Fujii, & Tatsumi, 2002; Sheen, 2007, 2008). Narrative reviews by Nicholas, Lightbown, and Spada (2001) and Ellis and Sheen (2006) provided comprehensive and in-depth discussions of feedback-related studies and the constructs involved in feedback research. These scholars also called for an integrated approach to the investigation of corrective feedback.

Based on the findings of quantitative and narrative syntheses of feedback research as well as those of primary research, I propose an integrated and interactive model of the constructs and variables underlying the effectiveness of corrective feedback (Figure 2). This model recognizes the independent and joint effects of various factors affecting the role of feedback in L2 learning. These factors relate to the characteristics of feedback proper, linguistic properties of the target structure and target language, the context in which feedback is supplied, and learner differences. Acknowledging the interaction between these factors has two implications: One is to alert researchers to the possibility and necessity of interpreting the obtained results with reference to other relevant variables when a single variable is examined; the other is to prompt researchers to investigate the interaction effects of multiple variables.

Figure 2. An integrated model of corrective feedback. [The figure shows four sets of factors bearing on the effects of feedback: the nature of feedback (explicitness/implicitness, input-providing vs. output-prompting, evidence, …), linguistic factors (target structure, cross-linguistic influence, explicit vs. implicit linguistic knowledge, …), contextual factors (instructional context, interlocutor, research setting, duration of treatment, …), and learner factors (age, cognitive factors, anxiety, motivation, …).]
Feedback, Proficiency, and the Target Structure

Proficiency and the Choice of Target Structure

Among the various variables that potentially mediate the effects of feedback, one that needs further investigation is the learner’s proficiency level. Philp (2003) found that more advanced learners were more likely to notice the reformulation of their wrong L2 production. Mackey and Philp (1998) found that ESL learners who were more developmentally ready benefited more from recasts in learning English question formation. Ammar and Spada (2006) investigated the differential effects of prompts and recasts on the learning of third-person possessive determiners in English (his/her) by 64 Francophone students. The study also examined whether students who scored higher on the pretests benefited more from feedback. It was found that lower-level learners benefited more from prompts but higher-level learners benefited equally from prompts and recasts. These studies showed that the effectiveness of feedback related to the learner’s previous knowledge about the target structure.

While the effects of feedback have been shown to be mediated by how much the learner already knows about the target structure, the learner’s general proficiency in the target language may also play a role. To date, there has been only one study that included the learner’s proficiency level as a variable. Li (2009) investigated how recasts and metalinguistic feedback facilitated the learning of Chinese classifiers. The participants were 23 students from second- and fourth-year Chinese classes at a U.S. university. It was found that metalinguistic feedback was more beneficial to the second-year students, but there was no significant difference between the two feedback types as far as the fourth-year students were concerned. The generalizability of the results is limited because of the small sample size, the failure to include a control group, and group assignment based on the students’ enrollment status rather than a proficiency test.

The effects of feedback can also be constrained by the target structure. In other words, different feedback types may work differently for different structures (Nicholas, Lightbown, & Spada, 2001; Ellis & Sheen, 2006), and there is empirical evidence for this claim. For instance, in Long, Inagaki, and Ortega (1998), recasts were effective for adverb placement but not for object topicalization in L2 Spanish learning. In Iwashita (2003), recasts benefited the learning of te-form verbs but not of the two locative-initial targets. Also, feedback worked differently in different studies although the research settings were similar, which might be attributable to, among other factors, the fact that different target structures were included. For instance, both Ammar and Spada (2006) and Lyster (2004) investigated the effects of recasts and prompts in immersion classes. While Ammar and Spada found that recasts facilitated the learning of English third-person possessive determiners (his/her), participants in Lyster’s study did not benefit from recasts when learning French gender agreement. Sheen (2007) conducted a classroom study on the effectiveness of metalinguistic feedback and recasts, and found that recasts did not work for the learning of English articles.
It should be noted that in these studies, the choice of target structure is not an independent variable, despite the conflicting findings that are likely to result from the different linguistic structures they included. To date there have been two studies that examined the choice of target structure as a variable affecting the effectiveness of corrective feedback. One is by Ellis (2007), and the other is by Yang and Lyster (2010). Ellis investigated whether recasts and metalinguistic feedback have differential effects on the learning of the English past tense –ed and 27 comparative –er. 34 adult ESL learners were randomly assigned to three conditions: recasts (n = 12), metalinguistic (n = 12), and control (n = 10). The study is quasiexperimental in that it was conducted in the classroom. Results showed that recasts did not promote the learning of either structure, but metalinguistic feedback did. Also, the effects of metalinguistic feedback on the comparative were immediate and its effects for the past tense –ed were delayed. Ellis speculated that this was because prior to the treatment, the learners did not have much explicit knowledge about the comparative but they did about the past tense –ed. Yang and Lyster (2010) investigated the effects of prompts and recasts with 72 Chinese EFL learners. The target structures were regular and irregular English past tense. It was found that prompts showed an advantage over recasts on 8 measures and that while prompts worked better than recasts in assisting the acquisition of regular past-tense forms, both feedback types worked equally well for the learning of irregular past-tense forms. The researchers stated that the general superiority of prompts lied in the fact that they led to more self-repair and were more salient than recasts. Recasts were more effective for the learning of irregular past forms than that of regular past forms because of the greater saliency the former had. Prompts outperformed recasts in the learning of regular past forms due to the negative evidence and opportunities for self-repair afforded in prompts. The researchers continued to argue that the reason why learners benefited equally from prompts and recasts in learning irregular past forms was probably because the structure was item-based. Item-based learning profited from either the positive evidence available in recasts or negative evidence coupled with self-repair, which prompts entailed. Taken together, these two studies as well as studies that did not include the choice of 28 target structure revealed that the effects of feedback indeed related to the nature of the target structure. The differential effects resulted from multiple factors that may include the unique attributes of different feedback types (explicitness/implicitness, evidence, and learner repair), the linguistic features of the target structure (saliency and/or rule-based vs. exemplar-based), and learners’ individual differences (aptitude, anxiety, motivation, and so on). To date, there has been no study on how learner-related factors might affect the differential effects of different types of feedback on the learning of different linguistic structures. This study seeks to fill this gap by examining how two aptitude components, language analytic ability and working memory, might impact the effects of recasts and metalinguistic feedback in the learning of two different Chinese structures. Chinese Perfective -le and Chinese Classifiers Chinese perfective –le. 
Typologically, Chinese is different from Indo-European languages. One distinctive feature of the Chinese language is that it has a limited number of functional categories, among which the most studied is the aspect markers: the perfective –le and –guo, and the progressive zheng- and –zhe. Because of its prominence in the language, aspect has been considered one of the most defining features of Mandarin Chinese and it has been frequently utilized to exemplify aspect languages (Comrie, 1976; Smith, 1997; Xiao & McEnery, 2006). Probably the most extensively studied Chinese aspect marker is –le, which is mainly due to its high frequency and the controversy over its syntactic and/or semantic interpretations. This study investigates how two types of feedback impact the learning of –le by second language Chinese learners. Before exploring how instruction affects the acquisition of this structure, it is necessary to provide a detailed description about its linguistic characteristics. To have a 29 full understanding of how -le is used, it is important to define the concept of aspect and make a distinction between grammatical aspect and lexical aspect. Not to be confused with tense, which indicates the relationship between event time, the time when the event actually takes place, and speech time, the time when the event is addressed, aspect is concerned with the relationship between event time and reference time, the time which is used as a reference point for the event. Aspect can be represented either grammatically or lexically. Grammatical aspect refers to aspectual distinctions realized through linguistic devices such as the use of auxiliaries and affixation (Li & Shirai, 2000). Lexical aspect, alternatively known as situation aspect, inherent aspect, or Aktionsart, is marked by the inherent characteristics of lexical items. To identify the lexical aspectual features of a verb, a binary system has been developed using such dimensions as telicity, punctuality, and dynamicity. According to Vendler (1957), verbs are classified into four types according to the temporal attributes they display, which are states, activities, accomplishments, and achievements. Smith (1997) modified Vendler’s system by adding “semelfactive” verbs. States verbs are used to describe situations that are homogeneous and have no successive phases or endpoints; activities verbs describe situations with successive phases but without endpoints; accomplishment verbs encode situations with successive phases and a natural endpoint; achievement verbs are also used to encode situations with a natural endpoint, but they are different from accomplishment verbs in that the events are punctual, instantaneous, and without time duration; semelfactives are punctual but they have no endpoint. In addition, the unique groups of verbs called resultative verb constructions (RVCs), according to Li and Shirai (2000), should be considered 30 achievements. RVCs are, as it were, combinations of accomplishments and achievement. To better understand the semantics of the five types of verbs, the visual representation (Table 1) by Anderson (1990, in Li & Shirai, 2000) might be of assistance. To the original scheme, I added an illustration for RVCs and semelfactives. Table 1. 
Schematization of the semantic properties of verb types

Type                  Illustration          Example
State                                       love, contain, know
Activity              -----------------     run, walk, swim
Accomplishment        --------------X       paint a picture, build a house
Achievement           X                     fall, drop, win the race
Achievement (RVCs)    -------X-----X        dǎkāi (push + open), shuāidǎo (slip + fall)*
Semelfactive          ---X---X---X---       cough, tap, knock

* These two examples are given in Pinyin, the Romanization system of the Chinese characters.

As a perfective aspect marker, –le encodes an event in its entirety. It occurs with situations that are [+bounded] or [+telic]. As to the interaction between lexical aspect and grammatical aspect, –le is naturally compatible with accomplishment and achievement verbs. For verbs that encode atelic situations (state, activity, and semelfactive verbs), that is, situations without an endpoint, to be used with –le, an external device (usually a quantifier) needs to be added to set a beginning and end point or a boundary for the event. The following sentences illustrate how –le is used to indicate perfectivity.

(a) tā shuāidǎo le.
    他 摔倒 了。
    He fall-Perf.
    He fell.

(b) tā pǎo le shíwǔ fēnzhōng.
    他 跑 了 十五 分钟。
    He run-Perf fifteen minutes.
    He ran for fifteen minutes.

In sentence (a), the verb shuāidǎo (fall) is an achievement verb and has a natural endpoint. In sentence (b), the verb pǎo (run) is an activity verb without a natural endpoint, but the time duration shíwǔ fēnzhōng (15 minutes) delimits the situation to license the use of –le. It must be pointed out that although theoretically a delimiting device can be added to a situation encoded by a state verb such as xǐhuān (like) to allow the use of the perfective marker –le, the combination of –le with state verbs is rare in actual communication.

There has been a controversy over –le’s interpretations in relation to the distinction between the verbal –le and the sentence-final –le (Van den Berg & Wu, 2006). One view holds that there is only one –le, which marks either termination or completion (Chang, 2002; Shi, 1990; Thompson, 1968; Yang, 2003); others maintain that there are two –les: a verbal –le which marks perfectivity and a sentence-final –le which marks inchoativity or change of state of affairs (Li & Thompson, 1981; Liu, 2001; Van den Berg, 1989; Xiao & McEnery, 2004) (see Figure 3 for an illustration of the controversy over the verbal –le and the sentence-final –le).

Figure 3. An illustration of the one-le and two-le controversy. [The figure contrasts the one-le view, in which a single –le marks the boundary of an event, with the two-le view, in which a verbal –le marks perfectivity and a sentence-final –le marks inchoativity or change of state of affairs.]

The two-le view is more reasonable, and the following examples show how the verbal –le differs from the sentence-final –le. As shown, in (b) and (d), the verbal –le indicates completion, and the sentence-final –le in (a), (c), and (e) describes current relevance. Sentence (a) suggests that “I did not eat dinner at first, but now I have.” In (b), –le indicates that the activity of buying was completed. (c) means that it did not rain at first, but now it has started to rain. In (d), the event of raining lasted for three days and it was completed some time in the past. (e) in fact involves a future event, indicating that the situation will change from “guests staying” to “guests leaving”.

a. tā chī fàn le.
   她 吃 饭 了。
   She eat meal Perf.
   She has eaten dinner.

b. tā mǎi le yī běn shū.
   他 买 了 一 本 书。
   He buy-Perf one-CL book.
   He has bought a book.

c. xià yǔ le.
   下 雨 了。
   Fall rain-Perf.
   It has started to rain.

d. xià le sāntiān yǔ.
   下 了 三天 雨。
   Fall-Perf three day rain.
   It rained for three days.

e. kèrén yào zǒu le.
   客人 要 走 了。
   Guest will leave-Asp.
   The guests are leaving.
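The compatibility rule between lexical aspect and the verbal –le described above (telic situations license –le directly, whereas atelic situations require an external delimiting device) can be summarized schematically. The sketch below is only an illustrative simplification added here, not part of the study’s materials; the verb-class labels and the single “delimiter” flag are assumptions made for clarity.

```python
# Illustrative sketch only: a simplified encoding of the licensing rule for the
# verbal -le described above. The verb-class labels and the single "has_delimiter"
# flag are simplifications introduced here, not materials from the study.

TELIC = {"accomplishment", "achievement", "rvc"}    # inherently bounded situations
ATELIC = {"state", "activity", "semelfactive"}      # unbounded unless externally delimited

def licenses_verbal_le(verb_class: str, has_delimiter: bool = False) -> bool:
    """Return True if the (simplified) situation is bounded and thus accepts the verbal -le."""
    if verb_class in TELIC:
        return True                                  # e.g., shuāidǎo 'fall'
    if verb_class in ATELIC:
        return has_delimiter                         # e.g., pǎo 'run' + 'fifteen minutes'
    raise ValueError(f"unknown verb class: {verb_class}")

print(licenses_verbal_le("achievement"))                   # True: telic, takes -le directly
print(licenses_verbal_le("activity", has_delimiter=True))  # True: a duration phrase bounds the event
print(licenses_verbal_le("activity"))                      # False: needs a quantifier or duration
```

On this simplified view, the rarity of –le with state verbs noted above corresponds to the delimited-state case: licensed in principle, but uncommon in actual communication.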
It is not a goal of this paper to resolve the one-le versus two-le controversy. The objective of this project is to explore the effectiveness of corrective feedback in the learning of –le, and the target structure is the verbal –le, because the usage of –le is so complex that it would be difficult for the learner to acquire all the uses of the structure through a short instructional treatment. The following section reviews the previous studies on the acquisition of –le by second language speakers of Chinese.

There have been some studies on the acquisition of –le, all of which are descriptive and none of which concerns how instruction facilitates the learning of the structure (Duff & Li, 2002; Christensen, 1994; Wen, 1995, 1997; Yang, Huang, & Sun, 1999, 2000). These studies either investigated –le alone or the acquisition of –le together with other aspect markers. Wen’s studies dealt mainly with the acquisition of the aspect markers, and Yang et al.’s two studies examined the underuse of markers and the interaction between aspect markers and verb types.

Wen’s first study (1995) investigated L2 learners’ acquisition of the perfective –le. The subjects were 14 L2 Chinese students at an American university who were L1 English speakers. Among them, six had studied Chinese for 14 months and were considered beginners, and eight were advanced learners who had been in the program for 26 months. They were interviewed three times and engaged in three tasks: two conversational tasks and one picture-description task. It was found that there was no difference between the two levels of learners with regard to the accuracy rate in their use of the verbal –le, but the advanced learners performed better than the beginners in the use of the sentence-final –le. The author claimed that this was because of the complex nature of the sentence-final –le. It was further pointed out that the learners’ correct use of the marker had to do with whether two events were involved, whether there was a duration of time, whether a sentence ended in a mono-syllabic verb, and whether the adverb yǐjīng (already) was present. The results were discussed in relation to L1 transfer. For instance, at the beginning level, the learners only used the perfective marker with past events and avoided using it with future events (to indicate inchoativity) because this was not allowed in their L1; some beginners also omitted the sentence-final –le in obligatory contexts because there was no such feature in their L1.

In another study (1997), Wen investigated L1 English speakers’ acquisition of the perfective marker –le, the experiential marker –guo, and the durative marker –zhe. The participants were 19 students who were studying Chinese at an American university, and they were split into two levels: 10 from the lower level (who had studied Chinese for 15 months at the time of the study) and 9 from the higher level (who had studied Chinese for 27 months). The data was elicited through the use of two tasks: an interview and a picture description task. The results showed that in general there was no difference between the two levels in the number of the three aspect markers produced, but learners of higher proficiency were more accurate in using the markers. Furthermore, while the advanced learners were more accurate in their use of –zhe, they did not outperform the less proficient learners in using –le or –guo.
Wen also found that the perfective markers were acquired before the imperfective aspect and she argued that this is because of the semantic salience, syntactic simplicity, and pragmatic consistency of the former. One caveat about the study is that at the beginning of both tasks, as a data-soliciting prompt, a question was asked that contained the aspect marker to be used to perform the task. It is suspected that the question was likely to serve as a model for the use of the aspect marker, calling into question the reliability of the results. Whereas Wen’s studies investigated L2 Chinese learners at American universities, Yang et al.’s studies (1999, 2000) involved learners studying Chinese in mainland China, a second language setting. Yang et al.’s first study examined the use of three aspect markers –le, -guo, and -zhe, and the data was extracted from a corpus containing the 1 narrative writings of the students at a 4-year Chinese program at Beijing Languages and Cultures University. The narratives were from all levels of students (eight levels were identified) and were assorted: they were either timed or untimed, and were collected either inside or outside of class. It was found that the learners’ use of –le did not improve with the increase of their proficiency, but they did make fewer errors in using –guo and – zhe when they reached higher levels. Among the three markers, -le had the highest frequency, followed by –zhe and –guo. Yang et al. also examined how the three markers encoded lexical aspect. It was found that most of the errors in the learners’ use of –le occurred in situations where the marker was used with stative verbs. The imperfective – zhe was mostly used with statives and activity verbs, and –zhe was never used with achievement verbs. The researchers did not provide enough information about the use of 36 the experiential marker –guo. In discussing the results, the researchers mentioned the overuse of –le and –zhe in the learners’ written production. In a second study, Yang et al. (2000) investigated the underuse of aspect markers by L2 Chinese learners. The study included two types of data. One is based on the performance of 26 L1 Korean and L1 Japanese learners of Chinese at a Chinese university on a cloze test; the learners were divided into four levels. 120 narrative writings from the learners served as the basis for the naturalistic production data. According to the cloze test data, there was no difference between the four levels in terms of the frequency of -le, but the accuracy rate steadily improved with the increase of proficiency. The same pattern was found for the use of the durative marker –zhe and the experiential marker –guo. The correct rate of the use of –guo was higher than that of other markers across all the four levels, but its frequency was the lowest among the three. Regarding error types, the researchers found that overuse of the aspect markers decreased from lower to higher levels but underuse of the markers did not decrease as much as overuse. At higher levels, underuse occurred more frequently than overuse. The experimental data also showed that among the three markers, -le was underused the most, followed by –zhe and –guo. Compared with the cloze test data, the naturalistic data showed higher accuracy rate of the students’ use of the three aspect markers. And overall, the experimental data was characterized by underuse whereas the naturalistic data by overuse. 
The differential results from the two types of data indicated the effect of task differences in the investigation of aspect acquisition. Both Duff and Li’s (2002) and Christensen’s (1997) studies examined the use of –le by nonnative speakers of Chinese as compared with that by native speakers. In Duff and 37 Li’s study, 9 native speakers and 9 nonnative speakers of Chinese were asked to complete three tasks: “an oral retelling of the Pear Story shown on the video (Chafe, 1980), a personal vacation narrative of vacation travel, and a written editing task of a past narrative that contained no aspect marking on verbs” (p.428). It was found that nonnative speakers tended to undersupply –le in oral narratives but oversupply it in the written cloze task. Christensen (cited in Duff & Li, 2002) found that more advanced learners used more perfective –les and resultative verb compounds than beginners. Taken together, these studies have identified the following facts about L2 learners’ use of –le: (1) There is no difference between high- and low-proficiency learners in the accurate use of this aspect marker; (2) overall, learners tended to overuse –le in written tasks but underuse it in oral narratives; (3) as a perfective marker, -le is acquired earlier than progressive markers; (4) failure to correctly use –le resulted from learners’ ignorance of the compatibility of –le with bounded situations; (5) cross-linguistic influence existed in the acquisition of –le: Native speakers of English tended to use –le as a past tense marker. Chinese classifiers. Chinese is a classifier language. A classifier is a word that is used between a determiner (that is typically a number but can also be a demonstrative or quantifier) and a noun. The classifier is one of the most striking features of the Chinese language (Li & Thompson, 1981). The Chinese people started to use classifiers as early as 1400 B.C. (Erbaugh, 1986), and there are over 900 classifiers in the language (Zhang, 2007). The use of a classifier is both semantically and syntactically driven, and the choice of classifier depends on the noun. Semantically, a classifier is used to categorize and quantify a set of objects with the same or similar physical properties or characteristics. 38 The semantic representation of classifiers reflects how human beings perceive the world (Craig, 1986). Erbaugh (1986) divided Chinese classifiers into shape classifiers and function classifiers. There are two possibilities regarding the semantic motivation for the use of classifiers: (a) The construction has is fully predictable from the context, and (b) the construction is not fully predictable. For instance, the classifier zhāng is normally used with flat, smooth, and thin objects, hence yī zhāng bǐng (one zhāng-CL pancake, a pancake), or yī zhāng zhuōzi (one zhāng-CL table (because it has a thin, flat top), a table). In this case, the use of zhāng is predictable from the context since the referent following the classifier has all the perceptual features of the category of objects it refers to. However, in Chinese, the word “sofa” also takes the classifier zhāng although by no means a sofa is flat or thin (in which case the classifier is not predictable from the context). This might be due to the fact that when the furniture sofa (both the object and the word denoting it) was imported from the West, due to the absence of an appropriate classifier for it, the classifier that is used for table (which is also furniture), zhāng, was employed for this alien object. 
All classifiers, as Ahrens (1994) pointed out, have semantic connections with the physical properties or functions of the objects they refer to, although the use of some appears to be arbitrary as a result of the evolution of the language. Syntactically, “classifiers are units of enumeration employed to mark countability; their occurrence makes the semantic partitioning of nouns visible” (Wu & Bodomo, 2009, p. 490). A nominal classifier is a bound morpheme that must occur with a determiner or quantifier. There are three permutations with respect to classifier use in the Chinese language, and in each permutation the use of a noun is optional if the referent is inferable from the discourse context (Li & Thompson, 1984):

(1) Number + Classifier + Noun
    e.g. yī gè rén.
    一 个 人。
    One-CL person.
    One person.

(2) Demonstrative + Classifier + Noun
    e.g. zhè pǐ mǎ.
    这 匹 马。
    This-CL horse.
    This horse.

(3) Quantifier + Classifier + Noun
    e.g. měi liàng chē.
    每 辆 车。
    Every-CL vehicle.
    Every vehicle.

Traditionally, no distinction is made between measure words and classifiers (Chao, 1968; Li & Thompson, 1984). At times, measure words are referred to as measure/count classifiers and classifiers are called special/mass classifiers (Chien, Lust, & Chiang, 2003; Erbaugh, 1986). However, this is not reasonable. A measure word often accompanies non-count nouns whose referents are not quantifiable, as in a glass of water. When used with count nouns, the function of a measure word is to quantify, such as a basket of pears. A classifier is always used with count nouns to categorize, as in liǎng kē shù (two-CL trees, meaning “two trees”). Also, while there is no semantic connection between a measure word and the accompanying noun, such a connection exists between a classifier and the noun it co-occurs with. Measure words have equivalents in English but classifiers do not. It is the presence or absence of classifiers, not that of measure words, that distinguishes classifier languages from non-classifier languages—classifiers are language-specific but measure words are language universals (Erbaugh, 1986; Li, 2000; Tai & Wang, 1990).

The use of classifiers also relates to discourse factors. For example, Erbaugh (1986) explored adult classifier use based on the narratives of 19 native Mandarin speakers in Taiwan about a loosely-plotted seven-minute color film with sound but no dialogue (The Pear Film) (Chafe, 1980). She also examined classifier use in 877 utterances in casual conversations. It was found that the use of the general classifier gè dominated the subjects’ classifier use. When special classifiers were used, they were used “to specify 1) the first mention of a 2) non-present object which was 3) unfamiliar or unclear to the hearer, especially in reference to a 4) new creation or as part of a 5) narrative, 6) pretend play scheme, or a 7) request” (p. 425). Li (2000) also approached classifier use from a discourse perspective, arguing that classifiers served as a grounding mechanism to mark the salience of the related noun phrases.
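The noun-specific pairing just described, in which each noun is conventionally associated with a classifier that is inserted between the determiner and the noun, can be pictured as a simple lookup. The sketch below is only an illustration added here, not part of the study’s materials; the mini-lexicon and the fallback to the general classifier gè (in keeping with Erbaugh’s finding that gè dominates native-speaker usage) are assumptions made for clarity.

```python
# Illustrative sketch only: classifier choice in Chinese is noun-specific, and the
# classifier sits between the determiner (number, demonstrative, or quantifier)
# and the noun. The tiny lexicon and the fallback to the general classifier "gè"
# are simplifications introduced here, not data from the dissertation.

NOUN_TO_CLASSIFIER = {
    "rén": "gè",        # person
    "mǎ": "pǐ",         # horse
    "chē": "liàng",     # vehicle
    "zhuōzi": "zhāng",  # table (flat, thin top)
}

def classifier_phrase(determiner: str, noun: str) -> str:
    """Build a 'Determiner + Classifier + Noun' string, defaulting to the general classifier."""
    classifier = NOUN_TO_CLASSIFIER.get(noun, "gè")
    return f"{determiner} {classifier} {noun}"

print(classifier_phrase("yī", "rén"))    # yī gè rén      'one person'
print(classifier_phrase("zhè", "mǎ"))    # zhè pǐ mǎ      'this horse'
print(classifier_phrase("měi", "chē"))   # měi liàng chē  'every vehicle'
```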
How Chinese classifiers are used and acquired by second language speakers has been insufficiently studied. One representative study on L2 learners’ use of Chinese classifiers is by Polio (1994). Her study involved 21 English and 21 Japanese speakers studying Chinese in Taiwan. They were students from three different levels of proficiency as measured through class placement, native speaker ratings, and an elicited imitation test. As in Erbaugh’s study about L1 speakers’ use of classifiers, the data in Polio’s study were also collected using the Pear Film narratives. It was found that the nonnative speakers rarely omitted a classifier in obligatory contexts. However, when omission errors happened, they were invariably committed by English speakers (eight out of 21 English speakers did), and the Japanese speakers never made such errors. This seemingly insignificant finding was not given further explanation, but it is without doubt worth attending to in light of the fact that English is a non-classifier language whereas Japanese is a classifier language, although in Japanese the classifier follows the noun whereas in Chinese it precedes the noun.

Chinese perfective –le versus Chinese classifiers. The choice of target structure is an independent variable in this study because it is speculated that different feedback types may have differential effects on the learning of different linguistic structures. As previously mentioned, there has been empirical evidence that feedback worked differently for different structures across studies (Ammar & Spada, 2006; Lyster, 2004; Sheen, 2007), but thus far, Ellis (2007) and Yang and Lyster (2010) are the only studies that have examined the choice of structure as an independent variable. Ellis’s study included the English past tense –ed and comparative –er as the target structures, but the effects of the two feedback types (recasts and metalinguistic clues) did not differ substantially. Yang and Lyster’s study investigated the differential effects of recasts and prompts on the learning of English regular and irregular past forms in a classroom setting. This study investigates whether implicit and explicit feedback, operationalized as recasts and metalinguistic correction, facilitates the interlanguage development of two very different Chinese structures in a lab setting.

Previous sections provided extensive, separate discussions of the two target structures included in this study. This section seeks to juxtapose the two structures and offer a comparison between them along various dimensions. DeKeyser (2005) pointed out that difficulty in grammar learning may relate to at least three factors: form, meaning, and form-meaning mapping (see Table 2 for different schemes on structural difficulty). Difficulty of form may result from the competition between available choices when the learner is selecting the right morpheme and allomorph. The meaning of a form may constitute a source of difficulty because of its novelty or abstractness or both. A major source of difficulty is form-meaning mapping, that is, the link between form and meaning is not transparent. There are three contributing factors to the difficulty related to form-meaning mapping: 1) redundancy—the form is not semantically necessary, 2) optionality—the supply of the form is not obligatory, and 3) opacity—a morpheme has different allomorphs and the same form stands for different meanings. DeKeyser went on to state that if form-meaning mapping is clear, minimal exposure may be enough for acquisition; if it is obscure, the structure may pose a great challenge for adult learners. Goldschneider and DeKeyser (2005) explored the factors contributing to the sequence of L2 English morpheme acquisition through a meta-analysis. A large portion of variance in acquisition order was accounted for by five determinants: perceptual salience, semantic complexity, morphophonological regularity, syntactic category, and frequency.
The authors pointed out that all these factors related to saliency to varying degrees. Ellis (2007) developed a set of criteria to determine the difficulty of a linguistic form. These criteria include (a) grammatical domain—whether a form is morphological or syntactic, (b) input frequency, (c) learnability/processability (Pienemann, 1998)—linguistic forms are processable at different stages of development, (d) explicit knowledge—the complexity of the rule explanation of a structure, (e) scope (Hulstijn & De Graaff, 1994)—the scope of a rule is large if it covers more than 50 cases, (f) reliability—a rule has high reliability if it applies to more than 90% of all cases, and (g) formal semantic redundancy—forms that are not necessary for meaning processing are semantically redundant. In light of the difficulty of approaching structural difficulty theoretically, some researchers based their judgment on ratings by language instructors. For instance, Robinson (1997) asked 15 ESL teachers to rate the complexity of some selected rules and identified an easy rule and a hard rule for instructional treatment.

Drawing on the schemes developed by previous researchers and given the characteristics of the two target structures of this study, a comparison is made between them based on the following criteria:

(1) Redundancy. Redundancy refers to the fact that the use of a linguistic structure is semantically superfluous although it might be syntactically obligatory. Whether a certain structure is redundant is dependent upon whether it is indispensable to the accurate interpretation of the utterance that contains the structure. Alternatively, one can determine the redundancy of a linguistic feature by examining whether the meaning the feature encodes can also be encoded through other linguistic features or devices in the utterance. A classifier is critical to the accurate interpretation of the determiner phrase where the classifier is situated. Missing a classifier or using a wrong classifier is likely to distort the meaning of the utterance or make the utterance unintelligible. For instance, the utterance *sān hé (three rivers), where the classifier tiáo is missing, is hardly intelligible to a Mandarin speaker. And *sān zuò hé, where the classifier for “bridge” is used for rivers, sounds equally foreign. Unlike the classifier, the perfective –le is redundant in many cases. This is because the Chinese language is heavily discourse- or topic-oriented, and semantic interpretation is largely dependent upon the context rather than the syntax of the utterance (Yang, 1995; Duff & Li, 2002). The absence of the perfective –le can be compensated for by the use of time expressions, the sequence of events, and other discourse or linguistic devices. For example, in the sentence zuó tiān wǒ chī le sān gè píng guǒ (Yesterday I ate three apples), –le is used with the activity verb chī (eat) to encode perfectivity. However, the time expression and the number can jointly mark perfectivity in the absence of –le. Furthermore, linguists have noticed that –le is more often used with monosyllabic words than disyllabic words to meet the disyllabic feature of the Chinese language (Chang, 1986; Yang, Huang, & Cao, 2000). The use of –le in these cases is obviously phonologically rather than syntactically or semantically driven.
In conclusion, classifiers are more meaning-loaded than the perfective –le, as incorrect classifier use is a source of communication breakdown but absence of the perfective –le may not impede information exchange.

(2) Perceptual saliency. According to Goldschneider and DeKeyser (2005), perceptual saliency refers to the ease in hearing or perceiving a given structure. The perfective –le is always affixed to the verb in the post-verbal position and is always pronounced in a neutral tone (Li & Shirai, 2000). A Chinese classifier always precedes the noun and its tone is not neutralized. Therefore, classifiers would seem to be more perceptually salient than the perfective –le.

(3) Form-meaning mapping. Form-meaning mapping can be transparent or opaque. In the case of the perfective –le, the form-meaning mapping is opaque because the form has two variants that have different interpretations. The verbal –le encodes completion and boundedness, and the sentence-final –le indicates current relevance/change of situation. So the same form may occur in different positions of a sentence and stand for different meanings. The form-meaning mapping of classifiers is transparent in that a certain classifier is usually used with one object or with objects that fall into the same category because of the physical properties they have in common.

(4) Explicit knowledge. The rule explanation for the use of the perfective –le is complex because it involves at least two components: (a) the event is completed, and (b) the situation must be bounded or have an endpoint. Thus, the rule is difficult to understand and learn as explicit knowledge. The rule for classifiers, conversely, is relatively easy: It only states that a certain classifier must be used with a particular noun.

(5) Learnability/teachability. Using Pienemann’s Processability Theory (1998), Zhang (2005) investigated the emergence of five Chinese structures in the speech production of three L2 Chinese learners, who were enrolled in a first-year Chinese course at an Australian university. Among the five structures were two aspect markers (progressive and experiential) and classifiers. Zhang argued that according to Processability Theory, aspect markers are lexical morphemes and require the Category Procedure (Stage 2; Stage 1 is called Word/Lemma) for their processing. The processing of the classifier, however, involved the numeral, the classifier proper, and the head noun. Thus, it was processed through the Phrasal Procedure (Stage 3). However, contrary to the prediction of Processability Theory, classifiers emerged earlier than the aspect markers. Zhang failed to find a reasonable explanation for this finding. Both Wen (1997) and Yang et al. (2000) showed that there was no difference between high- and low-proficiency learners in terms of their accuracy in using –le. This was ascribed to the difficulty of acquiring the structure. With regard to classifiers, Li (2009) found that fourth-year Chinese learners did not differ from second-year learners in the use of classifiers in their pretest scores. Li conjectured that this might be due to the reduced occurrence of classifiers in the textbook for the advanced learners.
Taken together, these studies showed that (1) both structures emerged very early in learners’ interlanguage, and (2) there seemed to be no difference between beginners and advanced learners in their accurate use of the two structures, but this counterintuitive finding was likely caused by different factors: –le was difficult, and classifiers were less frequent in textbooks for learners at the higher level.

Table 2. Schemes on structural difficulty

DeKeyser ’05: form (choice between morphemes and allomorphs); meaning (novelty/abstractness); form-meaning mapping (a. redundancy, b. optionality, c. opacity)

Goldschneider & DeKeyser ’05: perceptual salience (ease in hearing or perceiving a given structure); semantic complexity (number of meanings a form expresses); morphophonological regularity (extent to which a form is affected by its phonological environment); syntactic category (whether a form is lexical or syntactic); frequency (number of times a form occurs)

Ellis ’07: grammatical domain (whether a form is morphological or syntactic); input frequency (how frequently a form occurs in the input); learnability (extent to which a form is processable); explicit knowledge (complexity of rule explanation); scope (number of cases a rule covers); reliability (percentage of all cases a rule applies to); formal semantic redundancy (indispensability in expressing meaning)

Robinson ’97: complexity of the structure described by pedagogical rules; complexity of pedagogical rules describing the structure; expert opinion from instructors

Feedback, Language Analytic Ability, and Working Memory

Language Aptitude

As previously mentioned, the effects of corrective feedback have to do with learner-internal as well as learner-external factors. Whereas learner-external factors include the characteristics of feedback, the target structure, and the instructional context, learner-internal factors relate to age, proficiency, noticing (how learners perceive feedback or whether learners notice feedback), and individual differences in aptitude, anxiety, motivation, attitude toward feedback, and the like. As Ellis and Sheen (2006) noted, among the various factors affecting the effects of feedback, there had been very little research on the moderating effects of individual difference variables.

Among the various individual difference variables, language aptitude is worthy of special attention. According to Robinson (2005), “second language (L2) aptitude is characterized as strengths individual learners have—relative to their population—in the cognitive abilities information processing draws on during L2 learning and performance in various contexts and at different stages” (p. 46). Carroll and Sapon claimed that learners’ language aptitude is stable and not subject to training or environmental factors; it is “largely independent of intelligence, and is distinct from motivations and attitudes of the learner” (2002, p. 24). Also, as Sawyer and Ranta (2001) observed, aptitude is not susceptible to learners’ previous language learning experience and is not “a matter of skill development” (p. 334). Aptitude has received much attention in SLA because it is the most predictive of L2 proficiency among individual difference variables (Ehrman & Oxford, 1995; Hummel, 2009; Reves, 1983; Robinson, 2005; Skehan, 1998; Sparks, Patton, Ganschow, & Humbach, 2009). Studies using the Modern Language Aptitude Test (MLAT) developed by Carroll and Sapon (1959) showed that the correlation between aptitude and L2 success ranged from .4 to .6 (Robinson, 2005).
The MLAT (Carroll & Sapon, 1959, 2002) has been used as a standard measure of second language aptitude. It consists of five parts that measure four dimensions of aptitude including learners’ phonetic coding ability, grammatical sensitivity, rote learning ability, and inductive language learning ability. It should be noted that the distinction between grammatical sensitivity and inductive language learning ability is fuzzy, and that the latter in fact is not measured in the MLAT (Carroll, 1962; Erlam, 2005; Sawyer & Ranta, 2001). Skehan (1998) contended that these two abilities have the same underlying construct: language analytic ability. Hence, in this study, the term “language analytic ability” is adopted for grammatical sensitivity. Other test batteries have been developed such as the PLAB (Pimsleur, 1966) and the DLAB (Petersen & Al-Haik, 1976), but the MLAT has so far proved to be the best instrument to measure language aptitude (Sawyer & Ranta, 2001; Dörnyei, 2005). Carroll and Sapon (2002) stated that the MLAT can be use to select students for foreign language courses, estimate individual students’ probability of L2 learning success so that counselors can provide appropriate guidance, achieve placement purposes, and diagnose students’ learning abilities so as to match learner types with instructional approaches. Initially, the primary purpose of L2 aptitude research was to examine the extent to which aptitude could differentiate learners in terms of the rate at which to achieve L2 gains. However, aptitude research waned in the 1970s because of two reasons (Robinson, 2002): (a) By emphasizing the role of aptitude, learners’ individual efforts are diminished, and (b) some researchers (Cook, 1996; Gardner, 1985; Spolsky, 1989) claimed that with 50 the advent of communicative language teaching, the predictive power of aptitude tests, which were developed in audio-lingual instructional contexts, no longer obtained. Despite a temporary slowdown, aptitude research has resurrected in recent years. On one hand, researchers found that aptitude predicted L2 success in all sorts of learning environments including communicative language classes (Ehrman & Oxford, 1995; Ranta, 2002), meaning-based immersion classes (Harley & Hart, 1997), informal learning settings 2 (Reves, 1983), and the laboratory (Robinson, 1997) . On the other hand, acknowledging the limitation of using aptitude measures only for prediction or selection purposes, researchers embarked on exploring new venues of investigation. Aptitude-Treatment Interaction Snow (1987, 1991; Cronbach & Snow, 1977) argued that aptitude should not be only considered from the learner’s perspective but also from the perspective of how it interacted with situational constraints. Building on Snow’s concept of aptitude-treatment interaction, researchers (Robinson, 2005; Segalowitz, 1997) pointed out that aptitude should not be viewed as a fixed characteristic because the learner is situated in a complex, dynamic environment that imposes different cognitive demands on learners with different experiences and/or at different stages of learning. Also, instead of a monolithic construct, aptitude is composed of multiple components, and these (sets of) components interact differently with different learning conditions (Robinson, 1997, 2002) and are drawn upon at different stages of learning (Dörnyei & Skehan, 2003; Skehan, 2002). Empirical research has shown support for the above claims. 
For instance, research showed that the role aptitude and aptitude components played varied depending on the stage the learner was at. DeKeyser (2000) and Sasaki (1996) found that aptitude had little 51 to do with pre-critical period language learning and that it only affected post-critical period learners. Harley and Hart (1997) found that pre-critical period learning related with memory and post-critical learning had to do with analytic ability. In terms of aptitude-treatment interaction, Wesche (1981) found that learners benefited more from the instruction that matched their cognitive strengths than from the instruction that did not. For instance, students with high analytic ability benefited more from the analytical teaching approach than from other approaches. Robinson’s work (1997, 2002) provided further evidence for the importance of examining the interaction between aptitude and different learning conditions. In the 1997 study, Robinson investigated whether aptitude related to awareness and learner gains in four conditions: (a) implicit (memorizing examples only), (b) incidental (processing examples for meaning), (c) rule-search (trying to find rules), and (d) instructed (applying a rule explanation to examples). 104 intermediate ESL learners participated in the study. Aptitude measures included the Paired-Associates Subtest (for memory) and the Words in Sentences Subtest (for grammatical sensitivity) of the MLAT. The combined scores of the two subtests were used as the global score of aptitude. It was found that in the implicit condition, learners’ posttest scores and awareness correlated with grammatical sensitivity; in the instructed condition, memory was positively related to awareness; in the rulesearch condition, grammatical sensitivity was positively related to awareness. The only condition that was unaffected by aptitude in terms of learning or awareness is the incidental condition. Robinson speculated that it was because the incidental condition did not draw on the learner’s rote memorization ability. To address this issue, he included a working memory test in the 2002 study and found significant correlations between 52 working memory and learning in the incidental condition. Robinson concluded that (a) adult learning in all conditions is similar and is sensitive to individual differences, (b) the extent to which learning is affected by individual differences is determined by whether the processing demands of the learning condition match the cognitive ability under question, and (c) different learning conditions draw on different aptitude complexes or different sets/combinations of aptitude components. In sum, to move aptitude research forward, researchers (Robinson, 2005; Skehan, 2002) have called for more research into the interaction between language aptitude and learning conditions. To echo this call from aptitude researchers, scholars in feedback research (e.g., Ellis & Sheen, 2006) called for more research into the role of individual difference variables such as language aptitude in affecting the efficacy of corrective feedback. The need for more research into individual differences is evident in Ellis’s statement that “[t]he vast bulk of CF studies has ignored learner factors, focusing instead on the relationship and the effect of specific CF strategies and learning outcomes” (2010, p. 339). This study addresses the relevant issues by investigating the relationship between corrective feedback and two aptitude components: language analytic ability and working memory. 
Language Analytic Ability and Feedback Language analytic ability is often measured with the Words in Sentences subtest of the MLAT. Carroll defined language analytic ability (grammatical sensitivity) as “the ability to recognize the grammatical functions of words (or other linguistic entities) in sentence structures” (1981, p.105) or “the individual's ability to demonstrate his awareness of the syntactical patterning of sentences in a language and of the grammatical 53 functions of individual elements in a sentence” (1973, p.7). Previous research showed that among the four components included in the MLAT, language analytic ability is probably the most predictive of L2 proficiency (Ehrman & Oxford, 1995; Hummel, 2009; Ranta, 2002). Research has demonstrated that language analytic ability interacted with contextual factors and affected learners at different acquisition stages. Reves (1983) found that language analytic ability played a greater role in formal classroom contexts than in naturalistic learning settings. Robinson (1997) found that language analytic ability did not affect the incidental learning condition but it related to learning in both the implicit and explicit conditions. Erlam (2005) studied the relationship between language analytic ability and three learning conditions: deductive instruction, inductive instruction, and structured input instruction. Language analytic ability was found to be positively correlated with learning in the inductive condition and the structured input condition, but it was not related to learning in the deductive condition. Erlam also found that the correlations between language analytic ability and learning conditions were subject to test format. Finally, there is also evidence that language analytic ability did not affect child learners but it related to adult L2 learning (DeKeyser, 2000; Harley & Hart, 2002; Rose, Sasaki, & Yoshinaga, 2002). In terms of how language analytic ability interacts with the effectiveness of corrective feedback, the picture is far from clear because there have been only a few relevant studies (DeKeyser, 1993; Sheen, 2007a, 2007b; Trofimovich, Ammar, & Gatbonton, 2007, which will be reviewed below in the “Working memory and feedback” section). DeKeyser’s longitudinal study investigated the relationship between two 54 feedback types (implicit and explicit) and three individual difference variables: language analytic ability, extrinsic motivation, and anxiety. Participants were two classes of Dutchspeaking high school learners of French (n = 19; n = 16). During a full school year, the instructor of one class was told to correct mistakes as frequently and explicitly as possible, and the instructor of the other class was directed to avoid error correction. Posttest results showed that the class which received feedback did not outperform the no-feedback class and that language analytic ability did not correlate with the effectiveness of feedback, despite the fact that feedback correlated with the other two individual difference variables. The absence of a link between feedback and language analytic ability was ascribed to the strong effect of anxiety, which might have neutralized the role of aptitude. Sheen (2007a; 2007b) conducted two similar studies to explore how the effects of feedback were mediated by learners’ language analytic ability. 
In one study (2007b), she investigated the extent to which ESL learners benefited from two types of written feedback: direct-only correction (provision of correct form) and direct metalinguistic correction (correction + metalinguistic explanation), in the learning of two uses of English indefinite and definite articles: a as first mention and the as anaphoric reference. The study involved 92 students from 6 classes and these students formed two experiment groups and one control group. There were two treatment sessions, during which the students read a story, the instructor discussed the moral of the study, and finally the students rewrote the story. Students’ writings were turned in to the researcher, who provided different types of feedback or no feedback to the mistakes the students made in using the target structures. 2-4 days later, the students attended a feedback session where they went over the comments provided on their writings. Three tests were used to 55 measure the effects of feedback: speeded dictation, narrative writing, and error correction. Language analytic ability was measured with a test used by Schmitt et al. (2003) that consisted of 14 multiple choice questions asking the students to choose the correct translation for a sentence in an artificial language after they were familiarized with a list of exemplars in the artificial language and the English equivalents. The results indicated that language analytic ability correlated with the gains of both feedback groups, but that the correlations were stronger for the delayed effects for feedback. In the other study (2007a), Sheen replicated the afore-reviewed study in the oral mode, that is, participants received oral rather than written feedback on their mistakes in using English articles. The two feedback types provided in this study were recasts and metalinguistic correction. The results obtained regarding the relationship between feedback and language analytic ability were somewhat different from those in the other study. Whereas language analytic ability related to the effects of both feedback types in that study, it only correlated with the effects of metalinguistic feedback in this study, indicating different results for oral and written feedback. What merits attention is that a negative, albeit insignificant, correlation was found between the effects of recasts and language analytic ability. As shown, there is a very limited amount of research on how the effects of feedback are constrained by the learner’s language analytic ability, a major component of language aptitude. The few previous studies were carried out in the classroom, which might not be an ideal setting to investigate individual difference variables. Also, previous research only addressed certain aspects of the relation between feedback and language analytic ability, and many questions remained to be answered. DeKeyser’s study (1993) examined 56 two broad categories of feedback (implicit and explicit); how specific feedback types interact with aptitude is not clear. Sheen investigated how recasts and metalinguistic correction, provided as written and oral feedback in two respective studies (2007a, 2007b), related to language analytic ability. The results from the two studies were different. Also, the target structure in both studies was English definite and indefinite articles, a non-salient linguistic feature. 
One question that needs further exploration is whether aptitude interacts with different feedback types in the learning of different linguistic structures. This study seeks to answer the question. Working Memory and Feedback The term “working memory” has been adopted for short-term memory to reflect the fact that instead of being merely a warehouse to store incoming data, it is also responsible for information processing. Miyake and Friedman (1998) rightly pointed out the difference between working memory and the traditional conception of short-term memory: “Unlike the traditional conception of short-term memory (STM) as a fixed set of slots that passively store to-be-maintained information…the conception of WM is more closely tied to the dynamic nature of the processing and storage activities, such as executing various language processes and maintaining intermediate products of the processing”. (p.341) There are two views on the architecture of the working memory construct (Conway, Jarrold, Kane, Miyake, & Towse, 2007; French, 2006): the unitary approach and the multicomponential or multifaceted approach. Researchers embracing the unitary approach believe that working memory is a single construct that performs both storage 57 and processing functions (Daneman, 1991; Daneman & Carpenter, 1980). There are others who hold that working memory consists of a central executive and several slave systems (Baddeley, 2003, 2006, 2007; Baddeley & Hitch, 1974). The central executive is responsible for the control and regulation of the working memory system, and the subcomponents include the phonological loop that stores phonological/auditory information, the visuospatial sketchpad that involves the generation and storage of visual information, and the episodic buffer that integrates information from a variety of systems and from long-term memory. Working memory is operationalized in two ways and is measured accordingly. One way is to define it as phonological working memory, which is measured through digit span or nonword repetition tests where the learner is asked to repeat a sequence of digits, words, or nonsense syllables (e.g., Baddeley et al., 1998). However, some researchers argued that digit span or nonword repetition tests only measure the storage function of working memory and a good working memory test should also measure the processing function (Daneman & Carpenter, 1980; Walters & Caplan, 1996). In their seminal study, Daneman and Carpenter (1980) developed a reading span test that taps into both the storage and processing components. The test has been used in numerous studies as a standard measure of working memory. During the test, subjects were required to read sets of unrelated sentences and recall the sentence-final words in each set. The researchers also developed a listening span test as a variant of the reading span test, where participants were asked to listen to some sentence stimuli read by the presenter and recall the final words of the sentences. The rationale behind the reading/listening test is that participants had to process the meaning of a sentence when reading or listening to it and 58 at the same time memorize the final word of the sentence. Daneman and Carpenter found that the reading and listening span test scores correlated with college students’ reading comprehension ability and verbal SAT scores, but traditional word span measures did not. 
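To make the logic of the recall score concrete, the short sketch below illustrates one way the storage component of a listening or reading span test of the kind just described could be tallied. The set sizes, the partial-credit scoring rule, and all names are illustrative assumptions, not the scoring procedure used in the studies cited here or in the present study.

```python
# Illustrative sketch only: a minimal recall scorer for a listening/reading
# span task of the Daneman and Carpenter type described above. Partial-credit
# scoring and the data structures are assumptions made for illustration.

def score_span_set(sentence_final_words, recalled_words):
    """Count how many sentence-final words in one set were correctly recalled,
    ignoring the order of recall."""
    targets = [w.lower() for w in sentence_final_words]
    recalled = {w.lower() for w in recalled_words}
    return sum(1 for w in targets if w in recalled)

def score_span_test(test_sets, responses):
    """Sum partial-credit recall scores across all sets of the test.

    test_sets -- list of lists of sentence-final words (one list per set)
    responses -- list of lists of words the participant recalled for each set
    """
    return sum(score_span_set(targets, recalled)
               for targets, recalled in zip(test_sets, responses))

# Example: a two-set test with set sizes 2 and 3
sets = [["apple", "window"], ["river", "teacher", "music"]]
recalls = [["window", "apple"], ["music", "horse"]]
print(score_span_test(sets, recalls))  # -> 3 (2 from set 1, 1 from set 2)
```

The sketch scores only recall; as discussed below, complex span tests are often criticized precisely because the processing component (reading or judging the sentences) is administered but not scored.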
The finding that complex span tests that tax both processing and storage are better predictors of L1 and L2 learning was also obtained by other researchers (Harrington & Sawyer, 1992; Lehto, 1996; Miyake & Friedman, 1998; Waters & Caplan, 1996). Although sentence span tests have proven to be one step forward compared with traditional word- or digit-span tests, they are not unproblematic. The problem lies in the way the tests are scored. Despite the claim that sentence span tests measure both the processing and storage functions of working memory, it is usually only the recall component that is scored. Researchers argued that learners might trade off between processing and recall accuracy, that is, they might sacrifice the speed and accuracy of processing to achieve better recall scores (Waters & Caplan, 1996; Leeser, 2007). To verify this hypothesis, Waters and Caplan administered a test during which subjects were asked to view some sets of sentence stimuli, judge whether each sentence made sense in the real world, and recall the sentence-final words in each set. Scores for reaction time, plausibility judgment, and recall accuracy were all calculated, and negative correlations were found between the three scores, showing that the subjects did trade off between the three components. It was also found that the global score of the three components was a better predictor than the score for recall accuracy alone. Working memory has been found to correlate with both L1 and L2 learning whether it is measured as phonological short-term memory using word/digit span tests or as a construct that is responsible for processing and storage activities and that is measured 59 with sentence span tests. In L1 research, it is found that phonological short-term memory is a strong predictor of vocabulary acquisition (Avons, Wragg, Cupples, & Lovegrove, 1998; Gathercole, Frankish, Pickering, & Peaker, 1999; Michas & Henry, 1994), and that learners’ performance on sentence span tests is a strong predictor of reading comprehension ability (Daneman & Carpenter, 1980; Daneman & Merikle, 1996; Just & Carpenter, 1992; Waters & Caplan, 1996). In L2 research, phonological short-term memory is found to be associated with vocabulary learning (Papagno, Valentine, & Baddeley, 1991; Service & Kohonen, 1995) and grammar learning (N. Ellis & Sinclair, 1996; Williams & Lovatt, 1999; Hummel, 2009). Working memory measured with sentence span tests are shown to predict reading comprehension ability (Harrington, 1991; Harrington & Sawyer, 1992; Leeser, 2007), listening comprehension ability (Miyake & Friedman, 1998), acquisition of morphosyntax (Mackey et al., 2002; Sagarra, 2007; Trofimovich, Ammar, & Gatbonton, 2007), and modified output in L2 interaction (Mackey, Adams, Stafford, & Winke, 2010). In L2 research, there has been a call to investigate working memory as an aptitude component (Miyake & Friedman, 1998; Robinson, 2005; Skehan, 1982). Robinson argued that aptitude as measured by the MLAT and other test batteries were developed in audiolingual teaching where rote-learning was a major feature. However, in communicative language teaching, linguistic forms are addressed in meaning-focused instruction, and the processing demands of this type of instruction are different from those of audiolingual classes. 
Thus, “for these learning conditions, a measure of aptitude that reflects the processing demands of simultaneous attention to form and meaning, with its attendant demands on working memory [emphasis added] would seem to be 60 necessary” (p.215). Skehan also argued that the MLAT subtest that is concerned with the memory component of aptitude measures learners’ associative memory, which may not be most predictive of language learning. Furthermore, there is empirical evidence to justify working memory as an aptitude component. For instance, Robinson (1997) examined the correlation between aptitude and four learning conditions and found that aptitude did not relate to the treatment effects in the incidental condition. However, in a later study (2002), a working memory test was used and aptitude was found to correlate with the incidental condition. With respect to the relationship between working memory and the effectiveness of corrective feedback, there have been three published studies (Mackey et al., 2002; Sagarra, 2007; Trofimovich, Ammar, & Gatbonton, 2007). Mackey et al. investigated the relationship between working memory, noticing of recasts, and the effects of recasts in the learning of English question formation. The participants were 30 Japanese EFL learners. The learners’ working memory capacities were based on their scores on three measures: a non-word recall test, an L1 listening span test, and an L2 listening span test. The noticing data were based on the learners’ metalinguistic comments during a stimulated recall and the learners’ responses to an exit questionnaire. Learning of the target structure was determined by way of the production of targetlike higher-stage questions after the learners received recasts on their nontargetlike production of English questions while engaged in communicative tasks. Results showed that more noticing was reported by learners with higher working memory capacities (the result was only obtained for the composite working memory score) and by learners at lower developmental level of the target structure (the result was obtained only for the non-word recall working 61 memory subtest). In terms of the contribution of working memory to learner outcome, learners with lower working memory capacities showed more improvement at the immediate posttest; learners with higher working capacities demonstrated more interlanguage development at the delayed posttest. Mackey et al.’s study is important in that it is the first attempt to address the relationship between working memory and the effects of corrective feedback in SLA research. However, due to the small sample size, the authors cautioned against the generalizability of their findings. Trofimovich, Ammar, & Gatbonton (2007) investigated the role of attention, memory, and analytic ability in affecting the effects of computerized recasts. During the study, 32 adult Francophone learners of English were presented with some pictures on a one-on-one basis, the description of which required the use of the target structures. The learner’s description of each picture was followed by a recorded native speaker response that served as a recast. Two memory measures were used: one was a non-word repetition test measuring phonological short-term memory, and the other, called a working memory test, was the Letter-Number Sequencing subtest of the Wechsler Adult Intelligence test (Psychological Corporation, 1997). 
The Words in Sentences subtest of the MLAT was used to measure analytic ability, and attention control was tested using the Trail Making Test of the US Army Individual Test Battery. It was found that recasts were effective and that learners’ individual differences in attention control, analytic ability, and phonological working memory were predictive of the learners’ interlanguage development; working memory was not a significant predictor. Similar to Trofimovich, Ammar, & Gatbonton (2007), Sagarra (2007) examined the effects of recasts that were provided via the computer and the effect of working memory 62 on the effectiveness of recasts. 82 L1 English speakers enrolled in first-semester Spanish classes at a U.S. university participated in the study. They were asked to fill in the blanks in some Spanish sentences using the correct forms of the given adjectives. A recorded recast was provided when an error was made. The effects of recasts were tested with a written test as well as an oral production test. The working memory test was adapted from Waters and Caplan’s sentence span tests (1996), and scores were computed only based on the items where the learner was accurate in plausibility judgment, the reaction time was not an outlier, and the final word of the sentence was correctly recalled. The results revealed that recasts were effective and the effects were associated with the learners’ working memory capacities. Previous research has established a link between corrective feedback in the form of recasts and working memory. However, further research is warranted to address remaining issues. Mackey et al.’s study revealed some interesting and thought-provoking findings, but these findings need to be verified and tested with more learners and in different contexts. Trofimovich, Ammar, and Gatbonton (2007) and Sagarra (2007) obtained some valuable results, but in both studies, recasts were provided in the computer mode and in discrete item practice. How working memory interacts with feedback in meaningful communication remains to be seen. All three studies investigated recasts, so how working memory relates to the effects of other feedback types needs further exploration. Also, in previous research, working memory was either operationalized as phonological short-term memory, or when it was measured using complex, sentence-span tests, a score that included all three components of the measure (reaction time, plausibility judgment, and word recall) was not used to reflect both the processing and 63 storage functions of working memory. Finally, it is speculated that the learning of different linguistic structures might impose different processing demands on the learner’s working memory. To date, no study has examined the interaction between the choice of target structure and working memory in feedback research. This study was undertaken to address this gap by including two very different structures: Chinese perfective –le and Chinese classifiers. 2.5 Research Questions The review of the literature shows that the facilitative role of corrective feedback is theoretically justified (Gass, 1997, 2003, 2004; Long, 1996, 2007) and empirically verified (Li, 2010; Lyster & Saito, 2010; Mackey & Goo, 2007; Norris & Ortega, 2000; Russell & Spada, 2006); it is also abundant in second language classes (Lyster & Ranta, 1997; Loewen, 2004; Sheen, 2006). 
Now that the effects of feedback have been established, the question arises as to what factors, be they learner-external or learner-internal, mediate these effects. Identifying the factors that constrain the effects of feedback is equally, if not more, important than establishing the effects themselves. This study investigates how the effectiveness of implicit and explicit feedback is affected by the learner's proficiency, working memory capacity, and language analytic ability in the learning of two Chinese structures. The following research questions were formulated:

RQ 1: Do explicit feedback and implicit feedback facilitate the learning of the Chinese perfective –le? If so, do they have differential effects on learners at different proficiency levels in the learning of the structure?

RQ 2: Do explicit feedback and implicit feedback facilitate the learning of Chinese classifiers? If so, do they have differential effects on learners at different proficiency levels in the learning of the structure?

RQ 3: Do the two feedback types work differently in the learning of the two target structures?

RQ 4: What is the relationship between feedback type, the nature of the linguistic structure, and learners' language analytic ability?

RQ 5: What is the relationship between feedback type, the nature of the linguistic structure, and learners' working memory capacity?

CHAPTER 3
METHOD

The previous chapter laid out the theoretical framework and provided the rationale for the investigation of the variables included in this study. Previous studies on corrective feedback were discussed, and issues were identified that need to be addressed in further research. This chapter details how the study was conducted with regard to the characteristics of the participants, the tasks in which the target structures were elicited and feedback was provided, the procedure, the testing materials, the coding schemes, and the statistical analyses that were performed.

Participants and Grouping

The participants of this study were 78 learners of Chinese from two large Midwestern U.S. universities. Among them, 75 were native speakers of English and three reported Korean as their native language. Heritage speakers of Chinese were not included in the study; they were identified by asking each participant whether his or her parents were Chinese and whether Chinese was spoken at home. The instructors of the classes that contributed participants were also consulted to verify the participants' linguistic background. At the time of data collection, the learners were in their 4th (n = 41), 6th (n = 20), and 8th (n = 17) semesters of Chinese study. Thirty-four of the learners were female and 44 were male. With respect to the learners' enrollment status, 6 were freshmen, 20 were sophomores, 28 were juniors, 21 were seniors, and 3 were graduate students. They were aged between 18 and 38, and the average age was 20.78 (SD = 2.48). The learners volunteered to participate in the study and were provided monetary compensation and extra credit points in return for their time commitment.

A standardized Chinese proficiency test named the HSK (see the "Testing" section for details about this test) was administered to each participant because proficiency is an independent variable in this study and a major goal of the study is to explore whether different types of feedback affect high-proficiency and low-proficiency learners differently. Using a proficiency test also made it possible to recruit students from two academic institutions.
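The grouping procedure and the comparability checks reported in the next paragraph (a median split on the proficiency scores, an independent-samples t-test between the two levels, and one-way ANOVAs across the three feedback conditions at each level) can be illustrated with a minimal sketch. The scores, the condition assignment, and the use of scipy.stats below are assumptions made for illustration; they are not the data or the software of the original analyses.

```python
# Minimal sketch of a proficiency-based median split and comparability checks
# of the kind described in the next paragraph. All numbers are hypothetical.
from statistics import median
from scipy import stats  # assumed tool, not necessarily what the study used

def split_by_median(scores):
    """Label each learner 'high' or 'low' relative to the median score."""
    cut = median(scores)
    return ["high" if s >= cut else "low" for s in scores]

# Hypothetical proficiency scores (the real test had a maximum of 60)
scores = [21, 25, 28, 29, 30, 33, 36, 40, 45, 22, 27, 31, 38, 24, 35, 41]
levels = split_by_median(scores)

high = [s for s, g in zip(scores, levels) if g == "high"]
low = [s for s, g in zip(scores, levels) if g == "low"]

# Independent-samples t-test: are the two proficiency levels distinct?
t_res = stats.ttest_ind(low, high)
print(f"t = {t_res.statistic:.2f}, p = {t_res.pvalue:.4f}")

# One-way ANOVA: are the three feedback conditions at one level comparable?
# (Condition assignment here is arbitrary, purely for illustration.)
implicit, explicit, control = high[0::3], high[1::3], high[2::3]
a_res = stats.f_oneway(implicit, explicit, control)
print(f"F = {a_res.statistic:.2f}, p = {a_res.pvalue:.4f}")
```

In the sketch, a nonsignificant ANOVA is the desired outcome: it indicates that random assignment produced feedback groups of comparable proficiency within each level.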
Based on their performance on the proficiency test, the learners were divided into two large groups: high and low. The full score of the test is 60, and the median of the learners' scores, 29, was set as the cut-off point for the high-low division: Learners who scored 29 or higher were labeled "high", and those who scored 28 or lower were labeled "low". An independent-samples t-test showed that the two resultant proficiency groups were significantly different in terms of their test scores, t(76) = -11.65, p < .001 (descriptive statistics for the groups appear in Table 3). At each level (high and low), the learners were divided into three subgroups: implicit, explicit, and control, depending on the type of feedback they received. Consequently, six groups were generated, three at each proficiency level: low-implicit, low-explicit, low-control, high-implicit, high-explicit, and high-control. One-way ANOVAs were conducted to make sure that the three groups at each level were comparable in terms of proficiency. The analyses showed no significant difference between the three groups at the low level, F(2, 36) = 0.71, p = .51, or at the high level, F(2, 36) = 0.36, p = .70.

Table 3. Descriptive statistics for groups

Group                 n     M       SD
Low proficiency       39    23.31   3.01
  LI                  14    24.07   3.29
  LE                  15    22.80   3.00
  LC                  10    23.00   2.67
High proficiency      39    36.67   6.49
  HI                  14    35.64   5.83
  HE                  14    36.71   5.70
  HC                  11    37.91   6.51

Note. LI = low implicit; LE = low explicit; LC = low control; HI = high implicit; HE = high explicit; HC = high control.

Feedback Operationalization

Implicit Feedback

Implicit feedback was operationalized as recasts, that is, reformulations of the learner's nontargetlike production of the target structures (Chinese classifiers and the Chinese perfective -le). Several issues bear on the implicitness/explicitness of recasts. The first concerns the interlocutor's intention: recasts can be made explicit if the interlocutor uses linguistic signals (such as repeating the erroneous utterance, as in Doughty and Varela, 1998) and/or paralinguistic signals (such as prosodic features) to convey the corrective intention. The second factor relates to the characteristics of recasts, that is, whether they are partial or full, whether they involve a single move or multiple moves, and so on (Loewen & Philp, 2006; Sheen, 2006). Still another factor has to do with the receiver of recasts, that is, whether the learner notices the corrective force of the feedback move. Whether recasts are noticed is, at least partly, contingent on the corrective intention of the interlocutor and the characteristics of the recasts, but not entirely so. Other factors may also contribute to noticing, such as the context in which the recasts are provided. For instance, recasts provided in mechanical drills are more noticeable than recasts provided in communicative tasks where the primary focus is on the exchange of information, and recasts that target only one structure throughout are more noticeable than recasts directed toward multiple structures. These caveats do not undermine the relatively implicit nature of recasts, at least in comparison with other corrective strategies such as metalinguistic feedback and explicit correction (Long, Inagaki, & Ortega, 1998; Long, 2006; Lyster, 1998). In this study, recasts had the following characteristics:

(1) They were provided in meaning-focused tasks where the target structures were attended to in information exchange (details on the treatment tasks are provided below).
(2) At the intra-utterance level, the recasts were mostly partial recasts reformulating the errors related to the target structures. However, it was not always possible to isolate the parts that contained the target structures, especially when local reformulation did not lead to a meaningful utterance. Therefore, recasts that involved the reformulation of the whole utterance were not rare in the dataset. (3) At the inter-utterance level, aside from the utterances containing the target structures, utterances that subsumed errors related to non-target structures were responded to with recasts as well as other feedback types when the errors caused communication breakdown or misinterpretation. Attending to forms other than the target forms helped maintain the flow of communication and mask the linguistic foci. (4) No linguistic or paralinguistic signals were utilized to convey the corrective intention on the interlocutor’s part. The following two episodes, which were extracted from the dataset of this study, exemplify how recasts were provided. Episode 1 * NNS: wŏ zuótiān 我 昨天 wănshang zhĭ shuì 晚上 只 睡 * wŭ gè xiăoshí 五 个 小时。 I yesterday night only sleep- [missing Perf] five-CL hour. I only slept for five hours last night. NS: zhĭ shuì le wŭ gè xiăoshí 只 睡 了 五 个 小时。 Only sleep-Asp five-CL hour. Only slept five hours. NNS: shuì le wŭ gè xiăoshí 70 睡 了 五 个 小时。 Sleep-Asp five-CL hour. Slept five hours. Episode 2 * NNS: zhè gè zhàopiàn shì 这 个 照片 是 This-CL [wrong] photo be This photo is two pigs. NS: * liăng gè zhū 两 个 猪。 two-CL [wrong] pig. liăng tóu zhū 两 头 猪。 Two-CL pig. Two pigs. NNS: tóu zhū. zhū hĕn pàng 头 猪。猪 很 胖。 CL pig. Pig very fat [inappropriate word choice]. (Two) pigs. The pigs are fat. NS (laughs): zhū hĕn féi dòngwù yīnggāi yòng féi 猪 很 肥。 动物 应该 用 肥。 Pig very obese. Animals should use obese. The pigs are very obese. For animals, we should use obese. In episode 1, the learner (NNS = nonnative speaker) failed to use the perfective –le to mark the completed and bounded event slept for five hours. The interlocutor (NS = native speaker) responded by reformulating the part that contained the error and adding the aspect marker. The learner then repeated the reformulation and incorporated the correct form in her utterance. Episode 2 is more complex: it has four moves and contains several errors and corrections. In the first move, the learner’s utterance contains three errors: the wrong classifier for photos, the inappropriate use of the be verb, and the inappropriate use of gè as the classifier for pigs. In the second move, the native speaker only reformulated the noun phrase headed by pigs and replaced the wrong classifier with 71 the correct one; the nontargetlike use of the be verb and the wrong use of the classifier for photo (which is not on the list of target classifiers) were ignored. In the next utterance, the learner repeated the correct classifier and the noun, followed by a descriptive statement about the pigs in the photo where pàng (fat), a word used for human beings, was used for animals. The error relates to pragmatics and the native speaker responded by providing some metalinguistic explanation. This episode exemplified a recast on the use of classifiers as well as an additional corrective move that was utilized for the sake of natural communication and to hedge the linguistic focus. Explicit Feedback Carroll and Swain (1993) defined explicit feedback as “any feedback that overtly states that a learner’s output was not part of the language-to-be-learned” (p.361). 
In their study, explicit feedback includes explicit hypothesis rejection, where a learner was told that she/he made a mistake followed by rule explanation, and explicit utterance rejection, where the learner was simply told that she/he made a mistake. Ellis et al. (2006) stated that explicit feedback can take two forms: metalinguistic feedback and explicit correction. Metalinguistic feedback refers to the linguistic information on the well-formedness of the learner’s utterance (Lyster & Ranta, 1997) such as in “You need past tense” (Ellis et al., 2006). In this case, metalinguistic feedback is similar to Carroll and Swain’s explicit hypothesis rejection. Explicit correction entails the message that the utterance is incorrect followed by the provision of the correct form as in “No, not goed—went”. In Sheen’s (2007a) and Li’s (2009) studies, explicit feedback referred to the combination of explicit correction and metalinguistic feedback, that is, supplying the correct form followed by explicit rule explanation. 72 It is obvious that explicit feedback, whatever form it takes, must include a beacon message that unequivocally informs the learner about the ill-formededness of his or her L2 production. The message can be conveyed by simply stating that a mistake occurred and/or providing some sort of rule explanation, which can be brief or detailed. While it is without doubt that all forms of explicit feedback discussed above are “explicit”, it is worth noting that they might facilitate L2 learning in different ways because of the different types of input they provide. Signaling the presence of a mistake and providing metalinguistic comments constitute negative evidence, but obviously the two types of negative evidence are different. Even when providing metalinguistic rule explanation, there can be much variation: it can be very brief or very detailed—“You need present perfect tense” vs. “You need past perfect tense because it is completed and has some effect on the present”. Providing the correct form, on the other hand, constitutes positive evidence and does not involve the retrieval and processing of previously acquired forms. The point here is not to deny the legitimacy of the different ways to operationalize explicit feedback, but to bring to light the fact that it is critical to realize the different learning processes involved in varied forms of this feedback type. Following Sheen (2007) and Li (2009), explicit feedback was operationalized as metalinguistic correction, that is, the provision of the correct form followed by explicit rule explanation. This operationalization is motivated by several reasons. First, as Sheen pointed out, metalinguistic correction is potentially more effective than metalinguistic feedback because of the availability of positive evidence in the former. Metalinguistic feedback, which only contains some rule explanation, might facilitate the learning of forms that are easily acquired, that do not involve complex rule explanation, and that 73 involve transparent complex form-meaning mapping (e.g. English regular past forms (Ellis et al., 2006)). It may not work for forms that involve complex form-meaning mapping and that may also require positive evidence in addition to negative evidence for the development of linguistic competence (e.g., English question formation (Loewen & Nabei, 2007)). The two structures included in this study are Chinese classifiers and the Chinese 4 perfective –le. 
The metalinguistic explanation for classifiers is simple , but informing the learner that a wrong classifier is used is unlikely to lead to the learner’s use of the correct classifier if it is not part of his/her current interlanguage. This is because classifier use is to a large extent exemplar-based and the rule that a classifier must be used between a determiner and a noun only addresses the underuse, not the correct use, of classifiers. In the case of the perfective –le, because of its complex form-meaning mapping and because of the fact that it can appear in multiple positions of a sentence, the metalinguistic comment that a –le should be used does not guarantee the correct use of the structure. The second reason behind the inclusion of the correct form plus metalinguistic explanation in the feedback type in question is that a major goal of this study is to explore the effects of implicit and explicit feedback and their constraining factors. Combining explicit correction and metalinguistic explanation, two explicit feedback types, would make the resultant feedback more explicit, hence increasing the contrast in the implicitexplicit dichotomy. One might argue that the addition of more information in the feedback or explicit feedback proper is likely to interrupt the flow of communication and that explicit feedback may only result in the development of explicit knowledge. However, as Ellis et al. (2006) pointed out, “the metalinguistic time-outs from 74 communicating afforded by explicit correction constitute a perfect context for melding the conscious and unconscious processes involved in learning” (p.343). Ellis et al. also demonstrated that explicit feedback in the form of metalinguistic feedback led to the learning of implicit knowledge. Episode 3 shows how explicit feedback was provided to a learner’s misuse of a classifier. As shown, when saying that there is a cigarette on the table, the learner misused the classifier for cigarettes. The native speaker reformulated the noun phrase with the classifier, followed by the metalinguistic information. It should be noted that the term “measure word” is used to refer to “classifier” in pedagogical Chinese grammar and in Chinese classes, although, as previously discussed, measure words are different from classifiers. To be consistent with the learners’ classroom language, in this study the term “measure word”, rather than “classifier” was used in providing metalinguistic correction on classifier use. It should also be noted that the explicit feedback was provided in English to make it accessible to the learners and to prevent the possibility of the nonincorporation of the feedback as a result of the learners’ failure to comprehend the information the feedback contains. This is especially true of the explicit feedback for the perfective –le, which will be discussed in the next section. Episode 3 * NNS: zài zhuōzi shàng 在 桌子 上 On table-Prep yī gè yān 一 个 烟。 one-CL cigarette. On the table there is a cigarette. NS: yī zhī yān. The measure word is zhī. 75 一 支 烟。 Provided feedback in English. One-CL cigarette. A cigarette. NNS: yī zhī yān xièxie 一 支 烟。 谢谢。 One-CL cigarette. Thanks. A cigarette. Thanks. While it was relatively easy to provide metalinguistic comments on the use of classifiers, phrasing the metalinguistic information for the usage of the perfective (verbal) –le posed a challenge. As discussed in the literature section, verbal –le is used in completed, bounded situations. 
There are two ways to delimit a situation: One is through the use of a number to atelic verbs (as in sleep for two hours, eat three apples) and the other is through the use of telic verbs that encode instantaneity and that have a natural endpoint (such as die, drop, fall, etc.). Based on Chinese pedagogical grammar (Li & Thompson, 1981), the metalinguistic feedback for the verbal –le was provided in two ways: one, for atelic verbs, is to inform the learner that –le is used with a number; the other, for telic verbs, is to inform the learner that –le is used with instantaneous verbs. Once again, in either situation, the metalinguistic information was provided in English to make it accessible to the learner. The following two episodes illustrate the two situations where metalinguistic correction was provided in response to the learners’ wrong use of –le. Episode 4 * NNS: nóngfū zài zhāi lí, yī gè lí diào 76 农夫 在 摘 梨。 一 个 梨 掉。 Farmer Prog-pick pear. one-CL pear drop-[missing Perf] A farmer was picking pears. A pear dropped. NS: diào le. You need to use a –le here because it is completed and the verb diào is instantaneous. 掉了。Followed by feedback provided in English. Drop-Perf. Dropped. Episode 5 * NNS: qùnián wŏ zài nàlǐ gōngzuò sān gè yuè. 去年 我 在 那里 工作 三 个 月。 Last year I at there work-[missing Perf] three-CL month. Last year I worked there for three months. NS: gōngzuò le sān gè yuè. You should use a le because it is completed and there is a number here. 工作 了 三 个 月。Followed by feedback in English. Work-Perf three-CL month. NNS: duì, le 对,了。 Yeah, le. Yeah, (I should’ve used a) le. In episode 4, the learner did not use –le with drop, a telic verb. The native speaker corrected the mistake by adding –le, followed by the provision of the metalinguistic explanation. In episode 5, the learner failed to use –le with an atelic situation that was bounded through duration of time. The native speaker added the aspect marker in his correction and the learner acknowledged the correction before the metalinguistic information was supplied. Target Structures The two target structures are Chinese classifiers and the Chinese perfective –le. The choice of these two structures is because they are different, they emerge early in learners’ 77 interlanguage, and they pose challenges for learners at all stages of their study. As outlined in the literature section, the two structures differ on various dimensions: redundancy, saliency, form-meaning mapping, and explicit knowledge. These differences are summarized in Table 4. Another consideration in target structure selection is the amount of previous knowledge the learner has about the structure. It is speculated that feedback works best for structures the learner already has some knowledge about but has not yet fully mastered (Han, 2002; Mackey & Philp, 1998). Both Chinese classifiers and the perfective –le appear in the textbooks during the learners’ first semester of study in the two programs where this study was conducted. The instructors were also consulted to make sure that the learners had some exposure to the two structures prior to the data collection. Previous research (Wen, 1995; Zhang, 2005) also demonstrated that the two structures appeared early in learner language. Table 4. 
Chinese classifiers and the Chinese perfective -le

Dimension                Chinese classifiers                               Perfective -le
Redundancy               Not redundant—wrong classifier use likely         Other features can be used to
                         causes communication breakdown                    compensate for its absence
Saliency                 Salient                                           Non-salient
Form-meaning mapping     Transparent                                       Opaque
Explicit knowledge       Simple                                            Complex
Learnability             Relatively easy given sufficient input            Difficult—advanced learners are not
                                                                           more accurate than beginners

Despite the early emergence of the two structures in L2 Chinese learners' interlanguage, they pose challenges for learners throughout their study. Wen (1995, 1997), Yang et al. (1999, 2000), and Li (2009) found that advanced learners did not outperform learners at beginning levels in their use and knowledge of the perfective –le or classifiers. This happened for different reasons: the usage of –le is complex, and the input frequency of classifiers is low in textbooks for advanced learners. The fact that learners at both beginning and advanced stages have difficulty learning the two structures serves as another justification for selecting them for feedback treatment.

Recall that the perfective –le has two variants: the verbal –le and the sentence-final –le. They appear in different positions and have different interpretations. Because of the sophisticated usage of this aspect marker and the limited amount of treatment each learner received, this study focused only on the effects of feedback on the learning of the verbal –le.

Tasks

Treatment Tasks for Classifiers

Two tasks were used in which obligatory contexts for classifier use were provided. The first task, picture description, asked the learner to describe seven pictures that contained 15 cases of classifier use (see Appendix A for a sample picture). The pictures showed different numbers of various objects (such as two trees, a river, three horses, etc.) so that the learner had to use classifiers when describing the objects and reporting how many of them there were. Distracter objects were included in addition to the objects related to the selected classifiers. A vocabulary list (with Chinese characters, Pinyin, and English equivalents) was provided for each picture; it contained the nouns that accompany the classifiers the learner was expected to produce. Providing the vocabulary list also facilitated the flow of communication, especially for less advanced learners who did not have sufficient linguistic resources at their disposal. Also, the learner was allowed to ask the native speaker researcher vocabulary questions but not grammar questions. The sequence of the pictures was randomized so that each learner described them in a different order. The native speaker provided recasts or metalinguistic correction in response to the learner's wrong classifier use. The learners in the control group were asked to read a story about the Chinese idiom shú néng shēng qiăo (Practice Makes Perfect) and retell the story by following some clues. A vocabulary list was provided to assist the story retelling, which did not require the use of the selected classifiers. No feedback was provided in the control condition. At times, the native speaker had to make conscious efforts to elicit the use of classifiers.
In situations where the learner did not describe certain objects as desired such that the obligatory contexts for the use of the corresponding classifiers were not established, the native speaker would ask questions such as zhàopiàn lĭ hái yŏu shénme? ( What else is in the picture?) to prompt the learner to talk about the objects related to the target structure. There were also cases where the learner only pointed out that there was something in a certain picture but did not state the quantity of the object, in which case there was no context for the use of the classifier and the learner might have done so to avoid using a classifier. The native speaker would then need to ask about the quantity so as to construct the context. The following example illustrates. Episode 6 NS: hái yŏu 还 有 shénme? 什么? 80 Still there be what? What else is there? NNS: mă Horse. 马。 Horses. NS: duōshăo 多少? How many? * NNS: liăng gè 两 个? Two-CL? Two? NS: liăng pĭ, liăng pĭ mă. 两 匹, 两 匹 马。 Two-CL, two-CL horse. Two, two horses. The second task is called spot the difference, where there were three sets of pictures. Each set had two pictures that contained more or less the same items but the two pictures were different in a number of aspects (see Appendix B for a sample picture used in this task). The native speaker and the learner each held a picture, and the learner asked questions to find out what the differences were. Completion of the task required the use of the same 15 selected classifiers as appeared in task 1. As in task 1, the learner was provided a vocabulary list for each picture and was allowed to ask vocabulary related questions. The sequence of the three picture sets was randomized for each learner. The selection of classifiers was based on the responses from 45 native speakers of Chinese to a survey on classifier use. The survey serves two purposes. One is to select appropriate “classifier + noun” combinations for treatment tasks. Although in most cases of classifier use there is a one-to-one correspondence between a classifier and the 81 accompanying noun, there are situations where more than one classifier is compatible with one object. For instance, there are two possible classifiers for dogs: zhī and tiáo. The other purpose of the survey is to make sure that the selected special classifiers can not be replaced by the general classifier gè. The general classifier can substitute for a special classifier in many situations, which confuses L2 Chinese learners and which partly explains why classifiers constitute a problematic structure. The survey had 40 items, each providing a context for classifier use. For each item, the respondent was asked to fill in the missing classifier and then decide whether the classifier could be replaced by the general classifier. The surveyed classifiers were mostly selected from the textbooks used in the Chinese programs contributing participants for this study (Chou, Eagar, & Chiang, 1999; Zhang et al., 2002; Liu, Yao, Bi, Ge, & Shi, 2009); some were from other commercial Chinese textbooks (Zhao, Li, & Lin, 1999; Huang & Ao, 2002; Wu,, Yu, Zhang, & Tian, 2007) used in North America; others were from Erbaugh’s list of core classifiers (1986) and the Chinese grammar book by Li and Thompson (1981).The example below shows a sample item in the survey. Example 房间里有四_______椅子。(There are four chairs in the room) 该量词可否用“个”代替?(Can the measure word be replaced by gè?) A. 是 B. 否 (A. Yes B. 
No) The respondents were 45 Mandarin native speakers studying or working in the local community where this study was conducted. Among them, there were three 82 undergraduate students, 20 graduate students, and 12 working at local companies or government agencies. 20 of them had a bachelor’s degree, 19 had a master’s degree, and 6 had a doctoral degree. Their specializations were varied, including humanities, science, and engineering. The average age was 32.08. Altogether 15 cases of classifier use were selected out of the 40 surveyed items. In order to be eligible to be included in the study, a classifier must reach an agreement rate of 80% or higher among the respondents regarding the collocation of the classifier with the accompanying noun and the insubstitudability of the general classifier for the special one. Treatment Tasks for the Perfective –le Two tasks were used to elicit the production of the perfective –le: video narrative and interview. In the video narrative task, the learner was asked to watch a 7-minute video clip and tell what happened in the story. The video clip (with sound effects but no words), which is called The Pear Film, was created by Chafe (1980) to elicit narrative language samples. It started by showing a farmer picking pears. Then a boy came on a bike and stole a basket of pears. He went through some adventures before the farmer realized that the pears were missing (Erbaugh, 2001). The video is full of background and foreground events and has been used in numerous previous studies to elicit Chinese narratives (Christensen, 1994; Duff & Li, 2002; Yang, 2002) and investigate Chinese aspectual marking. The learner was required to follow some provided clues when they retold the story. The clues are provided in English in the form of sentence fragments that contain obligatory contexts for the use of the perfective –le. The Chinese equivalents (with 83 Pinyin) of some key words in the clues are also provided to minimize the difficulty the learner is likely to encounter in finding the right vocabulary in the narrative. The learner was asked to speak Chinese but was allowed to ask vocabulary questions in English. The provision of clues serves two purposes. One is to free up the learner’s cognitive demands from processing meaning so that linguistic forms can be attended to. VanPatten’s claim (2002) that learners cannot process form and meaning simultaneously affords theoretical support for this practice. Another rationale is that learners are likely to avoid producing certain linguistic features if their knowledge about the features is incomplete and/or if they are unable to accurately use the features in communication (Gass & Selinker, 2008). Thus the requirement that the learner follow some clues that contain the target structure prevents the potential problem of avoidance. To establish the obligatory contexts for the use of the perfective –le, the scripts of the oral narratives of the Pear Story from 40 native speakers of Chinese from Erbaugh, (1986) and Christensen (1994) were examined. Identification of obligatory contexts was also based on Li and Duff’s detailed description of the use of –le by 9 nonnative speakers and 9 native speakers of Chinese in their narratives of the Pear Story (2002). After the obligatory contexts were established, they were matched with the corresponding contexts in the English scripts of the oral narratives of 20 native speakers of English from Erbaugh’s study. 
The English clues are therefore from the speech data of native speakers of English. In Task 2, which is called Interview, the learner was asked to answer 16 questions related to his/her recent experiences. The questions were written on flash cards in English and the Chinese translations of one or two potential new words were provided for each 84 question. Asking the questions in English prevents the modeling of the target structure as would happen if the questions were asked in the target language. The task was created to increase the number of tokens and types of the target structure. Recall that the Chinese perfective (verbal) –le is used in post-verbal positions in bounded situations; boundedness is encoded by either the inherent features of verbs (e.g., achievements) or, in the case of atelic verbs such as activity and accomplishment verbs, the addition of some external devices such as expressions of duration. Examination of native speakers’ narratives of the Pear Story showed that the obligatory contexts for the use of the perfective –le were unevenly distributed among different verb types, a large number of which being verbs of achievements (such as “fall”, “appear”, “spill”, etc.). Task 2 was therefore intended to supply more contexts for verb types other than achievement verbs such as activity and accomplishment verbs. While performing the two tasks, learners in the experimental groups were provided with either explicit feedback in the form of metalinguistic correction or implicit feedback in the form of recasts in response to their wrong use of the perfective -le. While the target structure of these two tasks is the perfective –le, feedback was at times directed toward errors related to other structures. In Task 2 (see the sample interview question below), each interview question has at least two parts, one of which asks the learner about information other than that involving the use of the target structure. These moves were performed to minimize the learner’s awareness of the target structure of the study. Learners in the control group were asked to answer some questions about their everyday life such as what type of food they like, whether they have a pet and why, and so on. Answers to these questions do not involve the use of the perfective –le and no feedback 85 was provided on any error. Learners in all groups were allowed to ask vocabulary related questions at any time in performing the tasks. Example Interview Question in Task 2 How long do you sleep everyday? How long did you sleep last night? sleep 睡 shuì last night 昨天晚上 zuó tiān wǎn shàng Testing Table 5 provides the information on the different measures used in the study and the related descriptive statistics including the number of items, possible points, mean, standard deviation, range, and reliability coefficient. These measures include a proficiency test, tests of treatment effects (grammaticality judgment and elicited imitation), a language analytic ability test (the Words in Sentences subtest of the MLAT), and a working memory test. Details on these measures are elaborated on in the following sub-sections. 86 a Table 5. 
Measures and descriptive statistics

Measure                         Items   Points   Mean      SD       Range     Reliability (f)
Proficiency—HSK                 60      60       29.99     8.39     36.00     N/A (g)
Grammaticality judgment (b)
  Perfective -le                15      15       5.05      2.09     9.00      0.63
  Classifiers                   15      15       5.78      1.26     6.00      0.74
Elicited imitation (b)
  Perfective -le                15      15       3.84      3.50     14.00     0.87
  Classifiers                   15      15       3.24      2.22     10.00     0.68
Language analytic ability       45      45       24.25     6.37     27.00     0.81
Working memory (c)
  Reaction time (d, e)          72      Ave      3769.53   523.63   2701.17   0.98
  Plausibility judgment         72      72       63.64     5.27     30.00     0.80
  Recall                        72      72       50.79     9.84     43.00     0.89

Note. a. The results are based on the data contributed by all participants (n = 78). b. Descriptive statistics related to the measures of treatment effects are based on all participants' pretest scores; information regarding the different groups and their respective performances at different time points is presented in the results section. c. The working memory score for each participant is the average of the z scores for the three components of the test. Because z scores have a mean of 0 and a standard deviation of 1, the descriptive statistics are computed for each component instead of the global working memory score. d. Reaction time is the average of the reaction times for all 72 items. e. Reaction time was recorded in milliseconds. f. Cronbach's α is used as the reliability coefficient. g. The test is maintained by Beijing Language and Culture University, and reliability for the sample was not available.

Proficiency Test

Because learners' proficiency is an independent variable and the participants were recruited from different levels at two different institutions, a proficiency test was used to measure their linguistic competence. The test is a revised version of the HSK, a standardized test of Chinese as a foreign language sponsored by Beijing Language and Culture University and recognized by the People's Republic of China and numerous countries worldwide. It has three sections: listening, reading, and grammar. The test has three versions, intended for beginning, intermediate, and advanced learners respectively. In this study, the HSK basic test was used, which is designed for learners with 100-800 hours of classroom instruction and an accumulated vocabulary of 400-3,000 characters. The revised HSK basic test used in this study consists of 60 items: 30, 20, and 10 for listening, grammar, and reading respectively. Each item is assigned 1 point, for a total score of 60. More weight was given to listening comprehension and grammar than to reading comprehension to match the format of the interventional treatment, in which feedback was provided orally in response to errors in oral production. Learners were required to mark all answers on an answer sheet. Table 6 illustrates the items in each part of the test.

Table 6. An illustration of the HSK test
Part        Item                         Description                                                             No. of items
Listening   Choose a picture             (1) Listen to a statement describing a scenario                         8
                                         (2) Choose one from the four given pictures that matches the scenario
            Choose a response            (1) Listen to a question                                                7
                                         (2) Choose an appropriate response to the question
            Choose an answer             (1) Listen to a dialog or passage                                       15
                                         (2) Listen to a question about the dialog
                                         (3) Choose the right answer
            Listening total                                                                                      30
Grammar     Choose the right sentence    (1) Read four structurally similar sentences                            10
                                         (2) Choose the one that is grammatically correct
            Choose the right word        (1) One word of each sentence is left out                               10
                                         (2) Choose the one that can complete the sentence
            Grammar total                                                                                        20
Reading     Choose the right answer      (1) Read a short passage and a long passage                             10
                                         (2) Choose the right answer to each question
            Reading total                                                                                        10

Tests for Treatment Effects

Tests of implicit and explicit knowledge. In order to measure the effects of feedback, two tests were used: an elicited imitation (EI) test and a grammaticality judgment test (GJT). Previous research has demonstrated that the two tests tap into different types of knowledge: implicit knowledge and explicit knowledge (Erlam, 2006; Ellis, 2004, 2005, 2006; Ellis et al., 2006; Ellis, Loewen, Elder, Erlam, Philp, & Reinders, 2009). Implicit knowledge is unconscious, easily accessible, procedural, and intuitive; explicit knowledge is conscious, accessible through controlled processing, declarative, and verbalizable. Ellis et al. (2006) summarized how tests of implicit and explicit knowledge should be operationalized:

Tests of implicit knowledge need to elicit use of language where the learners operate by feel, are pressured to perform in real time, are focused on meaning, and have little need to draw on metalinguistic knowledge. In contrast, tests of explicit knowledge need to elicit a test performance in which the learners are encouraged to apply rules, are under no time pressure, are consciously focused on form, and have a need to apply metalinguistic knowledge. (p.354)

When measuring the effects of interventional treatment, SLA researchers tend to use tests that are biased toward explicit knowledge and therefore fail to provide a complete picture of learners' improvement, or lack thereof, as a result of instruction. Research syntheses on the effectiveness of (different types of) L2 instruction have also revealed that different test formats yield different results with regard to the magnitude of effects (Li, 2010; Lyster & Saito, 2010; Norris & Ortega, 2000), which explains why there has been a call to include tests that measure both implicit and explicit knowledge in empirical research (Ellis et al., 2009).

Previous research has validated elicited imitation as a test that measures implicit knowledge (Erlam, 2006). In an elicited imitation test, the learner listens to statements on a range of topics. For each item, after listening to the statement, the learner decides whether the statement is true for him/her, not true, or whether he/she is not sure. Following the decision, the learner is asked to repeat the sentence correctly regardless of whether the statement is true or whether the learner is not sure. Erlam argued that such a test taps into learners' implicit knowledge because of the following characteristics:

(1) The primary focus of the test is on meaning rather than form. The test is described as a "survey questionnaire" that asks test takers for their opinion on statements relating to their everyday life.

(2) Learners' production is reconstructive in nature.
Test takers are asked to repeat the sentence in a correct way. The repetition is reconstructive rather than rote imitation because there is a delay (distracter) between the presentation of the target stimulus and the reproduction of the sentence. Also, there is no significant correlation between length of the stimuli and success rate of repetition. (3) Learners do not rely on explicit knowledge. Because of the spontaneity of learners’ production, they are unlikely to draw on their metalinguistic knowledge about the target structure. It has been demonstrated (Ellis, 2005) that learners’ performance 91 on an EI test and their performance on spontaneous oral tests are highly correlated, indicating that they tap into the same construct. (4) Learners’ reproduction is an indication of internalization. If learners successfully repeat (in the case of grammatical sentences) or repair a stimulus (in the case of ungrammatical sentences), it is evidence that they have internalized the target structure. Grammaticality judgment tests (GJT) are generally believed to measure explicit knowledge. However, it may not necessarily be so. As Ellis pointed out (2004, 2005) and demonstrated, whether or not there is time constraint is an important factor in determining what type of knowledge a GJT measures. Whereas untimed GJTs tap into explicit knowledge, timed GJTs measure implicit knowledge. Under pressure, learners tend to rely on their hunch when making a judgment about whether a sentence is grammatical or not. Another issue is what type of GJT is used. Learners may be asked to simply make a decision regarding the grammaticality of a sentence, to identify the error, to correct the error, to state the rule, to indicate the degree of certainty regarding their judgment, or any combination of them (Ellis, 2004). In this study, an untimed GJT is used, which asked the learner to make a grammaticality judgment and locate and correct the error (details are provided below). While it would be ideal to develop tests that measure distinctly different types of knowledge, it is not easy to do so. For instance, learners may still access explicit knowledge when performing online tasks under time pressure. By the same token, learners may recourse to their “feel of the language” when performing untimed offline tasks. For instance, native speakers or learners in naturalistic settings may not have 92 access to explicit knowledge. Learners’ general proficiency may also affect the validity of a test. Less advanced learners, for instance, may possess less implicit knowledge than advanced learners regardless of how they are tested. The interface between knowledge type and test type is therefore complicated and may not be as clear-cut as speculated. However, although more research should be done to validate existing tests of the two types of knowledge, it may at least be safe to argue that in general elicited imitation tests tap into more implicit knowledge than explicit knowledge; and untimed GJTs are more likely to measure explicit knowledge than timed GJTs. Elicited imitation test. An EI test was used in both the experiment for classifiers and the experiment for the perfective –le to measure learners’ implicit knowledge as a result of the provision of feedback. The test is called “Survey Questionnaire” to suggest that the objective is to elicit for learners’ opinion rather than measure their linguistic competence. 
During the test, learners were asked to listen to some statements related to their everyday life or their personal experience. The stimuli, which were read at normal speed by the researcher and were recorded on an audio disc, were presented manually by the researcher using a disc player. After learners heard each statement, the disc was paused to allow them to decide whether it was true, not true, or whether they were not sure. Learners were then asked to repeat each statement in correct Chinese. To prevent the likelihood that learners (especially low-level learners) fail to understand and repeat a sentence because of their lack of knowledge about the vocabulary rather than about the target structures, annotation was provided for some key words in each statement. The annotation includes the character(s), Pinyin transcript, and English explanation. Since the purpose of the test was not to measure learners’ vocabulary knowledge but to measure 93 their ability to use the target structures, it was considered appropriate and necessary to supply some vocabulary explanation. Below are two example EI test items: Example EI item for classifier use Script: 我家附近有一个小河。(Near my house, there is a river) A. True B. Not True C. Not Sure ( fùjìn 附近 nearby; hé 河 river) Example EI item for the perfective –le Script: 我昨天晚上睡 7 个小时。(I slept for seven hours last night) A. True B. Not True C. Not Sure (shuì 睡 sleep) The EI test has three versions: pretest, immediate posttest, and delayed posttest. The three versions contain the same target items but different distracter items. The test has 15 target items, 7 of which are grammatical and 8 are ungrammatical. Both the pretest and the immediate posttest have 8 distracter items; 4 of them are grammatical and 4 are ungrammatical. Thus, the pretest and the immediate posttest each has a total of 23 items, 11 being grammatical and 12 ungrammatical. Among the 23 items, 15 are target items and 8 are distracters. The delayed posttests for classifiers and the perfective –le were administered in the same session and were therefore combined. The combined test has a total of 40 items, of which 15 relate to classifiers, 15 to the perfective –le, and 10 are distracters. The target items in the three versions of the EI test were randomized, so the order in which the items appeared was different in each version. The item randomization, combined with the fact that different distracter items were included in each version, was 94 intended to prevent the realization on the learner’s part that the target items were the same across tests. During the exit interview (details of which will be provided later) after the whole project was finished, learners indicated that some items in different tests were similar but they were not exactly the same. The 15 target items in the test for classifier use measure the 15 classifier uses involved in the treatment tasks. The obligatory contexts for classifier use in the treatment tasks (picture description and spot the differences) are the same as those in the test items. In other words, the same classifiers and their accompanying nouns were targeted in the treatment tasks and the test. Ungrammatical sentence stimuli were created by deleting a classifier or substituting the general classifier gè or a wrong classifier for a correct classifier. As to the 15 target items in the EI test for the perfective –le, the verb in each item also appeared in the treatment tasks (video narrative and interview). 
Care was taken to make sure that the verb types (activity, achievement, and accomplishment) used with – 7 le were evenly distributed among the 15 items (5 for each of the three verb types) . Ungrammatical sentences were created by omitting –le where it should have been used. Grammaticality judgment test. Grammaticality judgment tests were used to measure learners’ explicit knowledge about the target structures. Unlike in some previous studies where learners were only asked to judge whether a certain item was grammatical or ungrammatical, in this study learners were asked to judge whether a sentence was grammatical or ungrammatical or whether they were not sure. In cases where learners judged a sentence to be ungrammatical, they were asked to locate the error and correct it. Adding the choice of “not sure” or avoiding a binary choice minimizes the chances for random guesses on the learner’s part and increases test validity. The addition of the 95 choice indicating the learner’s uncertainty proved to be necessary because in coding the data, it was noticed that all the learners chose the option of “not sure” at least once on the GJTs. The decision to ask the learner to locate and correct the error when a sentence is ungrammatical is motivated by the speculation that the learner might judge the sentence to be ungrammatical without knowing precisely what the error is or even if the error is identified, how to correct it (Mackey & Gass, 2005). The data coders found that this was indeed the case—it was very common that learners made the right judgment in indicating that a sentence was ungrammatical when it was ungrammatical; but they either corrected a non-target structure or failed to provide the correct form for the wrong target structure. As with the EI test, the GJT has three versions, a pretest, an immediate posttest, and a delayed posttest. The test has 15 target items and varying numbers of distracting items depending on the timing of the test. Among the 15 target items, 8 are ungrammatical and 7 are grammatical. The sentence stimuli in the GJT are different from those in the EI test except for the obligatory contexts for the use of the target structures, which involve classifiers and their accompanying nouns and the target verbs extracted from the treatment tasks for the perfective –le. The pretests for classifiers and the perfective –le were combined and were taken in the same session as the proficiency test. The combined test contains 15 target items for each target structure and 5 distracters, totaling 35 items. In the immediate posttest for each target structure, there are 23 items, out of which 8 are distracters. The delayed posttests for both target structures were merged and the resultant test had a total of 40 items, out of which 30 were target items and 10 were distracters. Different tests included a different set of distracters, which concern structures other than the target structures such as the ba- structure, word order, and so on. Vocabulary 96 annotation was provided and learners were allowed to ask vocabulary related questions. For each item, Pinyin is provided for each character. To avoid providing hints on how the characters in the sentence should be clustered or combined to form words, which is especially in the favor of less advanced learners, all characters and their corresponding Pinyin representations are equally spaced instead of being arranged in word units. 
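The way the test versions described above were put together (a fixed set of target items whose presentation order is re-randomized for each version, combined with a version-specific set of distracters) can be sketched as follows. This is only an illustrative Python reconstruction under stated assumptions: the item labels, function names, and pool sizes are placeholders, not the actual instrument.

```python
import random

def assemble_version(target_items, distracter_pool, n_distracters, seed=None):
    """Build one test version: the full set of target items plus a
    version-specific sample of distracters, with the presentation order
    re-randomized so learners are less likely to notice that the target
    items recur across versions."""
    rng = random.Random(seed)
    distracters = rng.sample(distracter_pool, n_distracters)
    version = list(target_items) + distracters
    rng.shuffle(version)
    return version

# Illustrative use with placeholder labels: 15 target items appear in every
# version; each version draws its own distracters (8 in the pretest and
# immediate posttest, per the description above). In the actual tests the
# distracter sets differed across versions.
targets = [f"target_{i:02d}" for i in range(1, 16)]
distracter_pool = [f"distracter_{i:02d}" for i in range(1, 31)]

pretest = assemble_version(targets, distracter_pool, 8, seed=1)
posttest_1 = assemble_version(targets, distracter_pool, 8, seed=2)
```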
Each test has 6 practice items modeling ways to correct mistakes in ungrammatical sentences, such as addition, deletion, replacement, and relocation. Since the study concerns the effectiveness of oral feedback and the test is not intended to measure character writing, corrections in Pinyin (Romanized orthographic system of Chinese characters) are acceptable. There was no time limit for the GJT. Here are two example GJT items: Example GJT Item for -le wǒ zài lù shang zǒu de shí hòu , kàn dào yǒu yī gè rén de qián bāo diào 我 在 路 上 走 的 时 候, 看 到 有 一个 人 的 钱 包 掉。 [While I was walking, I saw someone’s wallet drop (The translation was not provided in actual testing)] ( 钱包 qiánbāo wallet; 掉 diào fall) A. Grammatical B. Ungrammatical C. Not sure Example GJT Item for Classifiers jīn tiān wǒ de yóu xiāng lǐ yǒu sān fēng xìn 今 天 我 的 邮 箱 里 有 三 封 信。(邮箱 yóuxiāng mailbox) [Today my mailbox had three letters] A. Grammatical B. Ungrammatical C. Not sure 97 Validity and reliability of the GJT and EI tests. In order to ensure test validity, that is, the tests measure what they are intended to measure, all target items included in the GJT and EI tests items for both target structures were piloted with native speakers of Mandarin Chinese. To select 60 test items (15 for two target structures and two test formats), a pool of 150 piloting sentences were created, most of which were revised from sentences in the learners’ textbooks and other commercial Chinese learning materials. The sentences involved the two target structures as well as other structures which are considered to be problematic to L2 Chinese learners such as the ba (把) structure, negation, sentence final question particles, and so on. The items were given to 15 native speakers who were told to judge whether a certain item was correct and correct the error if it was incorrect. The 15 L1 Chinese speakers were from different professions (graduate student, journalist, teacher of Chinese as a foreign language, and civil servant) in the U.S. and China, and they held at least a bachelor’s degree. In order to be eligible to be included in the tests, an item must receive unanimous judgment in terms of its grammaticality and if it is ungrammatical, how it should be corrected. Test of Language Analytic Ability Learners’ language analytic ability was tested using the Words and Sentences subtest of the MLAT (Carroll & Sapon, 1959, 2002), a most widely used aptitude test in SLA research. The subtest is used to measure language learners’ sensitivity to grammatical structures or the “ability to handle the grammatical aspects of a foreign language” (Carroll & Sapon, 2002, p.3). In each item, learners are provided with a key sentence where a certain part is underlined and one or more comparison sentences with five underlined parts. Learners choose the one part in the sentence(s) that matches the 98 function of the designated part in the key sentence. The test has 45 items and learners are required to complete it within 15 minutes. One point is assigned for each item, so the total score of this test is 45. Test of Working Memory Learners’ working memory capacity is measured using a listening span test. The rationale behind the decision to use a listening span rather than a reading span test is that the interventional treatment of this study involves oral feedback, which does not draw on learners’ ability to store and process visual stimuli. The test was created by using the stimuli from Waters and Caplan (1996). 
There are 72 sentences divided into 4 sets of sentences at span sizes 3, 4, 5, and 6. The sentence stimuli have the following structures: It was the woman that ate the apple. (cleft subject: CS) It was the damaged car that the mechanic fixed. (cleft object: CO) The police arrested the man that punched his dog. (object-subject: OS) The story that the man told amused the audience. (subject-object: SO) These sentences differ in number of propositions and syntactic complexity. CS and CO sentences have one proposition, but OS and SO sentences have two. CS and OS sentences involve canonical assignment of thematic roles (Agent + Theme) and are therefore easier to process than CO and SO sentences. Half of the sentences have verbs that require animate subjects and half have verbs that require inanimate subjects. Half of the sentences are plausible and half are implausible. Implausible sentences are constructed by “inverting the animacy of the subject and object noun phrases” (Waters & Caplan, p.55) 99 (e.g., “It was the dissertation proposal that defended the man”). All four included sentence types (CS, CO, OS, and SO) and two plausibility possibilities (“Good” or “Bad”) are evenly distributed among the test stimuli. In each set, there are a mixture of sentences with different structures and plausibility possibilities. The sequence in which sentence sets of different span sizes is presented is randomized. All stimuli are read by a native speaker of English who holds a master’s degree in education. The test is created by using DMDX, free software used in psycholinguistic studies to measure reaction time when visual and auditory stimuli are responded to. During the test, the learner listens to each sentence in a certain set and decides whether it is plausible, that is, whether it is about something that could happen in the real world. When the whole set is finished, there is a pause; the learner recalls the final word of each sentence in that set and writes down the words on a blank sheet before starting the next set. Recalling is not subject to time constraint and the sequence in which words are recalled is not taken into consideration in scoring (and learners were so informed). Before responding to test stimuli, the learner is exposed to eight practice items. Reaction time, plausibility judgment, and recall accuracy scores are all recorded and the learner is informed that all three components are equally important. Unlike some previous studies that only include recall scores, this study also includes reaction time and plausibility scores because WM capacity should involve both the processing and storage functions and because previous studies (Waters and Caplan, 1996; Leeser, 2007) showed that learners traded off between different components, that is, they sacrifice one component for a better performance in another (such as when learners process slower to achieve more accuracy in word recall). 100 Procedure The study has four sessions on four separate days. In session 1, the learner filled out a background questionnaire and took a proficiency test (HSK), which was followed by a GJT pretest with items targeting both classifier use and items that involve the use of the perfective –le. Students’ performance scores on the proficiency test were used for group assignment. 
The combined GJT pretest was used to provide baseline data against which to detect treatment effects and for screening purposes: students who scored over 75% on the items related to a target structure were considered overqualified (based on the speculation that there would be ceiling effects or insufficient room for improvement) and were excluded from the study thereafter (see the illustrative sketch following Table 7). The proficiency test lasted 50 minutes, and there was no time limit for the GJT; the time each learner took to complete the GJT varied from 20 to 30 minutes. At the end of session 1, the participant was asked to schedule the remaining three sessions in such a way that sessions 2 and 3 took place on two consecutive days and session 4 took place one week after session 3.

In sessions 2 and 3, the learner received feedback (implicit or explicit) on his/her erroneous use of the target structure for that session (classifiers or the perfective –le). A learner assigned to a given feedback condition received that type of feedback in both sessions 2 and 3 on two consecutive days. For instance, a learner in the implicit group received recasts on his/her non-target-like use of classifiers and the perfective –le, respectively, in the two sessions. The same principle applied to learners in the explicit condition. The order in which the two treatment tasks (for either classifier use or the perfective –le) were completed was randomized. Prior to the treatment tasks, an elicited imitation (EI) test was administered, which served as a pretest. The EI test, which has 23 items, took around 10 minutes to complete. The treatment tasks lasted around 40 minutes. After the instructional treatment, the learner took the EI test and the GJT, which served as immediate posttests. It must be pointed out that the EI test always preceded the GJT in order to minimize the potential modeling effect of the written test on the oral test. The GJT, which has 23 items, took about 15 minutes. Each of the treatment sessions lasted approximately 80-90 minutes. It is to be noted that the order in which a learner participated in the two treatment sessions was randomized; that is, half of the learners participated in the classifier session before participating in the session on the perfective –le, and half attended the session on –le before the session on classifiers.

During the final session (seven days after session 3), the learner took a delayed EI test (about 20 minutes) and a delayed GJT (about 20 minutes), both containing items for both target structures, followed by the test of language analytic ability (Part IV of the MLAT) and the working memory test (15 minutes). Finally, the learner participated in a semi-structured exit interview asking how she/he felt about the study and whether she/he recognized the objectives of the study. Table 7 illustrates the procedure of the study.

Table 7. Procedure of the study*

Session 1:
- Proficiency test (50 min)
- GJT pretest (25 min)

Session 2 (classifiers):
- EI pretest (10 min)
- Treatment tasks: picture description, spot the difference (40 min)
- EI posttest 1 (10 min)
- GJT posttest 1 (15 min)

Session 3 (–le):
- EI pretest (10 min)
- Treatment tasks: video narrative, interview (40 min)
- EI posttest 1 (10 min)
- GJT posttest 1 (15 min)

Session 4:
- EI posttest 2 (20 min)
- GJT posttest 2 (25 min)
- Aptitude test (15 min)
- WM test (15 min)
- Exit interview (5 min)

* Note. The order in which the learner participated in the two treatment sessions was randomized, and so was the sequence of the treatment tasks within each session.
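As a minimal sketch of the session-1 screening rule and the counterbalancing of the two treatment sessions described above: the 75% threshold and the two session labels come from the text, while the function names, learner IDs, and data layout are hypothetical.

```python
import random

def overqualified(pretest_correct, n_items=15, threshold=0.75):
    """Screening rule from session 1: learners scoring over 75% on the
    pretest items for a target structure were considered overqualified
    and were excluded from the study."""
    return pretest_correct / n_items > threshold

def assign_session_order(learner_ids, seed=0):
    """Counterbalancing: half of the learners do the classifier session
    first, the other half the perfective -le session first."""
    rng = random.Random(seed)
    ids = list(learner_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {lid: ("classifiers", "-le") if i < half else ("-le", "classifiers")
            for i, lid in enumerate(ids)}

print(overqualified(12))                              # 12/15 = 80% -> True, excluded
print(assign_session_order(["L01", "L02", "L03", "L04"]))
```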
103 Scoring and Coding GJTs and EI Tests GJTs. The GJTs used in this study asked the learner to judge whether a sentence is grammatical or ungrammatical or whether he/she is not sure and then to correct the error if it is ungrammatical. The availability of multiple options and the obligation to correct errors in the case of an ungrammatical sentence led to a variety of possibilities in responding to each test item. A complete list of possible responses is shown in Table 8. However, some further elaboration is in order regarding the scoring scheme:  If a sentence is grammatical but was judged to be ungrammatical, the answer received 1 point if a non-target structure was changed. But the answer received a zero if a change was made to the target structure such that the sentence became ungrammatical. For instance, a correct classifier was replaced by a wrong one or deleting the perfective –le.  If a sentence is ungrammatical, it was judged to be ungrammatical, and the error was corrected, the answer received 1 point. However, if judgment was correct but a change was made to a non-target structure, no credit was given. In cases where the learner recognized the error (such as by marking the error) but failed to correct it, the answer received no point.  One might question whether it is reasonable to give credit in cases where a grammatical sentence was considered ungrammatical but a change was made to a nontarget structure and to give no credit in cases where an ungrammatical sentence was regarded as being ungrammatical but a change was made to a non-target structure (See Mackey & Gass, 2005 for further discussion on the scoring of GJT tests). During the 104 exit interview, all learners indicated that when they performed a correction about a certain part of a sentence, they believed that the rest of the sentence was correct. In other words, for a grammatical sentence where the target structure is correctly used, if a learner changed a part other than the target structure, it should be assumed that the learner did not believe that the use of the target structure was problematic. Therefore it can be concluded that he/she had the knowledge about how the target structure is correctly used. By the same token, for an ungrammatical sentence where the target structure is wrongly used, if the learner corrected a part other than the target structure, it means that she/he did not have the knowledge about the structure even if she/he made the correct judgment (ungrammatical judged as ungrammatical). In addition to a generic coding and scoring scheme, additional criteria were established for the test data related to each of the target structures because of their idiosyncratic linguistic features. Recall that the perfective –le is used in bounded situations and boundedness is encoded through numerical expressions (indicating temporality or quantity) with atelic verbs (describing actions without a natural endpoint such as “study”) or through the inherent instantaneous nature of telic verbs (describing actions with a natural endpoint such as “fall”). Therefore, in the case of atelic verbs, the correct use of this structure can be illustrated in this formula: (1) Vatelic + le + Numeric Expression [+ Object (if the verb is transitive)] In the case of telic verbs, a numeric expression is unnecessary because boundedness is expressed by the verb itself. Hence this formula: (2) Vtelic + le 105 Table 8. 
Coding and scoring of GJTs

Grammatical stimuli:
- Judged grammatical; no correction: 1
- Judged ungrammatical; corrected a non-target structure: 1
- Judged ungrammatical; replaced the correct target structure with a wrong one: 0
- Not sure; no correction: 0

Ungrammatical stimuli:
- Judged grammatical; no correction: 0
- Judged ungrammatical; corrected the target structure: 1
- Judged ungrammatical; corrected a non-target structure: 0
- Judged ungrammatical; marked the error and/or made a wrong correction: 0
- Not sure; no correction or corrected a non-target structure: 0

In cases involving formula (2), correctness can easily be determined by the absence or presence of –le; cases involving formula (1) are more complex. When performing corrections for these sentences, some learners showed the patterns listed in Table 9, making the responses challenging to score. The patterns are all based on the following sentence:

zuótiān wǒ xué le sān gè xiǎoshí zhōngwén
昨天 我 学 了 三 个 小时 中文。
Yesterday I study-Asp three-CL hour Chinese
Yesterday I studied Chinese for three hours.

Table 9. Additional criteria regarding GJT data on –le

Sentences after correction (each intended to mean "Yesterday I studied Chinese for three hours") and the problem with each:
1. zuótiān wǒ xuéle zhōngwén sāngèxiǎoshí (Yesterday I study-Asp Chinese three hours). Problem: the temporal expression "three hours" does not follow –le.
2. zuótiān wǒ sāngèxiǎoshí xuéle zhōngwén (Yesterday I three hours study-Asp Chinese). Problem: the temporal expression precedes the verb "study."
3. zuótiān wǒ xuéle sāngèxiǎoshíle zhōngwén (Yesterday I study-Asp three hours-Asp Chinese). Problem: an additional –le is used after the temporal expression "three hours."
4. zuótiān wǒ xuéle zhōngwén sāngèxiǎoshíle (Yesterday I study-Asp Chinese three hours-Asp). Problem: an additional –le is used at the end of the sentence.
5. zuótiān wǒ zhōngwén xuéle sāngèxiǎoshí (Yesterday I Chinese study-Asp three hours). Problem: the object "Chinese" precedes the verb "study."
6. zuótiān wǒ xuézhōngwénle sāngèxiǎoshí (Yesterday I study Chinese-Asp three hours). Problem: –le does not follow the verb.

These corrected sentences are still problematic after the learners' modifications, but half a point was assigned to each, based on the following rationale. Although the sentences are ungrammatical, as far as the target structure is concerned the obligatory contexts were established, and the wrong modifications do not seem to result from a lack of knowledge about the target structure. The problems in these cases mostly relate to word order, which is no surprise given the cross-linguistic difference between English and Chinese in this regard, especially in the location of temporal expressions. Another problem lies with the use of an additional –le (sentences 3 and 4), which might result from learners' knowledge that there are two –les in Chinese: a verbal –le and a sentence-final –le. The additional –le (presumably the sentence-final –le) in (3) might have been accidentally placed before the final word of the sentence. Regardless, it was considered appropriate to assign partial credit in these cases because the obligatory contexts were created, the morpheme was supplied, and the problems were not caused by a lack of knowledge about the target structure.

Whereas the difficulty in scoring the GJT data concerning the perfective –le is attributable to the complexity of the rules governing the use of the morpheme, a different set of problems arose in scoring the test data on classifier use. The problems mainly lie in learners' difficulty in spelling out classifiers using the Romanized Pinyin system (including tones) and in writing characters. Additional scoring criteria were created to cope with these problems. Among the following cases, the first four received half a point and the last one received a full point:
(1) The correct Pinyin for a classifier is provided but with a wrong tone (e.g., "zhì" for "zhī").
(2) The correct Pinyin for a classifier is provided but without a tone (e.g., "tou" for "tóu").
(3) A Pinyin is provided that differs from the correct Pinyin by one sound but that is close enough to the correct pronunciation to allow the reader to pinpoint the corresponding character, such as replacing a sound with one that involves the same place of articulation ("chī" for "zhī") or adding a nasal ("bǎn" for "bǎ").
(4) A character is provided with the same Pinyin as the right classifier but with a different tone (e.g., "跳 tiào" for "条 tiáo").
(5) A character is provided that is a homophone of the required classifier ("风 fēng" for "封 fēng").

EI tests. Unlike GJTs, which are written, visual, and taken without a time constraint, EI tests are aural, involve oral reconstruction, and are taken under time pressure. The scoring of EI tests therefore involves different criteria. Full credit was given when the target structure was supplied in an obligatory context. This means that no credit was given if the target structure was supplied but the context for the use of the structure was not established; it also means that scoring focused only on the use of the target structure, and the rest of a reproduced sentence was ignored. Also, the purpose of an EI test is to measure learners' implicit knowledge, which is unconscious and automatic. Therefore, cases containing self-correction, which shows the learner's conscious processing of the target structures, did not receive credit. These generic rules, of course, did not suffice to account for all the varied responses to the provided stimuli. The examples shown in Table 10 are representative of special cases in the EI test data. In example (1) of Table 10, the noun phrase qúnzi was mispronounced as kūnzi (probably because of the absence of the consonant /q/ in English), but the error was ignored in scoring because the correct classifier was produced and the error was committed on a non-target structure. Examples (2), (3), (4), and (5) all involve self-correction: In (2) and (3), the first attempts are erroneous but the second attempts are correct; in (4) and (5), the learners were correct at first but then changed the targetlike uses into nontargetlike uses. In either case, no credit was given. In examples (6) and (7), although the produced sentences sound awkward, the perfective –le was used and the obligatory contexts were established, containing activity verbs and bounding devices (temporal expressions). Therefore, partial credit was given. In contrast, (8) and (9) did not receive any credit even though the target structures were produced, because the obligatory contexts were not established.
In (8), the noun that the classifier accompanies was not provided; in (9), there is no bounding device (temporal expression) to necessitate the use of –le. 110 Table 10. Scoring of EI Data Category Example Score Problem with nontarget structure (1) liăng tiáo kūn [the correct pronunciation is qún] zi 两 条 昆 [裙]子。 two-CL skirt Two skirts. 1 Self-correction From wrong to correct: 0 (2) wǒ měi gè yuè xiě yī gè yī zhāng zhīpiào jiāo fāngzū 我 每 个月 写 一 个 一 张 支票 交 房租。 I every month write one-CL one-CL check pay rent Every month, I write a check to pay my rent. (3) zuótiān wănshang wŏ shuì shuìle qī gè xiăoshí 昨天 晚上 我 睡, 睡了 七个 小时。 Yesterday night I sleep, sleep-Perf seven-CL hour. Last night I slept seven hours. 0 From correct to wrong: (4) qùnián xuéle liǎng gè yuè zhōngwén 去年 学了 两 个 月 中文, Last year study-Asp two-CL month Chinese, Last year [I] studied Chinese for two months. (5) wǒjiā qiánmiàn 我家 前面 my home in front of yǒu liǎngkē 有 两棵, there be two-CL 111 xué liǎng gè yuè zhōngwén 学 两 个 月 中文 study two-CL month Chinese. liǎngshù 两树 two trees. Table 10 (cont’d) In front of my home, there are two trees. Context established (6) zhù le wŏde péngyou jiā yī gè xīngqī 住 了 我的 朋友 家 一 个 星期。 Live-Perf my friend home one-CL week. [I] lived in my friend’s house for a week. .5 (7) xué zhōngwén le liǎng gè yuè 学 中文 了 两 个 月。 study Chinese-Asp two-CL months. [I] studied Chinese for two months. Context not established (8) wŏ mĕitiān wŭ zhī 我 每天 五 支。 I everyday five-CL Everyday I [smoke] five [cigarettes]. (9) zài bĕijīng xuéle zhōngwén 在 北京 学了 中文。 in Beijing study-Asp Chinese. [I] Studied Chinese in Beijing [for two months]. 112 0 Inter-coder reliability. All data were coded by two native speakers of Mandarin Chinese: the researcher and an experienced instructor of Chinese as a heritage language. At the time of data collection, the researcher has a master’s degree in linguistics and is an ABD in second language acquisition. The Chinese instructor has a bachelor’s degree in ESL. Both coders have many years of experience teaching ESL and Chinese. Altogether four rounds of coding were performed. Initially the two coders coded 10% of the test data and created a coding scheme after extensive and intensive discussion. The data subjected to initial coding include data related to pretests, immediate posttests, and delayed posttests. Following the scheme both coders agreed upon, the two coders coded all test data, which involved 7,020 codes for the GJT and EI tests on each of the target structure. The two coders then checked all the codes once again to make sure that their coding was accurate and consistent. The agreement rate for GJT codes is 98.3%, and for EI codes it is 97.6%. In the final round of coding, the two coders carefully examined the codes they had disparity on and resolved the differences after detailed discussion. For the EI data, the two coders transcribed all the responses verbatim, compared their transcripts, and resolved the differences prior to scoring. The Working Memory Test During the working memory test, learners were asked to listen to 72 sentence stimuli divided into 4 span sizes (3, 4, 5, and 6) and three sets at each span size, decide whether each sentence makes sense, and recall the last word of each sentence after listening to all stimuli in a certain set. Half of the 72 sentences are plausible and half are implausible. A WM score for each learner has three components: plausibility judgment, reaction time, and recall accuracy. 
The raw score for plausibility judgment is 72, with 1 point assigned 113 for each correct judgment. Reaction time was only calculated for correctly judged items. The full score for recall accuracy is also 72, with each accurately recalled sentence-final word receiving one point. There was no penalty for errors related to inflectional morphemes (such as “worked” recalled as “work”) when recalled words were scored. Analysis This study investigates whether the effects of implicit and explicit feedback are constrained by learners’ proficiency, the choice of target structure, and learners’ individual differences in language analytic ability and working memory capacity. To answer the question of whether the two types of feedback work differently for high and low learners in the learning of the perfective –le, mixed design repeated measure ANOVAs were performed separately for data generated by the GJT and EI tests. The within-group variable is the timing of tests (pretest, posttest 1, and posttest 2), and the between-group variables are feedback type (implicit, explicit, and control) and proficiency (high and low). Subsequently, one-way ANOVAs and post hoc contrasts were conducted on gain scores to detect group differences. The same analytic procedures were repeated for the data on classifiers. Prior to the statistical analyses, different tests were conducted to investigate the assumptions of parametric statistics. Table 11 displays the results of Shapiro-Wilk’s tests of normality regarding the performance scores on both the GJT and EI tests of each group as defined by feedback type, proficiency, and timing of posttests. As shown, among the 72 group scores, 63 are normally distributed. The Mauchly’s test was performed for each repeated measure analysis and the results showed that the assumption of sphericity was not violated. Levene statistic was examined for follow-up ANOVAs, and it was found 114 that the assumption of homogeneity of variances was met. Table 11. Tests of normality Classifiers Perfective -le Test Proficiency Group .94 .96 .93 .87 .96 .85 .77 * .92 .88 .79 * .92 .75 Implicit .96 .90 .90 .91 .95 .93 .91 .89 .92 .87 .92 .90 .83 * .94 .98 .91 .95 .91 Implicit .94 .88 .91 .87 * .92 .89 .88 .94 .95 .88 .90 .85 Control .71 * .88 .90 .92 .98 .93 Implicit .89 .88 .92 .95 .94 .94 Explicit .85 * .91 .87 .90 .94 .89 Control High Post 2 .91 Explicit Low Post 1 .89 Control EI Pretest .91 Explicit High Implicit Post 2 .97 Control Low Post 1 .91 Explicit GJT Pretest .92 .89 .96 .91 .99 .91 .96 * * * * Note. The significance value is below .05, which means that the scores related to the condition are not normally distributed. In addition to using p values to determine whether group differences were significant, effect sizes (Cohen’s d) were calculated to explore if the effects of feedback were different across different test formats and target structures. While a p is useful in deciding whether to reject or accept a null hypothesis, it provides no information on the magnitude of an effect or relationship. The effect size, in contrast, indicates “the 115 magnitude of an observed difference between two groups in standard deviation units” (Norris & Ortega, p. 442). Cohen’s d, one of the most commonly used effect size indexes for group differences, is calculated through dividing mean difference by pooled standard deviation (which takes into account sample sizes and standard deviations of both groups involved). An effect size of 0.2 is small, 0.5 is a medium effect, and 0.8 suggests a large effect. 
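As a concrete illustration of the computation just described, the following sketch implements Cohen's d with a pooled standard deviation that weights each group's variance by its sample size. This is the generic formula, not code used in the study, and the example values are made up.

```python
from math import sqrt

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d: mean difference divided by the pooled standard deviation,
    which weights each group's variance by its degrees of freedom."""
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

# Illustration with hypothetical gain scores for a feedback group versus a
# control group (values are invented, not taken from the results tables):
d = cohens_d(mean1=5.0, sd1=2.0, n1=15, mean2=1.5, sd2=1.8, n2=10)
print(round(d, 2))   # roughly 1.8, which would count as a large effect
```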
Examining effect sizes makes it possible to compare the effect of a certain instructional intervention across different conditions, such as the effects of feedback on the learning of different target structures.

Pearson's correlation analyses were used to probe the relationship between feedback type and learners' individual differences in language analytic ability and working memory capacity. Rather than data from all participants in the study, only the data from learners who were in their 4th semester of study were included in the correlation analyses. Recall that the participants were at different stages of their study at the time of data collection. Among the 78 recruited participants, 41 were in their 4th semester of study and 37 were in their 6th and 8th semesters of study. Performing correlation analyses on all participants would be less than ideal because of the heterogeneity among them in the amount of prior instruction they had received, which might to some extent mask the relationship between aptitude components and treatment effects. This is because any relationship between aptitude and learning has to be interpreted as follows: given the same amount of instruction, learners with higher aptitude (or higher ability in a certain aptitude component) achieve more or progress at a faster rate. Therefore, to obtain a clearer picture of the role of aptitude in learning, the more dimensions learners are comparable on, the more reliable the results are. As far as this study is concerned, it would be ideal to conduct separate correlation analyses on learners at different levels and with similar amounts of prior instruction. However, the number of learners from the higher-level classes (n = 20 and n = 17, including those assigned to the control groups) is too small for correlation analyses; hence the decision to conduct the analyses only with the learners in their 4th semester of study.

CHAPTER 4
RESULTS

Chapter 3 detailed the methodology of the study, including participant information, feedback operationalization, the target structures, tasks, testing, procedure, scoring, coding, and analytic procedures. This chapter presents the results and summarizes them by answering the research questions advanced at the end of Chapter 2. Analyses regarding treatment effects are conducted by target structure (the perfective –le and classifiers) and test (GJT and EI), and the results are presented accordingly. For the GJT and EI data on each target structure, descriptive statistics will be presented regarding the means and standard deviations of the three groups involved: implicit, explicit, and control. These are followed by the results of the repeated measures analyses on how the effects of feedback type are mediated by proficiency. Mean gain scores and standard deviations of each feedback group at each proficiency level will be calculated, and post hoc group contrasts will be conducted on the gain scores. Results from the two test formats will be compared using effect sizes to explore whether feedback contributes more to the acquisition of implicit knowledge or explicit knowledge in the learning of each target structure. Effect sizes will also be used to determine whether feedback affects the learning of the two target structures differently. As to the results pertaining to the two aptitude components, separate correlation analyses are conducted on the gain scores of each feedback group and learners' performance scores on the MLAT subtest and the working memory test. Descriptive statistics will also be presented.
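A sketch of the correlation analyses described above, restricted to the 4th-semester learners in the two feedback conditions. The column names and the pandas/scipy workflow are assumptions for illustration; the dissertation does not specify the software used, and this is not the study's actual analysis script.

```python
import pandas as pd
from scipy.stats import pearsonr

def aptitude_gain_correlations(df, feedback_type, gain_col):
    """Correlate language analytic ability and working memory with the gain
    scores of one feedback group, using only 4th-semester learners.
    All column names here are hypothetical."""
    sub = df[(df["semester"] == 4) & (df["feedback"] == feedback_type)]
    results = {}
    for aptitude in ("lang_analytic", "working_memory"):
        r, p = pearsonr(sub[aptitude], sub[gain_col])
        results[aptitude] = (round(r, 2), round(p, 3))
    return results

# Example call for the implicit group's delayed GJT gains on classifiers
# (assuming a DataFrame named `scores` with the columns used above):
# aptitude_gain_correlations(scores, "implicit", "gjt_gain_classifier_post2")
```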
Results on the Perfective –le

GJT Results

Table 12 shows the descriptive statistics of the GJT scores on the perfective –le, including the means and standard deviations of the pretest and posttest scores of each group. The changing patterns of the pretest and posttest scores of the three groups are shown in Figure 4. A one-way ANOVA was performed on the pretest scores of the three groups, and no significant difference existed between them, F (2, 77) = 1.66, p = .20. The mean scores of all three groups increased from the pretest to the posttests, and the scores of the explicit group appeared to have dropped the most from time 2 to time 3.

Table 12. Perfective –le: Descriptive statistics on GJT scores

Condition   n    Pretest M (SD)   Posttest 1 M (SD)   Posttest 2 M (SD)
Implicit    28   6.52 (1.68)      8.75 (2.27)         9.75 (2.91)
Explicit    29   5.56 (2.03)      11.65 (1.74)        9.94 (2.99)
Control     21   5.74 (2.58)      6.98 (2.93)         7.62 (2.75)

Figure 4. Perfective –le: GJT score changes

A 3 × 2 × 3 mixed-design repeated measures ANOVA was conducted to obtain a general picture of how GJT score variation was affected by feedback type (3 groups: implicit, explicit, and control), proficiency (2 levels: low and high), and the timing of testing (3 time points: pretest, immediate posttest/posttest 1, and delayed posttest/posttest 2). As shown in Table 13, significant main effects were found for time, F (2, 142) = 92.15, p < .05, feedback, F (2, 72) = 14.45, p < .05, and proficiency, F (1, 72) = 51.91, p < .05. An interaction effect was found between time and feedback, F (4, 142) = 15.54, p < .05. The three-way interaction between time, feedback, and proficiency and the two-way interaction between feedback and proficiency approached significance. In order to determine the sources of the differences, the gain scores of each feedback group at each proficiency level at posttest 1 and posttest 2 were subjected to post hoc analyses. Gain scores were obtained by subtracting pretest scores from posttest scores. The descriptive statistics of the gain scores appear in Table 14 (see Appendix C for the descriptive statistics related to the raw scores of the two proficiency levels). Results pertaining to the group contrasts and the corresponding effect sizes (Cohen's d) are shown in Table 15. As shown, at the low proficiency level, the explicit group outperformed the control group and the implicit group at both posttests; the implicit group did not show significant improvement at either posttest. At the high proficiency level, the implicit group did not perform significantly better than the control group at posttest 1, but it did at posttest 2; learners benefited more from explicit feedback than implicit feedback at posttest 1, but the two feedback groups did not differ significantly at posttest 2. Examination of the effect sizes showed that the effects of explicit feedback dropped substantially from posttest 1 to posttest 2 at both proficiency levels and that the effects of implicit feedback increased over time at the high proficiency level.

Table 13.
Perfective –le: ANOVA results related to GJT scores Source Sum of Squares df Mean Square F p Within-Group Results Time 492.16 2 246.08 92.15 .00 Time  Feedback 165.96 4 41.49 15.54 .00 Time  Proficiency 15.02 2 7.51 2.81 .06 Time  Feedback  Proficiency 7.12 4 1.78 .67 .62 Between-Group Results Feedback 70.68 2 35.34 14.45 .00 Proficiency 127 1 127 51.91 .00 Feedback  Proficiency 14.45 2 7.23 2.95 .059 122 Table 14. Perfective –le: Descriptive statistics on GJT gain scores Gains at Posttest 1 Proficiency Group n Low Implicit Gains at Posttest 2 SD 2.16 Mean 2.14 SD 2.07 Explicit 15 6.29 1.93 4.18 2.91 Control 10 1.40 1.76 1.55 .93 Implicit 14 2.46 2.53 4.32 2.15 Explicit 14 5.69 1.96 4.42 2.89 Control High 14 Mean 2.00 11 1.09 2.77 2.18 1.77 Table 15. Perfective –le: Post hoc contrasts related to GJT scores Low Proficiency Posttest 1 Contrasts Posttest 2 ES I—C High Proficiency Contrasts .30 I—C * 2.62 E—C * 2.10 E—I E—C E—I Posttest 1 ES Contrasts .29 I—C * 1.03 E—C * .81 E—I ES Contrasts * ES .52 I—C * 1.94 E—C .92 * 1.42 E—I .04 Note. ES = effect size; I = implicit; E = explicit; C = control * Posttest 2 p < .05 123 * 1.03 EI Test Results The descriptive statistics regarding learners’ performance on the elicited imitation tests appear in Table 16. Overall, EI pretest scores are lower than GJT pretest scores, indicating that learners had less implicit knowledge than explicit knowledge about the target structure prior to the treatment. The standard deviations of EI scores are in general larger than those of GJT scores, suggesting that learners were more homogeneous in their explicit knowledge about the target structure. Figure 5 shows the development patterns of the three groups over time. Evidently the two experiment groups improved substantially after treatment but the control group did not undergo substantial change. As with the GJT results, the explicit group seemed to have dropped the most from posttest 1 to posttest 2. One-way ANOVA conducted on the pretest scores of the three groups showed that there was no significant difference between them before treatment, F (2, 77) = .56, p = .57. Table 16. Perfective –le: Descriptive statistics on EI test scores Pretest Condition n Posttest 1 Posttest 2 M SD M SD M SD Implicit 28 4.14 3.59 8.71 3.47 7.61 4.12 Explicit 29 3.26 3.36 9.98 3.19 7.69 4.00 Control 21 4.67 3.63 5.60 4.26 5.12 3.12 124 15 13 11 Implicit 9 Explicit 7 Control 5 3 1 1 2 3 Time Figure 5. Perfective –le: EI score changes The EI test scores pertaining to the perfective –le were subjected to a mixed design repeated measure ANOVA, with time as the within-group variable and feedback and proficiency as between-group variables. Results (Table 17) revealed that there is a * significant effect for time, F (2, 142) = 113.48, p < .05, for time feedback interaction, F (4, 142) = 13.98, p < .05, for feedback, F (2, 72) = 5.96, p < .05, and for proficiency, F (1, 72) = 74.66, p < .05. In order to identify group differences, gain scores were calculated for the six groups that were formed by feedback and proficiency and the results are displayed in Table 18 (see Appendix C for the descriptive statistics related with the raw scores of different proficiency levels). 125 Table 17. 
Perfective –le: ANOVA results related to EI test scores Source Sum of Squares df Mean Square F p Within-Group Results Time 704.89 2 352.44 113.48 .00 Time  Feedback 173.68 4 43.12 13.98 .00 Time  Proficiency 3.67 2 1.83 .59 .55 Time  Feedback  Proficiency 14.75 4 3.69 1.19 .32 Between-Group Results Feedback 66.69 2 33.34 5.96 .00 Proficiency 417.76 1 417.76 74.66 .00 Feedback  Proficiency 4.68 2 2.34 .42 .66 126 Table 18. Perfective –le: Descriptive statistics related to gain scores on EI tests Gains at Posttest 1 Gains at Posttest 2 Proficiency Group n Mean SD Mean SD Low Implicit 14 4.39 2.58 2.68 2.58 Explicit 15 6.86 2.78 3.93 2.58 Control 10 1.15 2.07 1.35 1.53 Implicit 14 4.75 2.38 4.25 2.56 Explicit 14 6.57 3.30 4.96 3.12 Control 11 1.68 1.79 0.59 2.04 High Results of post hoc group comparisons (Table 19) showed that at the lower proficiency level, learners benefited from both explicit feedback and implicit feedback as reflected on posttest 1, but the difference between the implicit condition and the control group was not significant on posttest 2. Effect sizes for both feedback types underwent remarkable decrease. At the higher proficiency level, learners benefited from both feedback types at both posttests. Also, it appeared that the effects of both feedback types were well maintained at the higher proficiency level, with a slight decrease for explicit feedback and an increase for implicit feedback. As to implicit-explicit contrasts, the only significant difference between the two feedback groups was found for low-level learners on the immediate posttest. 127 Table 19. Perfective –le: Post hoc contrasts related to EI test scores Low Proficiency Posttest 1 Contrasts * I—C * E—C E—I High Proficiency Posttest 2 Posttest 1 ES Contrasts ES Contrasts 1.36 I—C .60 I—C 2.27 E—C 1.17 .92 E—I .49 * * Posttest 2 ES Contrasts 1.43 I—C E—C 1.80 E—C 1.63 E—I .64 E—I .25 * * * * ES 1.56 Note. ES = effect size; I = implicit; E = explicit; C = control * p < .05 Summary of the Results on the Perfective –le Based on the results reported above, we turn to research question 1, that is, whether the two types of feedback have differential effects on learners of different proficiency levels in the learning of the Chinese perfective –le, the answer is affirmative. More specifically, the following results were obtained: (1) The effect of implicit feedback on low proficiency learners was limited. The GJT results showed that the implicit group did not perform significantly better than the control group on either posttest. Although there were significant effects for implicit feedback on the immediate EI test, but the effects did not sustain. 128 (2) Implicit feedback was effective for high proficiency learners. On the EI test, implicit feedback showed large effects on both posttests; on the GJT test, although the implicit group did not perform substantially better than the control group on the immediate posttest, but it did on the delayed posttest. Also, the effects of implicit feedback increased over time: On both the GJT and EI tests, the effect sizes associated with the delayed effects are larger than those associated with the immediate effects. (3) Explicit feedback was beneficial to learners at both proficiency levels as reflected on both test formats. (3) The superiority of a certain feedback type seems to depend on proficiency level, test type, and timing of test. Out of the eight contrasts between the two feedback groups, four are significant in favor of explicit feedback. 
Out of the four significant contrasts, three pertain to low proficiency learners, GJTs, and immediate posttests. In other words, explicit feedback tended to be more effective than implicit feedback for less advanced learners when treatment effects were measured with tests that favored explicit knowledge, and the difference between the two feedback types tended to disappear over time.

Results on Classifiers

GJT Results

The descriptive statistics for the GJT results on classifier use, including group means and standard deviations, are displayed in Table 20. The group means are also plotted in Figure 6. As shown, both feedback groups outperformed the control group on both posttests. Pretest scores were subjected to a one-way ANOVA, which showed that there was no significant difference between the three conditions, F (2, 78) = 2.1, p = .13. This suggests that any difference between the two feedback groups and the control group did not result from differences at the time of pretesting.

Table 20. Classifiers: Descriptive statistics on GJT scores

Condition   n    Pretest M (SD)   Posttest 1 M (SD)   Posttest 2 M (SD)
Implicit    28   6.00 (1.12)      9.23 (2.39)         9.20 (2.51)
Explicit    29   5.41 (1.34)      10.72 (2.61)        9.86 (2.32)
Control     21   6.02 (1.22)      6.57 (1.72)         6.29 (1.82)

Figure 6. Classifiers: GJT score changes

A mixed-design repeated measures ANOVA was conducted to determine whether learners' performance scores on classifier use were mediated by feedback type, proficiency, and timing of testing. The results, presented in Table 21, reveal that there was a significant effect for time, F (2, 143) = 102.87, p < .05, and for the time × feedback interaction, F (4, 143) = 21.57, p < .05. The interaction between time and proficiency was marginally significant. The between-group results showed a main effect for feedback, F (2, 72) = 20.83, p < .05, and for proficiency, F (1, 72) = 31.32, p < .05. To locate the source of the differences, post hoc pairwise comparisons were conducted on the gain scores of the subgroups formed by feedback type and proficiency. Descriptive statistics on the gain scores, including means and standard deviations over time, are displayed in Table 22 (see Appendix D for the descriptive statistics related to the raw scores of the two proficiency levels). As Table 23 shows, both feedback groups outperformed the control group at both proficiency levels, and the effects were well maintained: The effect sizes related to the delayed posttests did not appear to be substantially smaller than those related to the immediate posttests. At the low proficiency level, there was a larger effect for explicit feedback than for implicit feedback, but at the high proficiency level, there was no significant difference between the two types of feedback.

EI Test Results

Descriptive statistics related to learners' performance on classifier use as reflected on the elicited imitation test, including means and standard deviations over time, are displayed in Table 24. The means of the three groups are also plotted in Figure 7. It is evident that all three groups improved over time in their performance on the EI test and that the two experimental groups appeared to have made greater improvement than the control group.

Table 21.
Classifiers: ANOVA results related to GJT scores Source Sum of Squares df Mean Square F p Within-Group Results Time Time  Feedback Time  Proficiency Time  Feedback  Proficiency 420.33 2 210.17 102.87 .00 176.25 4 44.06 21.57 .00 11.27 2 5.64 21.76 .06 8.09 4 2.02 .99 .42 Between-Group Results Feedback Proficiency Feedback  Proficiency 77.04 2 38.52 20.83 .00 57.92 1 57.92 31.32 .00 .28 2 .14 .075 .93 132 Table 22. Classifiers: Descriptive statistics on GJT gain scores Gains at Posttest 1 Proficiency Group n Low Implicit Gains at Posttest 2 SD 1.62 Mean 2.50 SD 1.65 Explicit 15 5.03 3.08 4.21 2.22 Control 10 0.01 0.74 0.20 1.70 Implicit 14 4.32 2.15 3.89 2.61 Explicit 14 5.61 2.59 4.71 2.62 Control High 14 Mean 2.14 11 0.96 1.94 0.32 1.49 Table 23. Classifiers: Post hoc contrasts related to GJT scores Low Proficiency Posttest 1 Contrasts * I—C * E—C E—I * High Proficiency Posttest 2 ES Contrasts 1.53 I—C 2.01 E—C 1.16 E—I * * * Posttest 1 ES Contrasts 1.37 I—C 1.99 .88 ES Contrasts 1.63 I—C E—C 1.99 E—C 2.00 E—I 0.50 E—I 0.31 * * Note. ES = effect size; I = implicit; E = explicit; C = control * Posttest 2 p < .05 133 * * ES 1.63 Table 24. Classifiers: Descriptive statistics on EI test scores Pretest Condition n Implicit Explicit Control 28 29 21 M 3.14 2.90 3.86 Posttest 1 SD 2.06 2.20 2.44 M 7.77 9.33 5.17 SD 2.88 3.13 2.53 Posttest 2 M 7.23 8.50 5.76 SD 2.99 3.50 2.77 15 13 11 Implicit 9 Explicit 7 Control 5 3 1 1 2 3 Time Figure 7. Classifiers: EI score changes In order to determine if score variation is mediated by feedback type, proficiency, and time, a mixed design repeated measure analysis was performed. The within-group variable is time, and the two between-group variables are feedback type and proficiency. Before the mixed design ANOVA was conducted, a one-way ANOVA was performed on the pretest scores of the three groups, and no significant difference was found, F (2, 77) = 1.19, p = .31. 134 The mixed ANOVA (Table 25) showed that significant effects were found for time * F (2, 143) = 151.02, p < .05, for time feedback interaction, F (4, 143) = 15.89, p < .05, for feedback, F (2, 72) = 622.42, p < .05, and for proficiency, F (1, 72) = 6.34, p < .05. Post hoc group comparisons were conducted on the gain scores of the three involved groups at each proficiency level to locate the source of differences. The descriptive statistics related to the gain scores appear in Table 26 including pre-post change scores and standard deviations (see Appendix D for the descriptive statistics related with the raw scores of different proficiency levels). Table 27 displays the results generated by the post hoc analyses including group contrasts and the corresponding effect sizes. As shown, both the implicit and explicit groups performed significantly better than the control group on both posttests at the low proficiency level; at the more advanced level, the explicit group outperformed the control group on both posttests, but the implicit group only outperformed the control group on the immediate posttest. Explicit feedback worked better than implicit feedback for low-proficiency learners at the time of posttest 1 but the difference did not sustain. No difference was found between the two corrective moves at the high proficiency level. 135 Table 25. 
Classifiers: ANOVA results related to GJT scores Source Sum of Squares df Mean Square F p Within-Group Results Time Time  Feedback Time  Proficiency Time  Feedback  Proficiency 803.52 2 401.76 151.02 .00 169.15 4 42.29 15.89 .00 2.69 2 1.34 .51 .06 1.83 4 .46 .17 .95 Between-Group Results Feedback Proficiency Feedback  Proficiency 54.80 2 2688.32 622.42 .00 129.06 1 129.06 6.34 .00 2.51 2 1.26 .29 .75 136 Table 26. Classifiers: Descriptive statistics related to gain scores on EI tests Gains at Posttest 1 Proficiency Group n Low Implicit Gains at Posttest 2 SD 2.00 Mean 4.11 SD 2.36 Explicit 15 6.57 2.37 5.29 3.15 Control 10 1.55 1.30 1.70 1.21 Implicit 14 4.64 2.74 4.07 2.75 Explicit 14 6.29 2.56 5.71 2.53 Control High 14 Mean 4.61 11 1.09 1.41 2.09 1.98 Table 27. Classifiers: Post hoc contrasts related to EI test scores Low Proficiency Posttest 1 Contrasts * I—C * E—C E—I * High Proficiency Posttest 2 ES Contrasts 1.74 I—C 2.48 0.89 Posttest 1 ES Contrasts 1.22 I—C E—C 1.41 E—I 0.42 * * ES Contrasts ES 1.57 I—C 0.81 E—C 2.43 E—C 1.57 E—I 0.62 E—I 0.62 * * Note. ES = effect size; I = implicit; E = explicit; C = control * Posttest 2 p < .05 137 * Summary of the Results on Classifiers The second research question asks whether implicit feedback and explicit feedback have different effects on learners at different proficiency levels in their learning of Chinese classifiers. ANOVAs and post hoc analyses generated the following findings: (1) Both feedback types benefited the learning of the target structure at both proficiency levels and the effects were maintained. Despite the fact that the mean difference between the high-implicit group and the high-control group did not reach significance on the delayed EI posttest, the related effect size was large. Therefore, a claim can be made for the superiority of implicit feedback to no feedback. (2) Neither feedback type seemed to have differential effects on learners of the two proficiency levels. Examination of the effect sizes related to individual feedback type across proficiency levels showed that neither feedback type impacted the two proficiency groups differently. (3) As to which feedback type is more effective, it was found that overall explicit feedback showed larger effects than implicit feedback at the low proficiency level, but high-proficiency learners benefited equally from the two types of feedback. The Perfective –le vs. Classifiers The third research question concerns whether the choice of target structure mediates the effects of feedback. Because the present study includes two feedback types and two proficiency levels, this research question involves multiple interacting dimensions regarding the relationship between feedback type, proficiency, and structure type. First of all, in terms of the effectiveness of implicit feedback, low-proficiency learners did not benefit from this type of feedback in learning the aspect marker but they did in learning 138 classifiers. Implicit feedback also facilitated the learning of the aspect marker at the high proficiency level. Moreover, in the learning of the aspect marker, the effects of implicit feedback improved over time for high-proficiency learners, but the same pattern was not found for classifier learning. As to the effects of explicit feedback, it worked for both the perfective –le and classifiers and for learners of both proficiency levels. There was no evidence to show its superior effects for either structure or either proficiency level. 
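The post hoc procedure behind the group contrasts reported in Tables 15, 19, 23, and 27 (gain scores, a one-way ANOVA on the gains, and pairwise effect sizes) can be sketched generically as follows. The group sizes and scores below are invented placeholders, and scipy's f_oneway stands in for whatever package the study actually used.

```python
import numpy as np
from math import sqrt
from scipy.stats import f_oneway

def gains(pre, post):
    """Gain scores: posttest minus pretest for each learner."""
    return np.asarray(post, float) - np.asarray(pre, float)

def cohens_d(a, b):
    """Effect size for a pairwise contrast between two sets of gain scores."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled = sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                  / (len(a) + len(b) - 2))
    return (a.mean() - b.mean()) / pooled

# Hypothetical gain scores for one proficiency level (not the real data):
implicit = gains(pre=[5, 6, 4, 5, 6], post=[8, 9, 7, 9, 8])
explicit = gains(pre=[5, 5, 6, 4, 5], post=[11, 10, 11, 9, 10])
control  = gains(pre=[6, 5, 5, 6, 4], post=[7, 6, 6, 7, 5])

F, p = f_oneway(implicit, explicit, control)   # omnibus test on the gains
d_ic = cohens_d(implicit, control)             # implicit vs. control contrast
d_ec = cohens_d(explicit, control)             # explicit vs. control contrast
d_ei = cohens_d(explicit, implicit)            # explicit vs. implicit contrast
```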
In terms of which type of feedback is more facilitative of learners' interlanguage development, it was found that for both target structures, low-level learners benefited more from explicit feedback than from implicit feedback; this was more so on the immediate posttests and when treatment effects were measured by means of the GJT. To determine whether the overall effects of feedback differed for the two target structures, effect sizes were calculated for the two feedback groups (as compared with the control group) on the GJT and EI test scores. The results, which appear in Table 28, show that the mean effect size (averaged across feedback types and test formats) associated with classifiers is larger than that associated with the perfective –le, indicating that overall, feedback tended to be more effective for the learning of the former. To identify whether the effects of feedback on the two target structures are reflected differently on GJTs and EI tests, mean effect sizes were calculated for the two feedback types for each structure on both posttests. The results, presented in Table 29, show that in general, the effect sizes associated with the EI test results are larger than those associated with the GJT results, regardless of target structure and timing of testing.

Table 28. Effect sizes associated with the perfective –le and classifiers

                         Perfective –le               Classifiers
Test   Feedback          Posttest 1   Posttest 2      Posttest 1   Posttest 2
GJT    Implicit          0.43         0.67            1.39         1.50
       Explicit          2.27         1.03            2.01         1.00
EI     Implicit          1.41         1.14            1.67         2.03
       Explicit          2.06         1.42            2.50         1.48
Mean effect size         1.54         1.07            1.89         1.50

Table 29. Effects of feedback shown on different tests

          Perfective –le             Classifiers
Test      Immediate    Delayed       Immediate    Delayed
EI        1.73         1.28          2.09         1.75
GJT       1.35         0.85          1.70         1.25

Results on Language Analytic Ability and Working Memory

To answer research questions 4 and 5, which ask about the relationship between the two aptitude components, feedback type, and the choice of target structure, correlation analyses were performed on the data contributed by the learners in their 4th semester of study. More specifically, only the data produced by the 30 learners assigned to the two experimental groups (15 in each) were analyzed, because the purpose was to examine the extent to which the gains under the two feedback conditions correlated with the aptitude components; the data from the control group are therefore irrelevant. Also, as previously mentioned, only the data from learners in their 4th semester of study were analyzed, to ensure homogeneity of learners' backgrounds in terms of the amount of prior instruction.

Prior to the correlation analyses, descriptive statistics were calculated on the scores of language analytic ability and working memory and on the gain scores of the two feedback groups. The maximum score on the language analytic ability test is 45. Learners in the implicit condition scored an average of 26.43, and those in the explicit condition averaged 23.20 (Table 30). Learners' working memory score consists of three components: plausibility judgment, reaction time, and recall (of sentence-final words). The raw mean scores for the three components and the standard deviations are presented in Table 31. Following Leeser (2007), the raw scores were transformed into z scores. The composite WM score for each participant is the average of the z scores associated with the three components.
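Because the composite working memory score is defined as the mean of the z-transformed component scores, the following minimal sketch (Python with pandas) illustrates that transformation. The data frame values and column names are hypothetical, not the study's data, and the reversal of the reaction-time z score is an assumption; the dissertation does not state how reaction time was scaled before averaging.

```python
import pandas as pd

# Hypothetical component scores for a few participants (not the study's data)
wm = pd.DataFrame({
    "plausibility": [64, 61, 66],
    "reaction_time": [3650.0, 3900.0, 3500.0],  # ms; lower = faster
    "recall": [55, 48, 60],
})

# z-transform each component across participants
z = (wm - wm.mean()) / wm.std(ddof=1)

# Assumption: faster reaction times count positively toward the composite,
# so the reaction-time z score is reversed before averaging.
z["reaction_time"] = -z["reaction_time"]

# Composite WM score = mean of the three z scores per participant
wm["composite"] = z.mean(axis=1)
print(wm)
```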
Table 32 shows the GJT and EI gain scores and standard deviations by feedback type and target structure at the time of posttest 1 and posttest 2.

Table 30. Scores of language analytic ability

Feedback   n    Mean    SD
Implicit   15   26.43   7.96
Explicit   15   23.20   4.52

Table 31. Raw scores of working memory

           Plausibility Judgment    Reaction Time (ms)      Recall Accuracy
Feedback   Mean       SD            Mean        SD          Mean      SD
Implicit   64.00      3.78          3683.23     417.38      53.29     10.69
Explicit   63.47      4.12          3816.92     650.62      50.00     9.61

Pearson correlation analyses were performed on the scores of language analytic ability, working memory, and the gain scores of the feedback groups. The results are displayed in Table 33, which shows that significant correlations existed between language analytic ability and the delayed effects of explicit feedback in the learning of the perfective –le, r = 0.67, p = .01 (GJT); r = 0.55, p = .04 (EI). Working memory was not significantly related to any gain score as far as the learning of the aspect marker is concerned.

Table 32. Descriptive statistics for 4th-semester learners: Gain scores

Perfective –le
                       Posttest 1           Posttest 2
Feedback   Test        Mean      SD         Mean      SD
Implicit   GJT         2.14      2.56       3.32      2.33
           EI          4.50      2.62       3.25      2.77
Explicit   GJT         6.29      1.79       3.82      3.10
           EI          6.86      2.50       4.14      2.27

Classifiers
                       Posttest 1           Posttest 2
Feedback   Test        Mean      SD         Mean      SD
Implicit   GJT         3.32      1.99       2.96      1.65
           EI          5.00      2.71       3.64      2.55
Explicit   GJT         5.31      2.91       4.57      2.05
           EI          6.28      2.54       5.63      2.86

In terms of classifier learning, there was a significant correlation between language analytic ability and the delayed effects of implicit feedback (GJT scores), r = 0.69, p = .01; a significant correlation was also found between working memory and the delayed effects of explicit feedback (GJT), r = 0.57, p = .03. In the implicit condition, language analytic ability was found to be significantly related to working memory, r = 0.61, p = .02, raising the question of whether there was an overlap between the two constructs and how much variance language analytic ability accounted for in the delayed effects of implicit feedback in the learning of classifiers. Partial correlation analyses were therefore conducted to explore the unique contribution of either construct when the other was held constant. It was found that when working memory was controlled for, language analytic ability continued to be significantly correlated with the gain scores of the implicit group on the GJT at the time of posttest 2, r = 0.67, p = .01. This result suggests that language analytic ability was solely responsible for a significant portion of the variance in the treatment effects after working memory was partialed out. Working memory, however, did not significantly correlate with the effects of implicit feedback in the learning of classifiers when language analytic ability was held constant.

Table 33. Feedback, aptitude, and the target structure: Correlation results

Perfective –le
Feedback   Test Type   Posttest   LAA (r)   p       WM (r)   p
Implicit   GJT         1          0.39      0.17    0.19     0.50
                       2          0.24      0.41    0.10     0.73
           EI          1          0.31      0.28    0.18     0.54
                       2          0.20      0.49    -0.14    0.63
Explicit   GJT         1          0.37      0.19    0.15     0.60
                       2          0.67      0.01*   -0.51    0.07
           EI          1          0.14      0.63    0.41     0.15
                       2          0.55      0.04*   -0.47    0.09

Classifiers
Feedback   Test Type   Posttest   LAA (r)   p       WM (r)   p
Implicit   GJT         1          -0.01     0.98    0.38     0.19
                       2          0.69      0.01*   0.29     0.32
           EI          1          0.31      0.28    0.30     0.29
                       2          0.39      0.17    0.20     0.49
Explicit   GJT         1          -0.12     0.66    0.15     0.59
                       2          -0.09     0.75    0.57     0.03*
           EI          1          -0.32     0.25    0.12     0.68
                       2          -0.10     0.73    0.37     0.17

Note. * p < .05; GJT = grammaticality judgment test; EI = elicited imitation; LAA = language analytic ability; WM = working memory
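The partial correlation reported above can be recovered from the pairwise Pearson correlations with the standard first-order partial correlation formula. The sketch below (Python) uses the rounded coefficients reported in the text and in Table 33 for the implicit group's classifier gains; the variable names are illustrative.

```python
from math import sqrt

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation between x and y with z partialed out (first-order partial correlation)."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

r_laa_gain = 0.69  # language analytic ability and delayed GJT gains (implicit group, classifiers)
r_laa_wm   = 0.61  # language analytic ability and working memory (implicit group)
r_wm_gain  = 0.29  # working memory and delayed GJT gains (implicit group, classifiers)

print(round(partial_corr(r_laa_gain, r_laa_wm, r_wm_gain), 2))  # ~0.68, close to the reported r = .67
```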
CHAPTER 5

DISCUSSION

This study sought to ascertain the impact of several learner-internal and learner-external factors on the effectiveness of corrective feedback in second language acquisition. These factors include feedback type, learners' proficiency, the choice of target structure, and learners' individual differences in language analytic ability and working memory. The two feedback types under investigation are implicit feedback in the form of recasts and explicit feedback operationalized as metalinguistic correction. Seventy-eight L2 Chinese learners were recruited and divided into two proficiency levels. At each level, learners were assigned to three conditions: implicit, explicit, and control. They were tested on their use of the Chinese perfective –le and Chinese classifiers before and after the instructional treatment by means of two tests: grammaticality judgment and elicited imitation. Learners' language analytic ability and working memory were tested by using the Words in Sentences subtest of the MLAT and a listening span test, respectively. The following results were obtained.

First, there was an interaction between feedback type, proficiency, and the choice of target structure. In the learning of the perfective –le, there was a limited effect for implicit feedback at the low proficiency level, but this type of feedback showed large effects for high-level learners. Also, implicit feedback showed larger delayed effects relative to immediate effects at the high proficiency level, indicating that its effects improved over time. Explicit feedback worked for all learners, irrespective of their proficiency. In the learning of classifiers, both feedback types benefited learners at both levels of proficiency. Unlike the results on the perfective –le, proficiency was not a mediating factor; that is, neither feedback type had differential effects on learners at the two proficiency levels. As to which type of feedback is more effective, it was found that explicit feedback demonstrated superior effects compared with implicit feedback at the lower proficiency level, and this was especially so for immediate effects; at the higher proficiency level, there were no significant differences between the two corrective moves. This result was obtained for both target structures, indicating the relative robustness of the finding.

Second, with regard to the interaction between the two aptitude components, feedback, and the choice of target structure, it was found that language analytic ability correlated with the effects of implicit feedback in the learning of classifiers and with the effects of explicit feedback in the learning of the perfective –le. Working memory only correlated with the effects of explicit feedback in the learning of classifiers. All significant correlations were related to the delayed effects of feedback, not to the immediate effects.

Previous research has obtained valuable findings on the effects of corrective feedback. However, researchers have mostly either examined the effectiveness of one feedback type (e.g., recasts) or compared different types of feedback (e.g., recasts versus metalinguistic feedback or prompts) in terms of their effectiveness per se, without investigating the factors constraining their effectiveness.
The inclusion of multiple feedback types and independent variables makes it possible to take an integrated approach to the efficacy of feedback, as shown in Figure 1 ("An integrated model of corrective feedback"), for which the obtained findings of this study provide empirical evidence. In what follows, the results are discussed by exploring the joint and separate contributions of the different factors to the obtained findings.

Implicit Feedback

The Perfective –le

One of the most striking findings is that implicit feedback in the form of recasts did not have substantial effects for low-proficiency learners in the learning of the perfective –le. This finding is attributable to several factors. First, it is partly attributable to the implicit nature of this corrective move. Recasts are implicit and are intended not to overtly draw learners' attention to linguistic forms. Those who have argued for the benefits of recasts (Doughty, 2001; Long, 1996, 2007) maintain that this type of feedback constitutes an ideal strategy of focus on form precisely because of its implicitness: It is nonintrusive, and at the same time, it juxtaposes nontargetlike production with the correct form and affords opportunities for cognitive comparison. However, a number of studies (Ellis et al., 2006; Lyster, 2004; Sheen, 2007a, 2010; Yang & Lyster, 2010) found that recasts did not work in classroom settings where feedback was directed toward multiple learners, because they were implicit and because learners might have interpreted recasts as confirmation of the content without realizing the corrective force subsumed in the feedback. This argument is backed up by the fact that studies conducted in laboratory settings, where recasts become more salient, have consistently demonstrated that recasts were effective (e.g., Han, 2002; Lyster & Izquierdo, 2009; Mackey & Philp, 1998). The same trend was also obtained in research syntheses of feedback research (Li, 2010; Mackey & Goo, 2007), which showed that lab-based studies produced larger effects than classroom studies. Instructional setting does not contribute to the implicitness of recasts in this study because it was conducted in the laboratory. However, the implicit nature of recasts might be partly responsible, because explicit feedback was effective in the same condition. Still, the implicit nature of recasts cannot account for the whole picture, because this type of feedback was effective for learners at the same level of proficiency (low) in the learning of the other linguistic structure, Chinese classifiers. An interpretation therefore has to be sought with recourse to the nature of the linguistic target. The perfective –le, as previously discussed, is redundant and non-salient, which may have made it difficult for learners to notice and benefit from recasts, a type of feedback that is implicit by nature.

A link can be established between the above claim and the mechanisms of input processing and learners' strategies in semantic encoding. According to VanPatten's (2007) Input Processing Theory, learners are more likely to process non-redundant structures than redundant structures because the former carry more communicative load than the latter. By the same token, learners give priority to lexicon over morphosyntax in processing input because lexicon is more meaning-loaded and facilitates or impedes comprehension more than morphosyntax does.
Researchers adopting functional approaches to SLA (in this case, the concept-oriented approach; Bardovi-Harlig, 2007) hold that learners make use of a range of linguistic options, such as contextual information, lexical devices, and grammatical morphemes, to express a concept. Beginners tend to rely more on context and lexicon than on morphosyntax in their L2 production. The non-salient and redundant nature of this structure is also likely to have affected the learners' test performance: Learners failed to notice and correct the errors related to the use of –le during the GJT, which is essentially a comprehension-based test; in the production-based EI test, they perceived it as less necessary to use –le than linguistic features that were more meaningful.

It follows from the above discussion that when recasts do not work, it may not be entirely because of their implicitness and the instructional setting; one important factor to consider is the nature of the linguistic structure to be learned. It should be recalled that recasts were not effective in Ellis et al. (2006), Lyster (2004), Sheen (2007a, 2010), or Yang and Lyster (2010).⁸ Indeed, what is common to these studies is that all were conducted in the classroom, but what they also have in common is that the target structures in these studies—English past –ed, French gender agreement, and English articles (a/an and the)—are either non-salient or redundant. Another piece of evidence for the role of the target structure in moderating the effects of recasts comes from Ammar and Spada (2006), which was also classroom-based but which, unlike the above-mentioned studies, demonstrated that recasts were effective. The effects of recasts in Ammar and Spada's study may be attributable to, among other factors, the fact that the target structure was the English possessive determiners his/her, which are meaning-distinctive and salient.

The next question to be tackled is why implicit feedback operationalized as recasts did not work for low-level learners but did for high-level learners in the learning of the perfective –le. If the implicit nature of recasts and the non-salient, redundant nature of the linguistic feature jointly account for the poor performance on the part of the less proficient learners, it appears difficult to understand why more advanced learners benefited from the same feedback when learning the same structure. It would appear that the only explanation is proficiency. It is possible that, compared with low-proficiency learners, high-proficiency learners have more cognitive resources at their disposal, such that they were better able to notice the corrective force of recasts despite their implicitness and more likely to process and produce the target structure despite its non-salience and redundancy. Philp (2003) comprehensively summarized the factors constraining the noticing of recasts, which include, among others, developmental readiness, saliency of the linguistic structure, and type of instruction. She found that learners tended not to notice linguistic features that were beyond their level of acquisition. Experimental studies (Ammar & Spada, 2006; Mackey & Philp, 1998) demonstrated that recasts were indeed more beneficial to learners who were more developmentally ready. It should be noted that developmental readiness has often been operationalized as learners' prior knowledge about a target structure in previous research (Ammar & Spada, 2006; Mackey & Philp, 1998).
In this study, however, learners' level of linguistic competence refers to their general proficiency, which was measured through a standardized proficiency test. Developmental readiness is most probably consistent with general proficiency but may not always be so: the former involves the learning of a particular linguistic structure, whereas the latter concerns learners' overall linguistic development. Learners with higher proficiency, because their load in processing other competing linguistic stimuli is reduced, likely have more cognitive space freed up and are more cognitively involved in processing the corrective information contained in recasts on the target structure.

The fact that recasts facilitated the learning of the perfective –le provides further empirical evidence for the claim that learners, or rather advanced learners, are able to induce grammatical rules through repeated exposure to input (Ellis, 2009; Williams, 1999, 2005). This is encouraging because the target structure is complex and its usage requires detailed metalinguistic explanation. Ellis speculated that in most so-called implicit conditions, learning may not be implicit because repeated practice may prompt learners to engage in "meta-awareness in the form of hypothesis-testing and conscious rule-formation" (p. 8). One might argue that learners may not have engaged in rule induction in this case and that the resultant learning may be exemplar-based: Learners may have merely memorized target-related cases as separate items. This may not be true, because correlation analyses revealed that only one of the four gain scores (from two tests at two time points) associated with the high-implicit group—the immediate GJT scores—significantly correlated with working memory. The relationship no longer existed on the delayed GJT. It would seem that learners may have relied on the memory traces associated with the structure only immediately after the treatment when taking the GJT, but that it is the induced rule that they used over the longer term.

Classifiers

Let us now turn to the effects of recasts on classifier learning. The question to be answered is why this type of feedback did not work for low-proficiency learners in learning the perfective –le but did in learning classifiers, and why it had differential effects on high- and low-proficiency learners in learning the perfective –le but was equally effective for both levels of learners when it comes to classifiers. The answer lies in the different attributes of these two structures. As previously discussed, classifiers and the perfective –le differ in level of redundancy, degree of saliency, form-meaning mapping, and learnability. Classifiers are obligatory, salient, and transparent in form-meaning mapping. The perfective –le is the opposite: redundant, non-salient, and opaque in form-meaning mapping. The perfective –le is a post-verbal morpheme, which involves long, complex movement, and its interpretation has to be made at the sentential level. The classifier, in contrast, is situated in the DP (determiner phrase) and does not involve complex movement. The interpretation of a classifier is made locally, in relation to the noun phrase it is attached to, and in most cases there is a one-to-one correspondence between a classifier and a category of objects. The use of classifiers is therefore more semantically than syntactically driven.
Therefore, to a certain degree, the distinction between classifiers and the aspect marker –le also parallels the distinction between lexicon and morphosyntax. Classifiers are salient and therefore easy to notice; they involve simple form-meaning mapping and hence are amenable to minimal instruction (DeKeyser, 2005). So despite the implicit nature of recasts and the small amount of information they contain as compared with metalinguistic correction, this type of feedback proved to be effective even for low-level learners; and unlike the findings for the perfective –le, high-level learners did not benefit more from recasts than their low-level counterparts in classifier learning. Interaction studies focusing on learners' perceptions have shown that learners' noticing of feedback is indeed constrained by the nature of the linguistic target: learners were more likely to recognize lexical recasts than morphosyntactic recasts (Carpenter, Jeon, Macgregor, & Mackey, 2007; Mackey, Gass, & McDonough, 2000). These studies corroborate the superior effects of recasts on classifiers, a structure that is more lexical than morphosyntactic in nature.

One interesting finding pertaining to the interaction between recasts and linguistic targets is that the effects of recasts improved over time in the learning of the perfective –le at the high proficiency level, but the same trend was not found for classifier learning: The effect sizes associated with the delayed effects of recasts are not higher than those associated with the immediate gains at the high proficiency level. It is speculated that structures involving complex form-meaning mapping engage deeper cognitive processing and that the effects of instruction are therefore more durable. The improved effects of recasts might also have to do with the implicit nature of the feedback, because the effects of explicit feedback did not increase over time. Learning under the explicit condition is probably less durable because the availability of external assistance reduces the need for deep cognitive processing, and the obtained knowledge is less likely to be proceduralized. Following this line of thinking, the following hypotheses might be formulated regarding the sustainability of learning outcomes: (1) Effects related to the learning of complex structures are more persistent than effects related to simple structures. (2) Effects obtained under implicit conditions are more persistent than effects obtained under explicit conditions. Certainly, these claims are only true of the high-proficiency learners in this study and are at best hypothetical. Li's (2010) meta-analysis also found a larger long-term effect for implicit feedback. However, his study did not take into consideration the impact of proficiency and the choice of target structure due to the lack of research on the two variables. There might exist a complicated relationship between proficiency, learning conditions, and the complexity of the linguistic structure as far as the sustainability of treatment effects is concerned, and the jury is still out. SLA researchers rarely address the distinction between immediate and delayed effects when investigating or discussing a certain type of instruction or interventional treatment. Even in cases where several posttests are included, the related results are only reported rather than interpreted.
It is hoped that the discussion in this regard will prompt feedback or SLA researchers to address or investigate the sustainability of treatment effects; after all, proceduralized L2 knowledge, as reflected on delayed measures, is the ultimate goal of SLA.

Explicit-Implicit Comparison

Following Sheen (2007a, 2010), explicit feedback in this study was operationalized as metalinguistic correction, that is, the provision of the correct form followed by metalinguistic explanation. Thus, the explicit feedback contains positive evidence (the provision of the correct form) as well as negative evidence in the form of rule explanation. The explicit feedback benefited all the learners in this study, irrespective of proficiency and target structure. Also, the magnitude of its effects did not seem to vary across the two proficiency levels and target structures. The working principle of explicit feedback seems obvious: It is salient and increases learners' awareness of the target structure. According to Sheen (2007a), metalinguistic feedback is especially facilitative of L2 acquisition because it develops learners' awareness at the level of both noticing and understanding. The provision of rule explanation, especially in the case of the aspect marker, moved the learner's level of awareness from mere noticing to rule awareness, which is "strongly facilitative of subsequent learning" (Robinson, 2002, p. 226; also see Schmidt, 2001). It should be noted that metalinguistic information takes different forms: It can be brief and serve only as a metalinguistic alert (e.g., "Think about your question again" [Loewen & Nabei, 2007]), or it can be detailed and contain rule explanation (e.g., "The fox. You should use the definite article 'the' because you have already mentioned 'fox'" [Sheen, 2007a]). Exactly how much metalinguistic information should be provided is not known and may depend on the complexity of the target and the amount of metalinguistic knowledge the learner has about the target prior to the treatment. In this study, the metalinguistic information on the perfective –le was detailed and that on classifiers was brief (because classifier use does not involve complex rule explanation). It is speculated that learners may have especially benefited from the detailed rule explanation, in addition to the provision of the correct form, while learning the perfective –le, a complex and non-salient structure.

The finding that explicit but not implicit feedback was effective for the perfective –le for low-proficiency learners has two implications. First and foremost, it shows that complex structures are amenable to explicit instruction. This finding deviates from the claims of Krashen (1981, 1994) and Reber (1989) that conscious learning is only effective for some easy and semantically transparent structures, and that complex, semantically opaque structures can only be learned implicitly or unconsciously. At the same time, it is in line with Hulstijn and de Graaff's (1994) argument for the advantage of explicit instruction in the learning of complex linguistic features. Second, it testifies to the importance of selective (focused) attention (Gass, 1997, 2003). Low-level learners carry a heavy processing load and are faced with a large amount of competing stimuli; the explicit information afforded through metalinguistic correction helped focus their attention on a semantically opaque structure that was very difficult to notice under implicit learning conditions.
Gass, Svetics, and Lemelin (2003) argued that in learning complex grammatical rules, "internal devices are insufficient for learning, and focused attention…may be a necessary crutch" (p. 528).

As to which feedback type is more facilitative of L2 development, this study showed that the superiority of explicit feedback over implicit feedback is subject to several caveats, the discussion of which has useful pedagogical implications. It was found that while explicit feedback seemed to be effective regardless of proficiency and target structure, it was not always more effective than implicit feedback (also see DeKeyser, 1993; Loewen & Nabei, 2007). The advantage of explicit feedback was more evident for low-level learners on the immediate posttests, and the differences were greater on the GJTs than on the EI tests. These results are somewhat different from what was found in previous research. The SLA literature abounds with studies that promote the utility of explicit instruction. Both empirical studies (e.g., Carroll & Swain, 1993; Ellis et al., 2006; Sheen, 2007a, 2010) and research syntheses (Li, 2010; Lyster & Saito, 2010; Norris & Ortega, 2000; Spada & Tomita, 2010) have directly or indirectly shown that explicit feedback is unequivocally more effective than implicit feedback. However, what these studies did not examine is the role of proficiency in moderating the effects of feedback. Ammar and Spada (2006) and Li (2009) are probably the only studies that included proficiency as an independent variable when investigating the differential effects of different types of feedback, and they reported similar results. In a classroom setting, Ammar and Spada compared prompts, which included metalinguistic feedback, with recasts, and they found that prompts were more effective than recasts for low-level learners but that the two feedback types worked equally well for high-level learners. Li's lab-based study showed that the pre-post gains of the explicit group were larger than the gains of the implicit group. To be noted is the fact that proficiency was operationalized as learners' prior knowledge about the target structure in Ammar and Spada's study and as enrollment status in Li's study.

To emphasize the importance of considering learners' proficiency in opting for explicit or implicit feedback is not to favor one instruction type over the other. The point here is to view them in perspective. Given the superior effects of explicit feedback for low-level learners, it seems advisable to employ instruction types that facilitate their awareness of the linguistic structure and that provide them with more external resources. However, where explicit feedback does not lead to more learning than implicit feedback, or where they are equally effective, as is the case for high-proficiency learners, implicit feedback would be a better choice. This recommendation is defensible both theoretically and pedagogically. From a theoretical point of view, according to Sociocultural Theory (Aljaafreh & Lantolf, 1994; Lantolf, 2009; Ohta, 2009), learning occurs through mediation (in this case, interaction) between a novice and an expert in the zone of proximal development (ZPD), which refers to the difference between what a learner can achieve independently and what he or she can achieve with external assistance. Learning should evolve from object-regulation to other-regulation and finally to self-regulation.
The purpose of instruction is to provide assistance such that the learner's reliance on that assistance is progressively reduced and the learner ultimately becomes autonomous. The ideal condition for learning to happen is one where the learner is offered "just enough assistance…[and] assume[s] increased responsibility for arriving at the appropriate performance" (Aljaafreh & Lantolf, 1994, p. 469). Providing either too little or too much assistance falls outside what the ZPD requires for an optimal learning outcome. Thus, low-proficiency learners are under-assisted if they are provided with implicit feedback, which contains insufficient information; high-proficiency learners are over-assisted if they are provided with explicit feedback, which is imbued with superfluous information. Too much assistance impedes the development of the learner's ability to work independently and autonomously (Ohta, 2009, p. 52).

From a pedagogical perspective, implicit feedback in the form of recasts is ideal for form-focused instruction. The essence of form-focused instruction is that learning is optimal in situations where the primary focus is on meaning but linguistic forms are attended to at the same time. In their seminal synthesis of the literature on recasts, Ellis and Sheen (2006) precisely summarized the interactionist views (Long, 1996, 2007) on the nature of this corrective strategy and its match with form-focused instruction:

They [recasts] induce a joint focus on form and meaning, thereby encouraging form-function mapping in the context of, and without disturbing, the communicative flow of the interaction. Furthermore, they allow for cognitive comparison of erroneous and target language forms in a context in which the learner is primed to notice the difference. In contrast, explicit forms of correction involve treating language as an object and interrupting the flow of communication and, thus, will not assist form-function mapping. (p. 578)

Lyster and Mori (2006) pointed out that recasts constitute an especially favored feedback type in immersion classes because, in addition to serving language learning purposes, they help learners focus on content and communicate about subject matter that is beyond their current linguistic competence. Moreover, one attribute of recasts as an implicit type of feedback is that their effects might be longer-lasting than those of explicit feedback, at least for high-proficiency learners in the learning of complex structures whose form-meaning mapping is not transparent. As previously discussed, the finding that implicit feedback is better at retaining instructional effects than explicit feedback is likely not anecdotal, because the same pattern was also obtained in Li's (2010) meta-analysis of previous empirical research on corrective feedback. Again, it would be arbitrary and misleading to take an absolute rather than a dialectical position when it comes to determining which type of feedback is the better option, because both, like any other type of instruction, have their respective advantages and limitations. In light of the findings of this study and the above discussion of the related theoretical underpinnings and pedagogical implications, it is recommended, albeit tentatively, that explicit feedback be provided to low-proficiency learners and implicit feedback to more advanced learners.
The Effects of Target Structure and Testing

In addition to the interactions between feedback type, the nature of the target structure, and proficiency, one question this study attempts to answer is whether there is a main effect for the target structure, that is, whether feedback in general has a differential impact on the two target structures. It was found that the average effect size associated with the learning of classifiers, regardless of proficiency and feedback type, was greater than that associated with the perfective –le. This is not surprising because classifiers are perceptually more salient, syntactically simpler, and semantically more transparent than the perfective –le. The information contained in implicit feedback targeting classifiers is more likely to be perceived and incorporated than that targeting the perfective –le. In the case of explicit feedback, because the linguistic derivation of the perfective –le is complex and the metalinguistic information is difficult to comprehend, the information is less likely to be internalized even if it is delivered in an obvious, straightforward manner.

The following excerpt from the exit interview with a participant (from the high-implicit group) indicates how salience might have contributed to the larger effects for classifiers. When asked to comment on the two target structures, she said:

You can still get your point across without saying -le. Maybe that's the reason [why I omitted –le in some cases]. It's not saying that it's not important, but that the -le structure is something you can easily drop [and] still get your point across. I think measure words are more important. When in China, and you are having a conversation, 'cause as a foreigner, I don't think many people understand you. So giving a measure word is one more clue to what you're talking about. Dropping a measure causes more misunderstanding than dropping -le.

The superior effect of feedback for classifiers relative to the aspect marker is to some extent consistent with what has been found in the previous literature. Mackey and Goo (2007) found that interaction, an important component of which is feedback, had a larger effect on lexical items than on morphosyntactic items; Yang and Lyster (2010) found that recasts were more effective for irregular past forms than for the regular past form. Lexicon and irregular past forms are necessarily more salient and semantically more transparent than morphosyntax and regular past forms. Spada and Tomita (2010) showed that both implicit and explicit instruction produced larger immediate effects for simple structures than for complex structures. Certainly, complexity, salience, and difficulty are not the same thing, and the relationship between them is far from clear and is controversial in the current SLA literature. This topic is beyond the scope of the current study. Regardless, one thing seems obvious: the perfective –le is a more complex structure than classifiers.

The calculation of mean effect sizes for the two test formats showed that the EI tests yielded larger effects than the GJTs. The EI tests were intended to measure implicit knowledge resulting from the instructional treatment, whereas the GJTs measure learners' explicit knowledge. Previous feedback research has also shown a larger effect of feedback as measured by tests tapping learners' online, spontaneous performance than by tests reflecting learners' offline, declarative knowledge of the target structure (Ellis et al., 2006; Li, 2010; Loewen & Nabei, 2007; Lyster & Saito, 2010).
This would suggest, as Ellis et al. pointed out, that instruction, which in this case was realized through the provision of corrective feedback, does contribute to the development of learners' implicit knowledge. It confirms DeKeyser's (1998, 2007) and N. Ellis's (2008) claim that explicit and implicit knowledge lie on a continuum and that, through practice, conscious, declarative knowledge can be converted into unconscious, procedural knowledge. It counters Krashen's (1981, 1994) and Hulstijn's (2002) argument that explicit knowledge is unlikely to become proceduralized.

While it appears true that feedback leads to the acquisition of implicit knowledge, questions need to be answered regarding why gains on the EI tests were larger than on the GJTs. Does this mean that feedback leads to more implicit knowledge than explicit knowledge? What caused the difference between the two test formats? Answers to these questions are critical to understanding feedback research and the effectiveness of feedback, for the same study might show different, or even conflicting, results depending on how the effects are tested. Without appropriate testing, it would be misleading to discuss the effects of any instructional intervention. The difference between the two test formats might be accounted for from a combination of three perspectives: transfer appropriate processing, the amount of previous explicit knowledge, and the typological features of Mandarin Chinese.

Transfer appropriate processing (TAP) is a cognitive approach to information processing advanced by Morris, Bransford, and Franks (1977) (see Lightbown, 2008, for how TAP applies to SLA). Morris et al., based on empirical evidence, contended that "assumptions about the value of particular types of acquisition activities must be defined relative to the type of activities to be performed at the time of test" (p. 531). The basic tenet of this approach is that information or input is better retrieved in tasks that resemble the conditions under which the information was received. Thus, if learning occurs in meaning-focused activities where attention is also paid to language, the learning outcome will be best demonstrated on tests that are similar to the learning conditions, that is, tests that tax the learner's ability to process meaning and form simultaneously. Likewise, if learning occurs in activities that involve discrete item practice, where language is learned as an object, the learner will perform better on item-based tasks that primarily involve the processing of linguistic forms. In the case of this study, oral feedback was provided in meaning-based interaction. The EI tests are also oral, and the cognitive construct underlying them is the same as that underlying the treatment tasks: online, simultaneous processing of meaning and form. The GJTs are written, and the primary focus of the tests is on linguistic forms. Therefore, TAP may partly explain the larger effects on the EI tests than on the GJTs.

The difference between the two tests in terms of treatment effects may also have to do with the greater amount of explicit knowledge learners had at the outset of the study. Re-examining the descriptive statistics displayed in Tables 9 and 13 confirms this speculation: the GJT scores of all three groups (implicit, explicit, and control) are higher than their EI test scores on the pretests. Therefore, there was a certain ceiling effect for the learners' level of explicit knowledge.⁹

The third potential factor relates to the typological features of the target language.
Unlike alphabetic languages, where phonological systems are linked with orthographic representations, that is, the pronunciation of a word is somewhat predictable from its spelling, the Chinese language has separate writing and speech systems. The association between a character or word and its pronunciation is arbitrary. The GJTs, unlike the EI tests, are written, so learners might have had difficulty recognizing the vocabulary words involved in the obligatory contexts for the target structures in the test items, because these words were presented orally in the treatment tasks. Although Pinyin, the Romanized phonetic representation of Chinese characters, was provided for every character in each test item, more than 83% of the learners indicated during the exit interview that they either entirely ignored the Pinyin or only occasionally referred to it when they felt the need to. In fact, a few of them even reported that the Pinyin was distracting and affected their reading speed. According to one of the instructors of the classes from which the participants were recruited, students were encouraged to use Pinyin as a crutch during the first year of study, but it was discouraged in the second year and thereafter, because the Chinese language, after all, is not an alphabetic language and students must learn to decipher written materials without Pinyin. In fact, non-teaching Chinese materials are never accompanied by Pinyin.

One might also argue that learners might have exercised more control over their output during the EI tests than during the GJTs, calling into question the validity of the test as a measure of implicit knowledge. However, this is unlikely to be true, because when asked on which test they used more grammar knowledge, 85% of the participants indicated that they did so more on the GJT and that they relied more on "hunch" or "feel" during the EI test. One participant, who was from the low-explicit group, commented on his use (or non-use) of grammar rules during the two tests: "I did it more by intuition when I was speaking and I tried to remember rules we learned in class while I was doing the written one. I thought more about rules when I did the written part." Interestingly, one participant (high-implicit) pointed out that he did give more thought to the correctness of a sentence when taking the GJTs, but the extra effort had a negative effect on his test performance:

When I was listening, I don't think I was thinking about it as much. Just my instinct said that I need to add -le here. I didn't actually process that sentence in my mind. And then when I am reading it, I kind of thought about it more. I second-guessed myself and I wasn't sure, so I didn't do it [make the correction]. So that extra time actually hurt me.

To conclude the discussion on the interface between the effects of feedback and testing, some further comments are in order. First, if TAP is accepted as one criterion for test validity, it stands to reason that treatment conditions should be matched with testing conditions. In other words, what is measured should be what is learned or taught. Thus, it would be appropriate to use tests that involve oral production, such as elicited imitation, to measure the effects of oral feedback provided in communicative activities. If the purpose of a study is to investigate the effects of feedback provided in discrete item practice, similar tasks should be used in testing.
Second, to accommodate the dissociation between the orthographic and phonological systems of non-alphabetic languages, items in written tests might be presented both orally and visually if the instructional treatment mainly involves oral production and a written test must be used. Third, the implicit-explicit distinction in terms of knowledge type might be relative rather than dichotomous and might be subject to multiple factors. Proficiency, for instance, is likely to affect how much implicit knowledge learners have at their disposal. Beginners might have to access their explicit knowledge about the target language most of the time. Therefore, the implicit-explicit distinction may be more applicable to learners at higher stages of their language development. One remedy to this problem might be to include a reaction time component in EI tests. Obviously, the amount of time a learner takes to respond to linguistic stimuli is an indicator of the automaticity of L2 knowledge: The faster the learner responds, the higher the likelihood that the related knowledge is proceduralized. Another factor that affects the validity of a test of implicit or explicit knowledge is the characteristics or dynamics of the instructional setting. For instance, learners from intensive language programs may have more explicit knowledge than learners from an immersion context, where language is taught and learned through subject matter and where priority is given to fluency rather than to accuracy of language use. How learners acquired literacy in their first language may also be relevant. Learners who are accustomed to a meaning-oriented approach to linguistic materials in their first language may transfer the corresponding learning strategy to their second language learning. If that is the case, they are likely to engage in more semantic than syntactic processing, hence affecting their performance on a GJT.

Feedback, Linguistic Structure, and Aptitude Components

A major goal of this study was to determine whether the effects of feedback are related to learners' individual differences in two aptitude components: language analytic ability as measured by the Words in Sentences subtest of the MLAT and working memory as assessed through a listening span test. It was found that there was an interaction between feedback type, the choice of target structure, and the two aptitude components. More specifically, language analytic ability correlated significantly with the effects of implicit feedback in the learning of classifiers and with the effects of explicit feedback in the learning of the perfective –le; learners' working memory capacity was related only to the effects of explicit feedback in the learning of classifiers. All the significant correlations were found for the delayed effects of feedback. These findings underscore the importance of exploring aptitude-treatment interaction (Snow, 1987, 1991) and provide further justification for the necessity of taking a componential rather than a monolithic approach to aptitude research (Dörnyei & Skehan, 2003; Robinson, 1997, 2002, 2005; Skehan, 2002). Clearly, the idiosyncratic characteristics of each learning condition, defined jointly by the type of feedback and the nature of the linguistic target, place different processing demands on learners' cognitive abilities, hence the resulting dynamic relationships between the two aptitude components and the two feedback types.
A related point is that, unlike previous feedback research that focused on the two-way feedback-aptitude interaction (e.g., Sagarra, 2007), this study revealed the impact of a third variable, the target structure, on the sensitivity of individual difference factors to the effects of corrective feedback.

Language Analytic Ability

The significant correlation between language analytic ability and the effects of implicit feedback in the learning of classifiers is seemingly surprising because no metalinguistic information or rule explanation was available in this learning condition. A plausible explanation is that when grammatical explanation is not available, learners with high analytic ability achieve more: they are better versed than learners with lower analytic ability in extracting and generalizing the syntactic regularities related to classifier use on the basis of the positive and/or negative evidence contained in the provided recasts. However, two questions arise regarding this interpretation. One is whether learners engage in syntactic processing in implicit learning conditions; the other is whether language analytic ability is drawn upon, given that classifiers constitute an easy structure and do not involve complicated rule explanation.

To answer the first question, it is necessary to draw on Robinson's (1997) work on aptitude-treatment interaction. Robinson found that in the implicit condition, where learners were told to simply memorize some examples without being provided with any rule explanation, learners with high aptitude claimed to have actively looked for rules and were able to verbalize them. Therefore, learners with high analytic ability are more likely to be aware of linguistic problems and to engage in hypothesis testing about the target structure. To answer the second question, it is useful to refer to Polio's (1994) work on the acquisition of Chinese classifiers. Polio investigated how speakers of L1 Japanese (a classifier language) and of L1 English (a non-classifier language) used Chinese classifiers. Despite Polio's claim that the nonnative speakers did not seem to have difficulty using classifiers in obligatory contexts, her data revealed that all the classifier omission errors were committed by L1 English speakers and none was made by the Japanese speakers. Thus, it can be inferred that although the classifier is not a complicated structure, it does pose problems for speakers of languages in which this structure is absent. It is therefore reasonable to speculate that language analytic ability did play a role in the implicit condition in the learning of classifiers, because the structure was a challenge, especially when metalinguistic information was unavailable.

While language analytic ability was related to the learning of classifiers in the implicit condition, this was not the case with the learning of the aspect marker in the same condition. Lack of awareness and the nature of the target structure might jointly account for this result. On the one hand, the implicitness of the learning condition and the non-salience of the perfective –le may have made the feedback and the target structure difficult to notice. Learners in this condition may therefore not have engaged in syntactic processing by utilizing their analytic ability.
On the other hand, and more importantly, because of the complexity and difficulty involved in learning the perfective –le, learners were not able to make inductions or generalizations about the usage of the structure through reliance on their internal resources (in this case, language analytic ability) alone, even if they were aware of the problems in their L2 production. In other words, it was beyond the learners' cognitive capacity to conduct syntactic processing of this linguistic structure with recourse only to the limited external assistance provided in the form of implicit feedback.

Consideration of the nature of the linguistic target also helps explain the conflicting findings in previous studies (Sheen, 2007a; Trofimovich et al., 2007). Whereas Sheen failed to find a significant correlation between the effects of recasts and language analytic ability, Trofimovich et al. did. In Sheen's study, the target structure was the English articles (the and a/an), a non-salient, complex structure; in the study by Trofimovich et al., the linguistic structure was the English possessive determiners (his/her), a salient, simple structure. Clearly, it was difficult for learners to notice the English articles in an implicit condition, but even if there was a high level of noticing, learners were likely unable to extract rules about the articles by using their analytic ability. However, by taking advantage of their analytic ability and the input from recasts, it was possible for learners to solve problems related to the possessive determiners, a relatively easy structure; hence the significant correlation in Trofimovich et al.'s study. Another related study, cited by Robinson (2005), is Robinson and Yamaguchi (1999), who found that "there were nonsignificant correlations of learning of relative clauses [a complex structure] during task-based interaction (supplemented by targeted recasts) and the grammatical sensitivity aptitude subtest" (p. 56). Taken together, these studies point to the possibility that non-significant correlations between language analytic ability and learning gains under implicit conditions are attributable to the complex, difficult nature of the linguistic structure, which may be beyond learners' processing capacity.

Language analytic ability has been consistently found to correlate with learning under explicit conditions where rule explanation or metalinguistic information is available (Erlam, 2005; Robinson, 1997; Sheen, 2007a). This study is no exception: It correlated significantly with the effects of explicit feedback in the learning of the perfective –le. The mechanism through which this aptitude component works under this condition seems simple: Learners with superior analytic ability are better at discovering patterns in the input or at processing and applying the knowledge "assimilated from external sources" (Roehr & Ganem-Gutierrez, 2009, p. 167). In the case of the perfective –le, the explicit feedback enhanced learners' awareness of the linguistic structure and provided metalinguistic information about a very complex linguistic structure; learners likely engaged in active processing of this information by applying their previous knowledge and language analytic skills. But why was the learning of classifiers not related to learners' language analytic ability in the explicit condition? Once again, this is potentially related to the nature of the target structure.
Since the classifier is a relatively transparent structure and the provided metalinguistic information is easy to process and internalize, learners' individual differences in language analytic ability were likely not drawn upon and therefore did not impact learning in this condition. In the case of classifiers, then, it was, as it were, the availability of the metalinguistic information that led to the marginalization of the role of language analytic ability.

Working Memory

Variation in learners' working memory capacity was found to be related to the effects of explicit feedback in the learning of classifiers. Two important questions surround this finding: (a) What is the mechanism through which working memory functions in relation to the treatment effects? (b) Why is this aptitude component not sensitive to the other learning conditions formed by feedback type and choice of linguistic structure? To answer these questions, it is necessary to revisit the architecture of working memory and the functions of its components.

As previously discussed, working memory is characterized by the simultaneous storage and manipulation of information. According to Baddeley's componential model (Baddeley, 1986, 2000, 2006, 2007; Baddeley & Logie, 1999), working memory is composed of a central executive and three slave systems: a phonological loop, a visuospatial sketchpad, and an episodic buffer. The central executive is responsible for information processing and for the integration and coordination of the three subcomponents. The phonological loop encodes, temporarily stores, and rehearses incoming auditory stimuli; the visuospatial sketchpad is a short-term store of visual and spatial information; and the episodic buffer activates information in long-term memory, constructs integrated representations, and encodes them in long-term memory as schemas.

The processing demands of classifier learning through external assistance in the form of metalinguistic correction seem a perfect match for the mechanism of working memory. When the learner's attention was brought to the target structure through the provided feedback, the learner encoded the auditory stimuli (sound representations of a classifier as well as the metalinguistic information) in the phonological loop, matching the phonological codes with existing codes (e.g., sounds and tones the learner had previously learned) archived in long-term memory. This was followed by subvocal rehearsal of the stored information. The central executive maintained the information in focal attention and processed it for storage in long-term memory through the episodic buffer. The cognitive processing may have taken place by matching a certain classifier with a noun and analyzing the metalinguistic information; it may also have involved the inhibition of other classifiers in the learner's repertoire, which likely competed for the limited capacity of working memory. Evidently, classifier learning in the explicit condition drew heavily on the learner's ability to store and process the provided information, which led to the significant correlation between working memory and the treatment effects.

The significant correlation found in the explicit condition between working memory and classifier learning also has to do with consciousness, a defining feature of explicit learning conditions.
Almost all models of working memory, such as the Multiple-Component Model (Baddeley & Logie, 1999), the Executive Attention Model (Engle, 2002), and the Embedded-Process Model (Cowan, 1999), acknowledge the role of consciousness and attention control (see also Dehn, 2008; Paradis, 2009). Baddeley pointed out that “as has become increasingly obvious over the years, conscious awareness appears to be closely related to the executive control, and hence to the operation of working memory” (2007, p. 302). Cowan argued that awareness “increases the number of features encoded, and…allows new episodic representations to be available for explicit recall” (1999, p. 62). Engle even stated that working memory is not about short-term span; rather, it is about the ability to focus attention on relevant information and inhibit irrelevant information. Indeed, in this study, learners’ ability to focus their attention on the information contained in the feedback and at the same time resist distracting information may be critical to the development of their knowledge about classifier use.

The association between consciousness and working memory also explains why this cognitive construct was not related with the effects of implicit feedback (in learning either target structure). Implicit feedback was not intended to make the learner conscious/aware of the target structures, and thus working memory may not have been implicated in this learning condition. Schmidt (1990) stated that implicit/unconscious processes are not susceptible to working memory capacity. Similarly, Ellis (2009) pointed out that implicit learning does not implicate central attentional resources; explicit learning, in contrast, relies heavily on working memory because it involves conscious memorization of facts.

There is a discrepancy in previous research with regard to the link between working memory and the effects of implicit feedback. Trofimovich et al. (2007) did not find any association between the effectiveness of recasts and measures of working and phonological memory. Sagarra (2007), however, found such an association. It is not clear what caused this discrepancy, but some speculations can be made based on the research methods of the two studies. In both studies, recasts were provided via the computer in discrete item practice, which made “the corrective nature of recasts more salient”. The learning conditions were therefore explicit, which likely contributed to learners’ consciousness of the learning tasks, and hence the significant correlation in Sagarra’s study. The nonsignificant correlation in Trofimovich et al.’s study might be due to two factors: (a) the test items were identical to the treatment items, all of which involved picture description, and (b) both posttests were immediate (a point also made by Sagarra). The identity between testing and treatment items, the additional assistance from picture cues, and the immediacy of the posttests (which will be discussed below) may have minimized the role of individual differences in learners’ working memory.

A final explanation needs to be explored concerning the absence of a connection between working memory and the effectiveness of explicit feedback in the learning of the perfective –le. It would seem that unlike classifier learning, which involves simultaneous storage and processing of information, the perfective –le is purely rule-based.
While rules must be memorized for learning to occur, the greater or lesser capacity of the short-term store is not as influential as it is in learning more exemplar-based structures such as the classifier. Therefore, the learning of this structure likely does not tax the storage function of working memory. Certainly, working memory is also responsible for information processing, which is controlled by the central executive. However, the central executive is often considered a hub of attention control (selective attention, attention switching, resource allocation, etc.) and is rarely credited with syntactic processing. Of course, the central executive must be responsible for a certain amount of syntactic processing, such as, in the case of classifier learning, quickly encoding and decoding a permutation of classifier use (numeral + classifier + noun) when the related feedback is provided. Nevertheless, the learning of complex morphosyntax might rely more on language analytic ability than on working memory, a construct that indexes efficiency in temporarily holding and processing information; this explains why it is the former, not the latter, that correlated with the learning of the aspect marker in the explicit condition. A detailed explanation of the relationship between these two aptitude components is provided in the next section.

Language Analytic Ability vs. Working Memory

One interesting finding of this study is that in the implicit condition, language analytic ability correlated with working memory, but such a correlation was not found for the explicit condition. Recall that the analyses related to the two cognitive factors were based on the data contributed by learners in their 4th semester of study, in order to control for variation in the amount of instruction. A correlation analysis performed on the whole dataset, involving all participants’ scores on the two variables, indicated that there was a significant correlation between them, r = 0.3, p = .01. This suggests that the correlation between these constructs found in the implicit condition did not happen by chance. It also suggests that there is a certain overlap between the two, and yet they are separate constructs, as the correlation is small. But how are they different and related? How do they contribute differently to L2 learning?

The two aptitude components differ along the following dimensions. First and foremost, they are measures of different cognitive abilities. Language analytic ability refers to the learner’s sensitivity to linguistic regularities and ability to identify linguistic patterns. Working memory is an indicator of the learner’s capacity for information storage and online cognitive processing. Therefore, the former is more likely to be drawn upon in tasks that place heavy demands on syntactic processing (either online or offline), as in the learning of complex linguistic rules such as the Chinese perfective –le; the latter is most useful in the learning of more data-driven, exemplar-based linguistic items. Second, and related to the first point, they interact differently with different learning conditions. For instance, as this study demonstrates, the role of working memory is probably more obvious in explicit learning conditions; language analytic ability is useful in implicit conditions in the learning of simple rules and in explicit conditions in the learning of complex rules. Third, language analytic ability is domain-specific whereas working memory is domain-general.
It is obvious that language analytic ability is only implicated in language acquisition, or rather, adult second language acquisition (first language acquisition and child second language acquisition are not dependent upon language analytic ability; DeKeyser, 2000; Harley & Hart, 1997; Sasaki, 1996). Working memory, however, is “a general-purpose system that can perform multiple functions” (Dehn, 2008, p. 41) and has been found to be related with many academic skills, such as math reasoning and science (Gathercole & Alloway, 2008; McGrew & Woodcock, 2001).

The relationship between language analytic ability and working memory is not one between apples and oranges; they work in concert in facilitating second language development. The finding that working memory correlated with language analytic ability, a core aptitude component of the MLAT, provides further evidence that working memory is an aptitude component. Previous research also found working memory to be related with L2 learners’ overall performance on aptitude measures and aptitude components as measured by the MLAT or similar test batteries (Robinson, 2002; Safar & Kormos, 2008; see Note 10). In fact, working memory, a measure of the ability to juggle information storage and processing, is considered an ideal substitute for the Paired Associates subtest of the MLAT, which measures learners’ rote memory, especially in form-focused instruction where linguistic forms are attended to during meaningful communication. Sawyer and Ranta pointed out that working memory may “serve as an arena in which the effects of other components of aptitude are integrated” (2001, p. 342). For example, phonetic coding, an aptitude component measured by the MLAT, is critical to the functioning of the phonological loop of working memory, which relies heavily on phonemic awareness and efficiency in phonological encoding and decoding. Language analytic ability affects the speed and efficiency of the central executive’s processing, such that high analytic ability frees up space for the storage subsystems, whereas deficits in analytic ability slow down processing and cause working memory overload. The storage component of working memory, on the other hand, might impact the learner’s capacity for rule identification and application: temporary maintenance of information affects comprehension of the input, which in turn influences the accuracy of language analysis. Another bond between language analytic ability and working memory is noticing. Robinson (1997) found that in the implicit condition of his study, there was a strong relationship between grammatical sensitivity (language analytic ability) and awareness. In Mackey et al.’s (2002) study, learners with high working memory reported more noticing of the target structure, confirming the link between working memory and consciousness. And noticing, or conscious awareness, is a defining attribute of the central executive of working memory.

Aptitude and Testing

It is interesting that the feedback-aptitude interactions found in this study are subject to the timing of testing and the type of measure: all correlations are related to delayed effects and are demonstrated on different test formats. The correlation of aptitude measures with the delayed effects of feedback is consistent with the findings of previous research (Ando et al., 1992; Mackey et al., 2002; Trofimovich et al., 2007).
Even in Sheen’s (2007a) study, where language analytic ability correlated significantly with both the immediate and delayed effects of feedback, the correlations appeared stronger on the delayed posttests. It is not clear why this is so, but researchers have made some reasonable speculations. Robinson (2002), noting that “immediate posttest performance shows very little relationship to measures of IDs [individual differences]”, speculated that “learning continued as a consequence of the immediate delayed transfer test experiences…[and] IDs in relevant abilities contributed to the capacity to build on initial exposure during training, and continue to learn during the posttests” (p. 204). Trofimovich et al. explained that the role of language analytic ability may be greater when corrective feedback is no longer available. Mackey et al. hypothesized that learners with greater working memory capacity may have “gleaned more data to process and consolidated this over time”, in comparison with learners with smaller working memory capacity, who “could not ‘hold on’ to data with great accuracy” (p. 204).

In essence, these speculations come down to two themes. One is that the immediacy of testing may have leveled out the impact of variation in aptitude. The other is that during the interval between the instructional treatments and the delayed tests, learners conduct offline processing of the data they obtained during the treatments. In addition to these two possible explanations, another factor that comes into play might be the research setting. This study, as well as the ones by Mackey et al. and Trofimovich et al., was conducted in the laboratory and targeted only one linguistic structure, which made it possible to implement the treatments well. The lab setting may have partly made the immediate effects of individual difference variables less obvious. This claim may find indirect support in Sheen’s studies (2007a, 2007b), where correlations between language analytic ability and the effectiveness of feedback were found on the immediate posttests. Both of Sheen’s studies were conducted in classroom settings, where there is more distraction than in laboratory settings and where the effect of individual differences is more likely to surface immediately. However, this speculation is tentative and subject to further empirical verification.

Another interesting finding related to testing effects on aptitude-feedback correlations is that the correlations are constrained by test formats. First, with respect to classifier learning, language analytic ability correlated with learners’ performance on the grammaticality judgment test but not with the elicited imitation test scores (this result relates to implicit feedback). However, the results related to the perfective –le showed that language analytic ability correlated with the gain scores on both tests (this result relates to explicit feedback). The correlation between language analytic ability and the GJT scores would appear easy to understand, because language analytic ability was measured with a written test that taps into metalinguistic knowledge, which was obviously utilized in the GJTs. Somewhat unexpected is the fact that language analytic ability was also drawn upon in the EI test for the aspect marker. It is possible that learners had proceduralized the knowledge obtained through the application of their analytic ability with the assistance of the metalinguistic information afforded by the explicit feedback. This proceduralized knowledge then became accessible during the EI test.
Working memory scores correlated with the GJT scores related to classifier learning in the explicit condition, but not with the EI test scores. The nonsignificant correlation between working memory and the EI test results has two implications. First, it provides evidence that the EI test is not a measure of memory. An EI test item involves comprehending a verbally presented sentence stimulus, judging whether the statement is semantically applicable to the test taker, and repeating it in a correct form. It is tempting to consider the test to be related to the test taker’s memory capacity. The fact that the test results were not associated with working memory scores further demonstrates the validity of the test as a measure of the learner’s knowledge about the target structure in question; working memory, necessary as it may be for completing the test, did not have a substantial impact on the test scores. Second, accuracy in online comprehension and oral production of the target structure was not reliant on working memory. This is because the EI test measures implicit, automatized knowledge, which is accessed through long-term memory rather than working or short-term memory. Working memory involves conscious processing, so if skills or knowledge are fully automatized, performance or activation does not need much support from working memory (Conway & Engle, 1994; Gathercole & Baddeley, 1993; Montgomery, 1996; Schmidt, 1990). Dehn (2008) elaborated that comprehension of spoken language happens immediately when related information is directly retrieved from long-term memory; “[t]his activated long-term information automatically facilitates comprehension without the necessity of creating a working memory representation” (p. 98). There is also empirical evidence to back up this argument. For instance, Walters and Caplan (2004) found that differences in online syntactic processing did not relate to working memory capacity. Also, results from the WJ III Tests of Cognitive Abilities revealed that among all the academic skills (reading, writing, math, etc.), oral expression showed the weakest correlation with working memory, r = 0.38, all other coefficients being greater than 0.50 (McGrew & Woodcock, 2001).

The association between working memory and the GJT results is open to two interpretations. First, it is possible that during the test the learner was able to engage in conscious retrieval of the information encoded and registered through working memory during the treatment; it is also possible that the information available for use during the test was information that, after initial encoding and registration through working memory during the treatment, was stored as explicit knowledge in declarative memory (Paradis, 2009). Second, whereas working memory does not seem to be required in oral production that involves instant access to automatized knowledge, research has consistently shown that it is implicated in reading comprehension, in both L1 and L2 learning (e.g., Daneman & Carpenter, 1980; Leeser, 2007). The GJT is a written test, and accuracy in learners’ performance clearly involves comprehension of the sentence stimuli.
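As an aside on how the correlational results reported in this chapter can be obtained, the sketch below illustrates a Pearson correlation between an aptitude measure and feedback-induced gain scores. It is a minimal illustration under stated assumptions: the scores, sample size, and variable names (aptitude, pretest, posttest) are invented placeholders rather than the study’s data, and gains are operationalized as simple posttest-minus-pretest differences, which may differ from the gain scores actually analyzed in this study.

```python
# Illustrative only: correlating a hypothetical aptitude measure (e.g., a working
# memory or language analytic ability score) with hypothetical gain scores from a
# feedback condition, in the spirit of the analyses reported in this chapter.
import numpy as np
from scipy.stats import pearsonr

# Invented placeholder scores for 12 hypothetical learners
aptitude = np.array([18, 22, 25, 30, 27, 35, 40, 38, 29, 33, 21, 26])
pretest  = np.array([ 4,  5,  3,  6,  5,  7,  8,  6,  5,  7,  4,  5])
posttest = np.array([ 6,  8,  5, 10,  7, 12, 13, 11,  8, 12,  6,  9])

gain = posttest - pretest           # simple gain score for each learner
r, p = pearsonr(aptitude, gain)     # Pearson r and two-tailed p-value

print(f"r = {r:.2f}, p = {p:.3f}")
```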
CHAPTER 6
CONCLUSION

The main objective of this study was to investigate the extent to which the effectiveness of corrective feedback is mediated by learner-external factors, such as the explicitness of feedback and the nature of the linguistic target, as well as learner-internal factors, such as proficiency and individual differences in language analytic ability and working memory. It attempts to adopt a holistic, integrated approach to the efficacy of feedback, overcoming the limitations of a monolithic, isolated approach. Noteworthy is the fact that this study is the first attempt to address the three-way interaction between feedback type, learners’ proficiency, and the choice of target structure. It is also the first attempt to tackle the complicated relationship between aptitude components, feedback type, and the nature of the linguistic target.

Previous research has demonstrated that recasts did not fare well in classroom settings (e.g., Lyster, 2004; Sheen, 2010) but always seemed to be beneficial in laboratory settings (e.g., Han, 2002; Lyster & Izquierdo, 2009), where the feedback became salient. However, this study showed that the effects of recasts were also determined by the nature of the target structure and learners’ overall L2 proficiency: The feedback benefited less advanced learners in the learning of the easy, simple structure (Chinese classifiers) but not the hard, complex structure (the Chinese perfective –le); whereas less advanced learners did not benefit from the feedback on the perfective –le, more proficient learners did. Also, while proficiency was a mediator in the learning of the perfective –le, it was not in the learning of classifiers. The results were interpreted from multiple perspectives: L2 learners’ input processing strategies, the salience of the linguistic structure, noticing of feedback, and the amount of available cognitive resources. Furthermore, the delayed effects of recasts were larger than the immediate effects in the learning of the perfective –le at the higher proficiency level. This indicates that in the learning of a complex structure, the effects of recasts are well maintained and may even increase over time, a finding that is significant and needs further investigation.

Previous research has revealed an almost unequivocal advantage of explicit feedback over implicit feedback (Carroll & Swain, 1993; Ellis et al., 2006), and similarly, research syntheses on the effectiveness of second language instruction in general have shown a superior effect of explicit instruction over implicit instruction (Norris & Ortega, 2000; Spada & Tomita, 2010). This study showed that the “explicit-better-than-implicit” claim was only true for low-proficiency learners, particularly in the learning of the perfective –le, a complex structure, and that the difference did not obtain for high-proficiency learners. The result points to the importance of external assistance in prompting the learner to notice the linguistic target at the beginning stage of SLA. Beginners, therefore, should be provided with explicit feedback to achieve optimal learning outcomes.
The researcher argued for the utilization of implicit feedback for advanced learners because (1) it is as effective as explicit feedback, (2) its effects are better retained, (3) it is an ideal strategy for form-focused instruction, and (4) according to Sociocultural Theory, it affords the appropriate amount of scaffolding in the learner’s zone of proximal development and is conducive to making the learner more autonomous. Certainly, more empirical evidence needs to be accumulated before any definitive claims can be made about the relationship between the explicitness of feedback (or instruction) and level of proficiency. No research syntheses in SLA have investigated proficiency as a mediating factor for instructional treatments because there is a lack of primary studies that include proficiency as an independent variable (Li, 2010).

Ellis et al. (2006) noted that there is a tendency in feedback research to use tests of explicit knowledge and that, consequently, the obtained results are biased toward explicit feedback. To minimize such bias and obtain a comprehensive view of the effects of feedback, this study included an EI test as a measure of implicit knowledge and a GJT as a measure of explicit knowledge. The EI tests showed larger effects for feedback than the GJTs. Meta-analyses on the effectiveness of corrective feedback have also shown larger effects for measures of implicit knowledge (Li, 2010; Lyster & Saito, 2010). While a claim can be made that feedback contributed to the acquisition of implicit knowledge, it was speculated that the larger effects shown by the EI tests resulted from several sources: the congruence between treatments and tests, ceiling effects for the learners’ explicit knowledge, and the typological features of the target language. It was argued that reaction time should be included as a component of an EI test because it is an indicator of the degree of automaticity. It was further argued that the implicit-explicit distinction is continuous and relative and might be constrained by multiple factors such as proficiency, instructional setting, and transfer of L1 processing strategies.

In response to the call from researchers of language aptitude for the investigation of aptitude components and of aptitude-treatment interaction (Robinson, 1997, 2002, 2005) and from researchers of corrective feedback for the investigation of individual difference variables (Ellis & Sheen, 2006), the study explored the interrelation between feedback type, the linguistic structure, and learners’ variation in language analytic ability and working memory. It was found that language analytic ability was sensitive to the effects of explicit feedback in the learning of the perfective –le and to the effects of implicit feedback in the learning of classifiers. It was argued that the provision or lack of metalinguistic information about the two target structures accounted for the interaction between language analytic ability and the learning conditions. Working memory was found to be correlated only with the effects of the explicit condition in the learning of classifiers. Interpretations were sought through the mechanisms of working memory, particularly the functions of the central executive. The relationship between the two cognitive variables and the learning conditions was mediated by the measures of feedback effects and the timing of testing.
Language analytic ability correlated with both the GJT and EI test results pertaining to the perfective –le (in the explicit condition), but only with the GJT results related to classifiers (in the implicit condition). The speculation is that the acquired explicit knowledge about the perfective –le was proceduralized, and that it is this proceduralized knowledge that was retrieved during the EI test. The finding that working memory was related with the GJT results but not with the EI test results is probably attributable to the fact that the automatized knowledge activated during the test was derived from long-term memory, whereas the explicit knowledge measured by the GJT was activated and processed through working memory. Finally, it was found that the two individual difference variables correlated only with the delayed effects of feedback, not the immediate effects. The immediacy of posttesting, the research setting, and possible offline processing on the part of the learners may all have contributed to this finding.

This study has the following methodological strengths. First, the participants were recruited from two different instructional settings and were randomly assigned to different treatment conditions. The obtained results are likely to be more generalizable because the participants represent a larger learner population. Second, the contexts for the obligatory use of the target structures were established based on native speakers’ speech data extracted from previous studies (in the case of the perfective –le) or on native speakers’ responses to questionnaire items regarding the usage of the target structure (in the case of classifiers). The validity of the obligatory contexts warranted the provision of feedback during the treatment tasks. Third, the tests used in this study have proven to be valid measures of the related constructs. The HSK, which was used to measure learners’ proficiency, has been recognized across the world as an authoritative test for L2 Chinese learners. All test items in the GJT and EI tests were piloted among native speakers of the target language. The Words in Sentences subtest of the MLAT was used as a measure of language analytic ability, and the MLAT has been shown by numerous studies to be a valid test of language aptitude. A listening span test rather than a reading span test was used to measure working memory, to accommodate the fact that the feedback was provided verbally and the encoding and decoding of auditory stimuli were pivotal for the internalization of the feedback. All sentence stimuli in the working memory test were developed and validated by Walters and Caplan (1996) and have been used in a number of SLA studies (Leeser, 2007; Sagarra, 2007). Also, a non-word repetition or digit span test was not used because such tests measure phonological short-term memory, which is passive in nature and does not involve information processing (Baddeley, 2003, 2007). Fourth, with respect to data analyses, effect sizes were calculated for all pairwise contrasts. Effect sizes complemented p-values in the interpretation of results and made it possible to examine the effects of feedback across target structures and test formats.

Last but not least, as with all studies, this study has limitations. First, the fact that the study was carried out in a laboratory is a double-edged sword. On the one hand, in laboratory settings experiments can be better implemented and variables can be better controlled than in classroom settings.
Therefore, the results are less likely to be attributable to latent, confounding variables and can be interpreted with more precision; the processes and principles underlying L2 learning can be clearly scrutinized and inspected. On the other hand, because of the different dynamics of laboratory and classroom settings, different results may have been obtained had the study been conducted in a classroom setting. For instance, the role of individual differences in L2 learning may be less obvious in the laboratory than in the classroom as a result of the availability of more external assistance in the laboratory setting. Future research may investigate the interaction between feedback type, the linguistic target, and proficiency in a classroom setting, or how individual difference variables mediate the effects of feedback differently in classroom and laboratory settings. Second, although the overall sample size is large (n = 78), the cell sizes are relatively small because of the number of groups (six) to which the participants were assigned. The sample sizes of interaction studies, characterized by dyadic interaction and multiple sessions, are typically smaller than those of other types of research, such as psycholinguistic studies, because of the logistical problems involved. Regardless, increasing the sample size may generate slightly different results; for instance, nonsignificant results might turn significant. Third, the duration of treatment was short (less than 1 hour), which might be partly responsible for the lack of effects for the low-implicit group in the learning of the perfective –le. More exposure to the target structure could have led to better performance by this group. Fourth, Pearson’s correlation analyses were performed to identify the relationship between the two cognitive variables and the learning outcomes of the different learning conditions. It is well known that caution must be exercised when correlation coefficients are interpreted. As Field (2005) pointed out, there are two pitfalls with regard to bivariate correlations: the third-variable problem and the direction of causality. The former refers to the fact that there may be unmeasured variables that affected the results; the latter means that it is not known which of the two involved variables is the “cause” and which is the “effect”. As far as this study is concerned, it would have been ideal to investigate working memory and language analytic ability as dichotomous rather than continuous variables, in which case it would have been possible to determine whether learners who differed along the two dimensions reacted differently to the interventional treatments. However, the relatively small cell sizes made it difficult to dichotomize the two individual difference variables. Increasing the sample size is a task facing future researchers who wish to better examine the impact of cognitive factors on the effectiveness of different feedback conditions.

NOTES

1. Neither the information about the number of subjects involved nor the biographical information about the data contributors was provided.

2. See Sawyer and Ranta (2001) for a review.

3. All three L1 Korean learners reported having stayed in the U.S. for more than five years, and all were enrolled in academic programs at the data-contributing universities. However, the extent to which they resemble L1 English speakers in terms of English proficiency is uncertain.
This might cause a concern when these learners took the two aptitude tests: their L2 English background might make their test performance different from that of their L1 English peers. For this reason, they were placed in the two control groups, whose aptitude test scores were not used in the data analyses, which made the randomness of group assignment relative rather than absolute.

4. Clearly, to a second language learner of Chinese whose native language (English) is a non-classifier language, there are two challenges associated with classifier learning. First, the learner must develop the awareness that a classifier must be used between a determiner and the following nominal phrase. Second, the learner must choose a proper classifier depending on the physical properties of the noun. As previously discussed, classifiers are semantically related with the physical characteristics of the objects they co-occur with. However, the connections between many classifiers and their accompanying nouns have become opaque and appear arbitrary as a result of language change. From a pedagogical perspective, while it is possible to explain the rationale behind the choice of a classifier in some cases, it is not always realistic to do so because of the seeming arbitrariness of the connection between the classifier and the corresponding nominal phrase. To provide consistent instructional treatment, the explicit feedback in this study had two components: informing the learner that a classifier was required and providing the correct classifier.

5. The tasks for classifiers were revised based on Li (2009).

6. Alternatively, the test taker is asked to decide whether he/she agrees or disagrees with the statement.

7. State verbs (such as “like”) were not included in the treatment or tests. State verbs are atelic, so a delimiting device such as a time duration must be added to warrant the use of –le. Though theoretically possible, using state verbs with a time period sounds odd and is therefore uncommon in Chinese as well as in other languages.

8. Recasts showed some effects on the posttests as compared with the pretest results, but did not show any effects when compared with the control group.

9. Ceiling levels of explicit knowledge were also found by Ellis et al. (2006).

10. Robinson (2002) reported a correlation of 0.35 between working memory and aptitude as measured by the MLAT; Safar and Kormos (2008) found that working memory correlated with inductive language learning ability at r = 0.33 and with global aptitude scores at r = 0.36. These results, together with the correlation (r = 0.3) between working memory and language analytic ability found in this study, consistently demonstrate that (1) working memory is moderately related to aptitude and the components of aptitude measured by the MLAT and yet is a separate construct, and (2) it is justified to consider working memory an aptitude component.

APPENDICES

APPENDIX A

A Sample Card Used in Picture Description*

*For interpretation of the references in this and all other figures, the reader is referred to the electronic version of this dissertation.

APPENDIX B

A Sample Picture Set Used in “Spot the Differences”

APPENDIX C

Table C-1.
Perfective –le: Descriptive Statistics Related to Raw Scores

Test  Proficiency  Group     n    Pretest M (SD)   Posttest 2 M (SD)   Posttest 3 M (SD)
GJT   Low          Implicit  14   5.75 (1.01)      7.75 (1.99)         7.89 (2.31)
GJT   Low          Explicit  15   4.70 (1.81)      11.11 (1.91)        9.00 (2.45)
GJT   Low          Control   10   3.80 (0.95)      5.20 (1.77)         5.35 (1.53)
GJT   High         Implicit  14   7.29 (1.88)      9.75 (2.14)         11.61 (2.19)
GJT   High         Explicit  14   6.57 (1.86)      12.23 (1.36)        10.96 (3.27)
GJT   High         Control   11   7.50 (2.30)      8.59 (2.88)         9.68 (1.78)
EI    Low          Implicit  14   1.79 (1.37)      6.18 (2.62)         4.46 (2.79)
EI    Low          Explicit  15   1.36 (1.03)      8.21 (2.73)         5.23 (2.55)
EI    Low          Control   10   1.70 (1.71)      2.85 (2.53)         3.05 (1.77)
EI    High         Implicit  14   6.50 (3.61)      11.25 (2.06)        10.75 (2.49)
EI    High         Explicit  14   5.31 (3.81)      11.88 (2.53)        10.67 (3.69)
EI    High         Control   11   6.41 (3.48)      8.09 (4.02)         7.00 (2.91)

APPENDIX D

Table D-1. Classifiers: Descriptive Statistics Related to Raw Scores

Test  Proficiency  Group     n    Pretest M (SD)   Posttest 1 M (SD)   Posttest 2 M (SD)
GJT   Low          Implicit  14   5.71 (1.39)      7.86 (1.48)         8.21 (2.07)
GJT   Low          Explicit  15   4.80 (1.41)      9.83 (2.43)         8.93 (1.74)
GJT   Low          Control   10   5.20 (0.88)      5.30 (1.09)         5.40 (2.01)
GJT   High         Implicit  14   6.29 (0.80)      10.61 (2.37)        10.18 (2.59)
GJT   High         Explicit  14   6.07 (0.89)      11.68 (2.54)        10.78 (2.51)
GJT   High         Control   11   6.77 (0.98)      7.73 (1.33)         7.01 (1.11)
EI    Low          Implicit  14   2.07 (1.66)      6.68 (2.38)         6.18 (2.78)
EI    Low          Explicit  15   1.57 (1.10)      8.13 (2.91)         6.96 (3.55)
EI    Low          Control   10   2.25 (1.79)      3.80 (2.36)         3.95 (2.27)
EI    High         Implicit  14   4.21 (1.88)      8.86 (3.00)         8.29 (2.91)
EI    High         Explicit  14   4.32 (2.21)      10.61 (2.93)        10.04 (2.79)
EI    High         Control   11   5.32 (2.02)      6.41 (2.05)         7.41 (2.11)

REFERENCES

Ahrens, K. (1994). Classifier production in normals and aphasics. Journal of Chinese Linguistics, 22, 202-247.

Aljaafreh, A., & Lantolf, J. (1994). Negative feedback as regulation and second language learning in the zone of proximal development. Modern Language Journal, 78, 465–483.

Ammar, A., & Spada, N. (2006). One size fits all? Recasts, prompts, and L2 learning. Studies in Second Language Acquisition, 28, 543-574.

Anderson, R. (1989). Unpublished lecture in the seminar on the acquisition of tense and aspect. University of California, Los Angeles.

Avons, S., Wragg, C., Cupples, L., & Lovegrove, W. (1998). Measures of phonological short-term memory and their relationship to vocabulary development. Applied Psycholinguistics, 19, 583-601.

Baddeley, A. (1986). Working memory. Oxford: Oxford University Press.

Baddeley, A. (2000). The episodic buffer: A new component in working memory? Trends in Cognitive Sciences, 4, 417-423.

Baddeley, A. (2003). Working memory and language: An overview. Journal of Communication Disorders, 36, 189-208.

Baddeley, A. (2006). Working memory: An overview. In S. Pickering (Ed.), Working memory and education (pp. 1-31). Burlington, MA: Academic Press.

Baddeley, A. (2007). Working memory, thought, and action. Oxford: Oxford University Press.

Baddeley, A., Gathercole, S., & Papagno, C. (1998). The phonological loop as a language learning device. Psychological Review, 105, 158-173.

Baddeley, A., & Hitch, G. (1994). Developments in the concept of working memory. Neuropsychology, 8, 485-493.

Baddeley, A., & Logie, R. (1999). Working memory: The multiple-component model. In A. Miyake & P. Shah (Eds.), Models of working memory: Mechanisms of active maintenance and executive control (pp. 28-61). Cambridge: Cambridge University Press.

Carpenter, H., Jeon, S., MacGregor, D., & Mackey, A. (2006). Learners’ interpretations of recasts. Studies in Second Language Acquisition, 28, 209-236.

Carroll, J. (1962). The prediction of success in intensive foreign language training. In R. Glaser (Ed.), Training research and education (pp. 87–136). Pittsburgh: University of Pittsburgh Press.

Carroll, J. (1973).
Implications of aptitude test research and psychological theory for foreign language teaching. International Journal of Psycholinguistics, 2, 5-14. Carroll, J. B. (1981). Twenty-five years of research on foreign language aptitude. In K. C. Diller (Ed.), Individual differences and universals in language learning aptitude (pp. 83–118). Rowley, MA: Newbury House. Carroll, J. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York: Cambridge University Press. Carroll, J., & Sapon, S. (1959). Modern Language Aptitude Test. New York: The Psychological Corporation/Harcourt Brace Jovanovich. Carroll, J., & Sapon, S. (2002). Manual for the MLAT. N. Bethesda, Maryland: Second Language Testing, Inc. Carroll, S., & Swain, M. (1993). Explicit and implicit negative feedback: An empirical study of the learning of linguistic generalizations. Studies in Second Language Acquisition, 15, 357-386. Chang, W. (1986). The particle le in Chinese narrative discourse: An interactive description. Ph.D. dissertation. The University of Florida, Gainesville. Chang, Hsianghua. (2002). Child acquisition of the aspect marker –le in Mandarin Chinese. Master’s thesis. Michigan State University, East Lansing. Chao, Y. (1968). A grammar of spoken Chinese. Berkeley: University of California Press. Chen, H. (1996). A study of the effect of corrective feedback on foreign language learning: American students learning Chinese classifiers. Ph.D. dissertation. University of Pennsylvania, Philadelphia. Chou, C., Eagar, J., & Chiang, J. (1999). A new China: Intermediate reader of modern Chinese. Princeton: Princeton University Press. Comrie, B. (1976). Aspect. Cambridge: Cambridge University Press. Conway, A., & Engle, R. (1994). Working memory and retrieval: A resource dependent resource inhibition model. Journal of Experimental Psychology: General, 123, 354373. Conway, A., Jarrold, C. Kane, M., Miyake, A., & Towse, J. (2007) (Eds.). Variation in working memory. Oxford: Oxford University Press. 198 Cowan, N. (1999). An embedded-process model of working memory. In A. Miyake & P. Shah (Eds.), Models of working memory (pp. 62-101). Cambridge: Cambridge University Press. Cook, V. (1996). Second language learning and language teaching. London: Arnold. Christensen, M. (1994). Variation in spoken and written Mandarin narrative discourse. Ph.D. dissertation. Ohio State University, Columbus. Craig, C. (1986). Introduction. In C. Craig (Ed.), Noun classes and categorization (pp. 111). Philadelphia: John Benjamins. Cronbach, L., & Snow, R. (1977). Aptitude and instructional methods: A handbook for research on interactions. New York: Irvington. Dehn, M. (2008). Working memory and academic learning: Assessment and intervention. Hoboken, NJ: John Wile & Sons, Inc. DeKeyser, R. (1993). The effect of error correction on L2 grammar knowledge and oral proficiency. The Modern Language Journal, 77, 501-514. DeKeyser, R. (2000). The robustness of critical period effects in second language acquisition. Studies in Second Language Acquisition, 22, 499-533. DeKeyser, R. (2005). What makes learning second language grammar difficult? A review of issues. Language Learning, 55, 1-25. DeKeyser, R. (2007). Skill acquisition theory. In B. VanPatten & J. Williams, Theories in second language acquisition (pp. 97-113). Mahwah, New Jersey: Lawrence Erlbaum Associates. DeKeyser, R. (Ed.) (2008). Practice in a second language: Perspectives from applied linguistics and cognitive psychology. Cambridge: Cambridge University Press. Daneman, M. 
(1991). Working memory as a predictor of verbal fluency. Journal of Psycholinguistic Research, 20, 445-464. Daneman, M., & Carpenter, P. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19, 450-466. Daneman, M., & Merikle, P. (1996). Working memory and language comprehension: A meta-analysis. Psychonomic Bulletin & Review, 3, 422-433. 199 Dörnyei, Z. (2005). The psychology of the language learner: Individual differences in second language acquisition. Mahwah, NJ: Lawrence Erlbaum Associates. Dörnyei, Z., & Skehan, P. (2003). Individual differences in second language learning. In Catherine D. and Michael L. (Eds.) Handbook of Second Language Acquisition (pp.589-630). Malden, MA: Blackwell Publishing Ltd Doughty, C. (2001). The cognitive underpinnings of focus on form. In P. Robinson (Ed.), Cognition and second language instruction (pp. 206-257). Cambridge: Cambridge University Press. Doughty, C., & Varela, E. (1998). Communicative focus on form. In C. Doughty & J. Williams (Eds.), Focus on form in classroom second language acquisition (pp.114– 138). New York: Cambridge University Press. Doughty, C., & Williams, J. (1998). Focus on form in classroom second language acquisition. Cambridge: Cambridge University Press. Duff, P., & Li, D. (2002). The acquisition and use of perfective aspect in Mandarin. In R. Salaberry & Y. Shirai (Eds.), The L2 acquisition of tense-aspect morphology (pp. 417-452). Philadelphia: John Benjamins. Egi , T. ( 2007 ). Recasts, learners’ interpretations, and L2 development . In A. Mackey (Ed.), Conversational interaction in second language acquisition: A collection of empirical studies (pp. 249–267). Oxford : Oxford University Press . Ehrman, M., & Oxford, R. (1995). Cognition plus: Correlates of language learning success. The Modern Language Journal, 79, 67-89. Ellis, N. (2008). Implicit and explicit knowledge about language. In J. Cenoz and N. Hornberger (Eds.), Encyclopedia of language and education (pp. 119-131). New York: Springer. Ellis, N., & Sinclair, S. (1996). Workign memory in the acquisition of vocabulary and syntax: Putting language in good order. The Quarterly Journal of Experimental Psychology, 49A, 234-250. Ellis, R. (2001). Investigating form-focused instruction. Language Learning, 51(Suppl. 1), 1-46. Ellis, R. (2004). The definition and measurement of L2 explicit knowledge. Language Learning, 54, 227-275. Ellis, R. (2005). Measuring implicit and explicit knowledge of a second language: A psychometric study. Studies in Second Language Acquisition, 27, 141-172. 200 Ellis, R. (2006). Modeling learning difficulty and second language proficiency: The differential contributions of implicit and explicit knowledge. Applied Linguistics, 27, 431-463. Ellis, R. (2007). The differential effects of corrective feedback on two grammatical structures. In A. Mackey (Ed.), Conversational interaction in second language acquisition (pp.339-360). New York: Oxford University Press. Ellis, R. (2008). The study of second language acquisition. Oxford: Oxford University Press. Ellis, R. (2009). Implicit and explicit learning, knowledge and instruction. In R. Ellis, S. Loewen, C. Elder, R. Erlam, J. Philp, & H. Reinders (Eds.), Implicit and explicit knowledge in second language learning, testing and teaching (pp. 3-25). Tonawanda, NY: Multilingual Matters. Ellis, R. (2010). Epilog: A framework for investigating oral and written corrective feedback. Studies in Second Language Acquisition, 32, 335-349. 
Ellis, R., Basturkmen, H., & Loewen, S. (2001). Learner uptake in communicative ESL lessons. Language Learning, 51, 281-318. Ellis, R., Loewen, S., Elder, C., Erlam, R., Philp, J., & Reinders, H. (Eds.) (2009). Implicit and explicit knowledge in second language learning, testing and teaching. Tonawanda, NY: Multilingual Matters. Ellis, R., Loewen, S., & Erlam, R. (2006). Implicit and explicit corrective feedback and the acquisition of L2 grammar. Studies in Second Language Acquisition. 28, 339– 368. Ellis, R., & Sheen, Y. (2006). Reexamining the role of recasts in second language acquisition. Studies in Second Language Acquisition, 28, 575-600. Engle, R. (2002). Working memory capacity as executive attention. Current Directions in Psychological Science, 11, 19-23. Erbaugh, M. (1986). Taking stock: The development of Chinese noun classifiers historically and in young children. In C. Craig (Ed.), Noun classes and categorization (pp. 399-436). Philadelphia: John Benjamins. Erbaugh, M. (2001). The Chinese pear stories: Narratives across seven dialects. Available at http://www.pearstories.org/ Erlam, R. (2005). Language aptitude and its relationship to instructional effectiveness in second language acquisition. Language Teaching Research, 9,147–171. 201 Erlam, R. (2006). Elicited imitation as a measure of L2 implicit knowledge: An empirical validation study. Applied Linguistics, 27, 464-491. Field, A. (2005). Discovering statistics using SPSS. Thousand Oaks, CA: SAGE Publications Inc. French, L. (2006). Phonological working memory and second language acquisition: Developmental study of Francophone children learning English in Quebec. New York: The Edwin Mellen Press Gardner, R. (1985). Social psychology and second language learning: The role of attitudes and motivation. London: Arnold. Gass, S. (1997). Input, interaction, and the second language learner. Mahwah, NJ: Erlbaum. Gass, S. (2003). Input and interaction. In C. J. Doughty & M. H. Long (Eds.), The handbook of second language acquisition (pp. 224–255). Malden, MA: Blackwell. Gass, S. (2004). Conversation and input-interaction. The Modern Language Journal, 88, 579-616. Gass, S., & Selinker, L. (2008). Second language acquisition: An introductory course. New York: Routledge. Gass, S., Svetics, I., & Lemelin, S. (2003). Differential effects of attention. Language Learning, 53, 497-546. Gass, S., & Varonis, E. (1994). Input, interaction and second language production. Studies in Second Language Acquisition, 16, 283-302. Gathercole, S., & Alloway, T. (2008). Working memory and learning: A practical guide for teachers. London: Sage Publications. Gathercole, S., & Baddeley, A. (1993). Working memory and language. East Sussex, UK: Lawrence Erlbaum. Gathercole, S., Frankish, C., P:ickering, S., & Peaker, S. (1999). Phonotactic influences on short-term memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 84-95. Goldschneider, J., & DeKeyser, R. (2005). Explaining the “natural order of L2 morpheme acquisition” in English: A meta-analysis of multiple determinants. Language Learning, 55 (Suppl.), 27-77. Han, Z. (2002). A study of the impact of recasts on tense consistency in L2 output. 202 TESOL Quarterly, 36, 543–572. Harley, B., & Hart, D. (1997). Language aptitude and second language proficiency in classroom learners of different starting ages. Studies in Second Language Acquisition, 19, 379-400. Harley, B., & Hart, D. (2002). Age, aptitude, and second language learning on a bilingual exchange. In P. 
Robinson (Ed.), Individual differences and instructed language learning (pp. 301-330). Philadelphia: John Benjamins. Harrington, M. (1991). Individual differences in L2 reading: Processing capacity versus linguistic knowledge. Paper presented at the Annual Meeting of the American Association of Applied Linguistics. Harrington, M., & Sawyer, M. (1992). L2 working memory capacity and L2 reading skill. Studies in Second Language Acquisition, 14, 25-38. Havranek, G. & Cesnik, H. (2001). Factors affecting the success of corrective feedback. In S. Foster-Cohen & A. Nizegorodzew (Eds.). EUROSLA Yearbook, Volume 1. Amsterdam: John Benjamins. Huang, W., & Ao, Q. (2002). Chinese language and culture: An intermediate reader. Hong Kong: The Chinese University Press. Hulstijn, J. (2002). Towards a unified account of the representation, processing, and acquisition of second language knowledge. Second Language Research, 18, 193223. Hulstijn, J., & de Graaff, R. (1994). Under what conditions does explicit knowledge of a second language facilitate the acquisition of implicit knowledge? A research proposal. AILA Review, 11, 97-112. Hummel, K. (2009). Aptitude, phonological memory, and second language proficiency in nonnovice adult learners. Applied Psycholinguistics, 30, 225-249. Ishida, M. (2004). Effects of recasts on the acquisition of the aspectual form –te i-(ru) by learners of Japanese as a foreign language. Language Learning, 54, 311-394. Iwashita, N. (2003 ). Positive and negative input in task-based interaction: Differential affects on L2 development. Studies in Second Language Acquisition, 25, 1-36 . Juffs, A. (2004). Representation, processing, and working memory in a second language. Transactions of the Philological Society, 102, 199-225 Just, M., & Carpenter, P. (1992). A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99, 122-49. 203 Kim, H. R., & Mathes, G. (2001). Explicit vs. implicit corrective feedback. The Korean TESOL Journal, 1, 57-72. Krashen, S. (1981). Second language acquisition and second language learning. Oxford: Pergamon. Krashen, S. (1994). The input hypothesis and its rivals. In N. Ellis (Ed.), Implicit and explicit learning of languages (pp. 45-78). London: Academic Press. Krashen, S. (1995). Principles and practice in second language acquisition. Hertfordshire, England: Prentice Hall Europe. Lantolf, J. (Ed.) (2009). Sociocultural Theory and second language learning. Oxford: Oxford University Press. Leeman, J. (2003). Recasts and second language development: Beyond negative evidence. Studies in Second Language Acquisition, 25, 37–63. Leeser, M. (2007). Learner-based factors in L2 reading comprehension and processing grammatical form: Topic familiarity and working memory. Language Learning, 57, 229-270. Lehto, J. (1996). Are executive function tests dependent on working memory capacity? The Quarterly Journal of Experimental Psychology, 94A, 29-50 Li, C. & Thompson, S. (1981). Mandarin Chinese: A functional reference grammar. Los Angeles, CA: University of California Press. Li, P., & Shirai, Y. (2000). The acquisition of lexical and grammatical aspect. Berlin: Mouton de Gruyter. Li, S. (2009). The differential effects of implicit and explicit feedback on L2 learners of different proficiency levels. Applied Language Learning, 19, 53-79. Li, S. (2010). The effectiveness of corrective feedback in SLA: A meta-analysis. Language Learning, 60, 309-365. Li, W. (2000). Numeral-classifiers as a grounding mechanism in mandarin Chinese. 
Journal of Chinese linguistics, 28, 337-367. Loewen, S. (2004). Uptake in incidental focus on form in meaning-based ESL lessons. Language Learning, 54, 153–188. Loewen, S. (2005). Incidental focus on form and second language learning. Studies in Second Language Acquisition, 27, 361-386. 204 Loewen, S., & Nabei, T. (2007). Measuring the effects of oral corrective feedback on L2 knowledge. In A. Mackey (Ed.), Conversational interaction in second language acquisition (pp. 361-377). New York: Oxford University Press. Loewen, S., & Philp, J. (2006). Recasts in the adult English L2 classroom: characteristics, explicitness, and effectiveness. The Modern Language Journal, 90, 536-556. Lightbown, P. (2008). Transfer appropriate processing as a model for class second language acquisition. In Z. Han (Ed.), Understanding second language process (pp. 27-44). Clevedon, UK: Multilingual Matters. Liu, X. (2001). Explaining the grammatical meaning of the sentence-final le in modern th Chinese. Paper presented the 10 International Conference of Chinese Linguistics, th in conjunction with the 13 North American Conference on Chinese Linguistics. University of California. rd Liu, Y, Yao, T., Bi, N., Ge, L., & Shi, Y. (2009). Integrated Chinese (3 ed.). Boston: Cheng & Tsui Company. Long, M. H. (1996). The role of the linguistic environment in second language acquisition. In W. C. Ritchie & T. K. Bhatia (Eds.), Handbook of language acquisition. Vol. 2: Second language acquisition (pp. 413-468). New York: Academic Press. Long, M. H. (2007). Problems in SLA. Mahwah, NJ: Erlbaum. Long, M., Inagaki, S., & Ortega, L. (1998). The role of negative feedback in SLA: Models and recasts in Japanese and Spanish. The Modern Language Journal, 82, 357-371. Lyster, R. (1998). Negotiation of form, recasts, and explicit correction in relation to error types and learner repair in immersion classrooms. Language Learning, 48, 183–218. Lyster, R. (2001). Negotiation of form, recasts, and explicit correction in relation to error types and learner repair in immersion classrooms. Language Learning, 51 (Suppl. 1), 265-301. Lyster, R. (2004). Different effects of prompts and effects in form-focused instruction. Studies in Second Language Acquisition, 26, 399–432. Lyster, R., & Izquierdo, J. (2009). Prompts versus recasts in dyadic interaction. Studies in Second Language Acquisition, 59, 453-498. Lyster, R., & Mori, H. (2006). Interactional feedback and instructional counterbalance. 205 Studies in Second Language Acquisition, 28, 269–300. Lyster, R., & Ranta, L. (1997). Corrective feedback and learner uptake. Studies in Second Language Acquisition, 19, 37-66. Lyster, R., & Saito, K. (2010). Oral feedback in classroom SLA: A meta-analysis. Studies in Second Language Acquisition, 32, 265-302. Mackey, A. (1999). Input, interaction and second language development. Studies in Second Language Acquisition, 21, 557-587. Mackey, A. (Ed.) (2007). Conversational interaction in SLA: A collection of empirical studies. New York: Oxford University Press. Mackey, A., Adams, R., Stafford, C., & Winke, P. (2010). Exploring the relationship between modified output and working memory capacity. Language Learning, 60, 501-533. Mackey, A., & Gass, S. (2005). Second language research: Methodology and design. Mahwah, NJ: Lawrence Erlbaum Associates, Publishers. Mackey, A., Gass, S., & McDonough, K. (2000). How do learners perceive international feedback? Studies in Second Language Acquisition, 22, 471-497. Mackey, A., & Goo, J. (2007). 
Interaction research in SLA: A meta-analysis and research synthesis. In A. Mackey (Ed.), Conversational interaction in SLA: A collection of empirical studies (pp. 408–452). New York: Oxford University Press. Mackey, A., & Philp, J. (1998). Conversational interaction and second language development: recasts, responses, and red herrings? The Modern Language Journal, 82, 338-356. Mackey, A., Philp, J., Egi, T., Fujii, A., & Tatsumi, T. (2002). Individual differences in working memory, noticing of interactional feedback, and L2 development. In P. Robinson, Individual differences and instructed language learning (181-209). Philadelphia: John Benjamins. McDonough, K. (2007). Interactional feedback and the emergence of simple past activity verbs in L2 English. In A. Mackey (Ed.), Conversational interaction in second language acquisition (pp. 323–338). New York: Oxford University Press. McGrew, K., & Woodcock, R. (2001). Woodcock-Johnson III technical manual. Itasca, IL: Riverside Publishing. Michas, I., & Henry, L. (1994). The link between phonological memory and vocabulary acquisition. British Journal of Developmental Psychology, 12, 147-164. 206 Miyake, A., & Friedman, N. (1998). Individual differences in second language proficiency: Working memory as language aptitude. In A. Healy & L. Bourne (Eds.), Foreign language learning: Psycholinguistic studies on training and retention (pp.339-364). Mahwah, NJ: Erlbaum. Montgomery, J. (1996). Sentence comprehension and working memory in children with specific language impairment. Topics in Language Disorders, 17, 19-32. Morris, D., Bransford, J., & Franks, J. (1977). Levels of processing versus transfer appropriate processing. Journal of Verbal Learning and Verbal Behavior, 16, 519533. Nicholas, H., Lightbown, P., & Spada, N. (2001). Recasts as feedback to language learners. Language Learning, 51, 719–758. Norris, J., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and quantitative meta-analysis. Language Learning, 50, 417–528. Ohta, A. (2009). Rethinking interaction in SLA: Developmentally appropriate assistance in the zone of proximal development and the acquisition of L2 grammar. In J. Lantolf, Sociocultural theory and second language learning (pp. 51-78). New York: Oxford University Press. Osaka, M., & Osaka, N. (1992). Language-independent working memory as measured by Japanese and English reading span tests. Bulletion of the Psychonomic Society, 30, 287-289. Oxford, R. (1995). Gender differences in language learning styles: What do they mean? In J. M. Reid (Ed.), Learning styles in the ESL/EFL classroom (pp. 34-46). Boston: Heinle and Heinle. Pagagno, C., Valentine, T., & Baddeley, A. (1991). Phonological short-term memory and foreign-language vocabulary learning. Journal of Memory and Language, 30, 331347. Panova, I., & Lyster, R. (2002). Patterns of corrective feedback and uptake in an adult ESL classroom. TESOL Quarterly, 36, 573-595. Paradis, M. (2009). Declarative and procedural determinants of second languages. Philadelphia, PA: John Benjamins. Petersen, C. & Al-Haik, A. (1976). The development of the Defense Language Aptitude Battery (DLAB). Educational and Psychological Measurement, 6, 369-380. Pica, T. (1988). Interlanguage adjustments as an outcome of NS-NNS negotiated 207 interaction. Language Learning, 38, 45-73. Pienemann, M. (1998). Language processing and second language development: Proccessability Theory. Amsterdam: John Benjamins. Philip, J. (2003). 
Philp, J. (2003). Constraints on “noticing the gap”: Nonnative speakers’ noticing of recasts in NS-NNS interaction. Studies in Second Language Acquisition, 25, 99-126.
Pimsleur, P. (1966). Pimsleur Language Aptitude Battery (PLAB). New York: The Psychological Corporation.
Polio, C. (1994). Non-native speakers’ use of nominal classifiers in Mandarin Chinese. Journal of the Chinese Language Teachers Association, 29, 51-66.
Polio, C., & Gass, S. (1998). The effect of interaction on the comprehension of nonnative speakers. The Modern Language Journal, 82, 308-319.
Ranta, L. (2002). The role of language analytic ability in the communicative classroom. In P. Robinson (Ed.), Individual differences and instructed language learning (pp. 159-180). Philadelphia: John Benjamins.
Reber, A. (1989). Implicit learning and tacit knowledge. Journal of Experimental Psychology: General, 118, 219-235.
Reves, T. (1983). What makes a good language learner? Personal characteristics contributing to successful language acquisition. Ph.D. dissertation, Hebrew University, Israel.
Robinson, P. (1997). Individual differences and fundamental similarity of implicit and explicit adult second language learning. Language Learning, 47, 45-99.
Robinson, P. (2002). Effects of individual differences in intelligence, aptitude and working memory on adult incidental SLA: A replication and extension of Reber, Walkenfield and Hernstadt (1991). In P. Robinson (Ed.), Individual differences and instructed language learning (pp. 211-266). Philadelphia: John Benjamins.
Robinson, P. (2005). Aptitude and second language acquisition. Annual Review of Applied Linguistics, 25, 46-73.
Robinson, P., & Yamaguchi, Y. (1999). Aptitude, task feedback and generalizability of focus on form: A classroom study. Paper presented at the 12th AILA World Congress, Waseda University, Tokyo.
Roehr, K., & Ganem-Gutierrez, G. (2009). The status of metalinguistic knowledge in instructed adult L2 learning. Language Awareness, 18, 165-181.
Ross, S., Yoshinaga, N., & Sasaki, M. (2002). Aptitude-exposure interaction effects on wh-movement violation detection by pre- and post-critical period Japanese bilinguals. In P. Robinson (Ed.), Individual differences and instructed language learning (pp. 267-299). Philadelphia: John Benjamins.
Russell, J., & Spada, N. (2006). The effectiveness of corrective feedback for second language acquisition: A meta-analysis of the research. In J. Norris & L. Ortega (Eds.), Synthesizing research on language learning and teaching (pp. 131-164). Amsterdam: John Benjamins.
Safar, A., & Kormos, J. (2008). Revisiting problems with foreign language aptitude. IRAL, 46, 113-136.
Sagarra, N. (2007). From CALL to face-to-face interaction: The effect of computer-delivered recasts and working memory on L2 development. In A. Mackey (Ed.), Conversational interaction in second language acquisition (pp. 229-248). New York: Oxford University Press.
Sasaki, M. (1996). Second language proficiency, foreign language aptitude, and intelligence. New York: Lang.
Sawyer, M., & Ranta, L. (2001). Aptitude, individual differences, and instructional design. In P. Robinson (Ed.), Cognition and second language instruction (pp. 319-353). Cambridge: Cambridge University Press.
Segalowitz, N. (1997). Individual differences in second language acquisition. In A. M. B. de Groot & J. F. Kroll (Eds.), Tutorials in bilingualism: Psycholinguistic perspectives (pp. 85-112). Mahwah, NJ: Erlbaum.
Service, E., & Kohonen, V. (1995). Is the relation between phonological memory and foreign-language learning accounted for by vocabulary acquisition? Applied Psycholinguistics, 16, 155-172.
Schmidt, R. (1990). The role of consciousness in second language learning. Applied Linguistics, 11, 129-158.
Schmidt, R. (2001). Attention. In P. Robinson (Ed.), Cognition and second language instruction (pp. 3-32). Cambridge: Cambridge University Press.
Schmitt, N., Dörnyei, Z., Adolphs, S., & Durow, V. (2003). Knowledge and acquisition of formulaic sequences: A longitudinal study. In N. Schmitt (Ed.), The acquisition, processing, and use of formulaic sequences (pp. 55-86). Amsterdam: John Benjamins.
Schwartz, B. (1993). On explicit and negative data effecting and affecting competence and linguistic behavior. Studies in Second Language Acquisition, 15, 147-163.
Sheen, Y. (2004). Corrective feedback and learner uptake in communicative classrooms across instructional settings. Language Teaching Research, 8, 263-300.
Sheen, Y. (2006). Exploring the relationship between characteristics of recasts and learner uptake. Language Teaching Research, 10, 361-392.
Sheen, Y. (2007a). The effects of corrective feedback, language aptitude, and learner attitudes on the acquisition of English articles. In A. Mackey (Ed.), Conversational interaction in second language acquisition (pp. 301-322). New York: Oxford University Press.
Sheen, Y. (2007b). The effect of focused written corrective feedback and language aptitude on ESL learners’ acquisition of articles. TESOL Quarterly, 41, 255-283.
Sheen, Y. (2008). Recasts, language anxiety, modified output, and L2 learning. Language Learning, 58, 835-874.
Sheen, Y. (2010). The role of oral and written corrective feedback in SLA: Introduction. Studies in Second Language Acquisition, 32, 169-179.
Sheen, Y. (2010). Differential effects of oral and written corrective feedback in the ESL classroom. Studies in Second Language Acquisition, 32, 203-234.
Shi, Z. (1990). Decomposition of perfectivity and inchoativity and the meaning of the particle –le in Mandarin Chinese. Journal of Chinese Linguistics, 18, 95-123.
Skehan, P. (1982). Memory and motivation in language aptitude testing. Ph.D. dissertation, University of London.
Skehan, P. (1998). A cognitive approach to language learning. Oxford: Oxford University Press.
Skehan, P. (2002). Theorising and updating aptitude. In P. Robinson (Ed.), Individual differences and instructed language learning (pp. 70-94). Philadelphia: John Benjamins.
Smith, C. (1997). The parameter of aspect. Dordrecht: Kluwer.
Snow, R. (1987). Aptitude complexes. In R. Snow & M. Farr (Eds.), Aptitude, learning, and instruction (pp. 13-59). Hillsdale, NJ: Erlbaum.
Snow, R. (1991). Aptitude-treatment interaction as a framework for research on individual differences in psychotherapy. Journal of Consulting and Clinical Psychology, 59, 205-216.
Snow, R. (1994). Abilities in academic tasks. In R. Sternberg & R. K. Wagner (Eds.), Mind in context: Interactionist perspectives on human intelligence (pp. 3-37). New York: Cambridge University Press.
Spada, N. (1997). Form-focused instruction and second language acquisition: A review of classroom and laboratory research. Language Teaching, 30, 73-87.
Spada, N., & Tomita, Y. (2010). Interactions between type of instruction and type of language feature: A meta-analysis. Language Learning, 60, 263-308.
Sparks, R., Patton, J., Ganschow, L., & Humbach, N. (2009). Long-term relationships among early first language skills, second language aptitude, second language affect, and later second language proficiency. Applied Psycholinguistics, 30, 725-755.
Spolsky, B. (1989). Conditions for second language learning. Oxford: Oxford University Press.
Swain, M. (1985). Communicative competence: Some roles of comprehensible input and comprehensible output in its development. In S. Gass & C. Madden (Eds.), Input in second language acquisition (pp. 235-252). Rowley, MA: Newbury House.
Swain, M. (1995). Three functions of output in second language learning. In G. Cook & B. Seidlhofer (Eds.), Principle and practice in applied linguistics: Studies in honour of H. G. Widdowson (pp. 125-144). Oxford: Oxford University Press.
Swain, M. (2005). The output hypothesis: Theory and research. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 471-484). Mahwah, NJ: Lawrence Erlbaum.
Tai, J., & Wang, L. (1990). A semantic study of the classifier Tiao. Journal of the Chinese Language Teachers Association, 25, 35-56.
Thompson, C. (1968). Aspects of the Chinese verb. Linguistics, 38, 70-76.
Trofimovich, P., Ammar, A., & Gatbonton, E. (2007). How effective are recasts? The role of attention, memory, and analytical ability. In A. Mackey (Ed.), Conversational interaction in second language acquisition (pp. 171-195). New York: Oxford University Press.
Van den Berg, M. (1989). Modern standard Chinese: Een functionele grammatica. Muiderberg: Coutinho.
Van den Berg, M., & Wu, G. (2006). The Chinese particle –le. New York: Routledge.
VanPatten, B. (2007). Input processing in adult second language acquisition. In B. VanPatten & J. Williams (Eds.), Theories in second language acquisition (pp. 115-135). Mahwah, NJ: Lawrence Erlbaum Associates.
Vendler, Z. (1957). Verbs and times. Philosophical Review, 66, 143-160.
Waters, G., & Caplan, D. (1996). The measurement of verbal working memory capacity and its relation to reading comprehension. The Quarterly Journal of Experimental Psychology, 49A, 51-79.
Waters, G., & Caplan, D. (2004). Verbal working memory and on-line syntactic processing: Evidence from self-paced listening. The Quarterly Journal of Experimental Psychology, 57A, 129-163.
Wen, X. (1995). Second language acquisition of the Chinese particle le. International Journal of Applied Linguistics, 5, 45-62.
Wen, X. (1997). Acquisition of Chinese aspect: An analysis of the interlanguage of learners of Chinese as a second language. ITL: Review of Applied Linguistics, 117/118, 1-26.
Wesche, M. (1981). Language aptitude measures in streaming, matching students with methods, and diagnosis of learning problems. In K. Diller (Ed.), Individual differences and universals in language learning aptitude (pp. 119-154). Rowley, MA: Newbury House.
Williams, J. (1999). Memory, attention and inductive learning. Studies in Second Language Acquisition, 21, 1-48.
Williams, J. (2005). Learning without awareness. Studies in Second Language Acquisition, 27, 269-304.
Williams, J., & Lovatt, P. (2003). Phonological memory and rule learning. Language Learning, 53, 67-121.
Wu, Y., & Bodomo, A. (2009). Classifiers ≠ determiners. Linguistic Inquiry, 40, 487-503.
Wu, S., Yu, Y., Zhang, Y., & Tian, W. (2007). Chinese link. Upper Saddle River, NJ: Pearson Education, Inc.
Xiao, R., & McEnery, T. (2004). Aspect in Mandarin Chinese. Philadelphia: John Benjamins.
Yang, S. (1995). The aspectual system of Chinese. Ph.D. dissertation, University of Victoria, Canada.
Yang, J. (2002). The acquisition of temporality by adult second language learners of Chinese. Ph.D. dissertation, The University of Arizona, Tucson.
Yang, J. (2003). Back to the basic: The basic function of particle LE in modern Chinese. Journal of the Chinese Language Teachers Association, 38, 77-96.
Yang, S., Huang, Y., & Sun, D. (1999). Acquisition of aspect in Chinese as a second language. Journal of the Chinese Language Teachers Association, 34, 31-54.
Yang, S., Huang, Y., & Sun, D. (2000). Underuse of temporal markers in Chinese as a second language. Journal of the Chinese Language Teachers Association, 35, 87-116.
Yang, Y., & Lyster, R. (2010). Effects of form-focused practice and feedback on Chinese EFL learners’ acquisition of regular and irregular past tense forms. Studies in Second Language Acquisition, 32, 235-263.
Yao, T., Liu, Y., Bi, N., Hayden, J., & Wang, X. (2005). Integrated Chinese. Boston: Cheng & Tsui Company.
Zhang, H. (2007). Numeral classifiers in Mandarin Chinese. Journal of East Asian Linguistics, 16, 43-59.
Zhang, K., Liu, S., Chen, X., Zuo, S., Shi, J., & Liu, X. (2002). New practical Chinese reader. Beijing: Beijing Languages University Press.
Zhang, Y. (2005). Processing and formal instruction in the L2 acquisition of five Chinese grammatical morphemes. In M. Pienemann (Ed.), Cross-linguistic aspects of processability theory (pp. 155-177). Philadelphia: John Benjamins.
Zhao, X., Li, Y., & Lin, L. (1999). An intensive reading course of intermediate Chinese. Beijing: Beijing University Press.