THE EFFECTS OF AGE OF IMMERSION AND WORKING MEMORY ON SECOND
LANGUAGE PROCESSING OF ISLAND CONSTRAINTS:
AN EYE-MOVEMENT STUDY
By
Sehoon Jung

A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
Second Language Studies - Doctor of Philosophy
2017

ABSTRACT
THE EFFECTS OF AGE OF IMMERSION AND WORKING MEMORY ON SECOND
LANGUAGE PROCESSING OF ISLAND CONSTRAINTS:
AN EYE-MOVEMENT STUDY
By
Sehoon Jung
One of the central questions in recent second language processing research is whether the types
of parsing heuristics and linguistic resources adult L2 learners compute during online processing
are qualitatively similar to or different from those used by native speakers of the target language.
While the current L2 processing literature provides evidence for both qualitative similarities and
differences between L1 and adult L2 processing, Clahsen and Felser (2006a, 2006b, 2006c)
claimed that the types of syntactic representations adult L2 learners apply during online
processing are shallower and hierarchically less detailed, and adult L2 learners rely more on
other types of linguistic resources available to them, such as lexical-semantic and pragmatic
information. This dissertation aimed to explore these issues to provide more insight into the
nature of adult L2 syntactic processing, by investigating how advanced ESL learners varying in
their ages of arrival (early learners: ages of arrival between 2 and 9 years old; adult learners:
ages of arrival between 18 and 31 years old) deal with relative clause island constructions while
processing filler-gap dependencies online in a natural reading environment. In addition, the
present study also sought to examine how individual differences in working memory capacity
(WMC) influence learners' processing behaviors and use of target language grammars in real
time. Twenty-eight advanced adult ESL learners with either a Korean or Chinese background,
21 early ESL learners, and 24 native English speaker controls participated in an eye-tracking
reading experiment and took two different types of automated complex working
memory span tests (operation-span and symmetry-span). Results suggested that while the early
and adult ESL learners made use of active filler strategies to fill the gap as early as possible in
the non-island environment, they rapidly deployed relevant syntactic knowledge of island
constraints, thereby avoiding illicit filler-gap formation inside the relative clause islands from
early stages of processing, as measured by first-pass reading time, first-pass regression, and
regression path duration. Results also suggested that the early ESL learners and native English
speaker controls were sensitive to structural cues and gap identifications at the ultimate gap,
initiating filler-gap reanalysis processes from early stages of processing, as measured by first
fixation duration and first-pass reading time. On the other hand, the adult ESL learners exhibited
filler-gap reanalysis effects only during later stages of processing, suggesting that they were not
as efficient and immediate as the early ESL learners and native English speaker controls in
detecting the need for filler-gap reanalysis. Lastly, individual differences in WMC did not show
any significant effect on early and adult ESL learners' processing of island constructions, in that
both ESL groups successfully blocked gap postulations in the island environment, by and large,
irrespective of their working memory capacities. However, it was found that different WMCs
among the adult learners influenced their reading behaviors during filler-gap reanalysis at the
ultimate gap, in that adult learners with higher WMC were more sensitive to gap identifications
than those with lower WMC, showing more immediate filler-gap reanalysis effect from early
stages of processing, as measured by first fixation duration and first-pass regression. These
results suggest that early and adult ESL learners' processing of structurally complex filler-gap
dependencies in the L2 is not qualitatively different from that of native English speakers.

Copyright by
SEHOON JUNG
2017

This dissertation is dedicated to my family
To my wife, Soojin Ryu, for her love, support, encouragement, and sacrifice.
To my parents, Jong-Sung Jung and Kyung-Ja Shin, and
To my parents-in-law, Hyun-Joo Ryu and Chung-Mi Kim,
For their support and belief in me
To my grandma, Chun-Nam Jung, for her prayers.
To my sister Se-Jung and my brother Myoungwhon
For their encouragement and support.
But above all, this dissertation is dedicated to the Lord whom I love and trust

ACKNOWLEDGEMENTS

This dissertation is the outcome of the work and support of many people. I first would like
to express my deepest appreciation to Professor Patti Spinner, who is my teacher, advisor, and
mentor. She has always supported me, taught me to see the big picture of what I am doing in my
research, and helped me feel more confident in becoming a professional since my first days in
East Lansing. I am deeply indebted to her for her encouragement and belief in me during my
time in the Second Language Studies program. I am also grateful to all the members of my
dissertation committee for their support, encouragement, and respect for my ideas.
I am thankful to Professor Susan Gass who has given me wonderful opportunities to work
as an eye-tracking research assistant. The research and work experience I gained through my first
four years of this assistantship certainly provided a great foundation on which I could plan,
conduct, and write this dissertation study. Professor Paula Winke was my supervisor during my
first two years in the eye-tracking lab as a graduate student lab coordinator. I owe her my current
expertise in eye-tracking research. I really appreciate her caring attitude and support from Day 1
until the last moment of my Ph.D. journey.
I am also thankful to Professor Aline Godfroid, Professor Susan Gass, Professor Shawn
Loewen, and Professor Rod Ellis for giving me a wonderful opportunity to work with them on the
eye-tracking GJT research project, through which I was able to learn a lot and establish my
research competence in the field. The enthusiasm and passion they brought to every single
meeting over the two-year span were such an inspiration for me.
I also express my best wishes to all SLS colleagues who made life easier when we came
together and shared our thoughts and challenges. I appreciate the good times, conversations and
conference journeys I shared with Jens, Roman, and Hyung-Jo, and all my old and new
SLS friends. I particularly extend my gratitude to Ayman Mohamed, with whom I shared
many days of so-called "Pseudo-research meeting." I will always cherish our memories together
and truly appreciate his friendship.
But above all, I want to thank my Lord, who allowed me to meet the people above and gave
me the knowledge, strength, and ability to finish my dissertation study. Blessed are those who
trust in the name of the Lord.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1. INTRODUCTION
CHAPTER 2: REVIEW OF LITERATURE
2.1. Grammatical representations of wh-structures
2.1.1. Filler-gap dependency representations in English
2.1.2. Island Constraints: Violation of the movement constraints
2.1.3. Grammatical representations of wh-structures in wh-in-situ languages
2.1.4. Summary
2.2. Incremental processing: Evidence from processing of garden-path sentences
2.3. Processing of filler-gap dependencies: Incremental gap search processes
2.4. Processing of filler-gap dependencies by nonnative speakers
2.4.1. A review of early research on L2 processing
2.4.2. Shallow structure hypothesis
2.4.3. A Review of the empirical research testing the SSH
2.4.3.1. L2 processing of ambiguous RC constructions
2.4.3.2. L2 processing of long-distance filler-gap dependencies
2.4.3.3. L2 processing of island constraints
2.5. The effect of age of immersion (or acquisition) and critical period hypothesis
2.6. The role of working memory on L2 parsing
2.7. Research Questions
CHAPTER 3: METHOD
3.1. Participants
3.2. Materials
3.2.1. English proficiency measures
3.2.2. Working memory capacity measures
3.2.3. Main experiment: Eye-tracking reading
3.2.3.1. Reading materials
3.2.3.2. Areas of Interest for analyses
3.2.3.3. Eye-tracking reading task design and procedures
3.2.3.4. Eye-tracking dependent variables
3.3. Overall procedures
3.4. Data Analysis
3.4.1. Preparation of the data for analyses
3.4.2. Main Statistical analyses
CHAPTER 4. RESULTS
4.1. Comprehension Accuracy
4.2. Overview of reading profiles
4.3. The effect of age of immersion on L2 processing of filler-gap dependencies
4.3.1. Active filler strategy and application of island constraints: Initial gap
4.3.1.1. Analysis of reading patterns at the first critical region (Region1)
4.3.1.2. Analysis of reading patterns at the spillover region (Region2)
4.3.1.3. Interim summary of the results: Initial gap
4.3.2. Filler-gap reanalysis: Ultimate gap
4.3.2.1. Analysis of reading patterns at the second critical region (Region3)
4.3.2.2. Analysis of reading patterns at the spillover region (Region4)
4.3.2.3. Interim summary of the results: Ultimate gap
4.4. The effect of individual differences in working memory capacity
4.4.1. The effect of WMC at the earliest gap Region1 and spillover Region2
4.4.2. The effect of WMC at the ultimate gap at Region3 and spillover Region4
4.4.3. Summary of the results: The effect of WM
CHAPTER 5: DISCUSSION
5.1. The effect of age of acquisition
5.2. The role of working memory in L2 processing of island constraints
CHAPTER 6: CONCLUSION
6.1. Limitations and future research
APPENDICES
Appendix A. Language background questionnaire
Appendix B. List of test items in the LexTALE English proficiency measure
Appendix C. Materials for the eye-tracking experiment
REFERENCES

LIST OF TABLES

Table 1. Biodata and English learning background of the ESL learners
Table 2. Self-rated English proficiency of the ESL learners for each language skill
Table 3. LexTALE scores (in percent) of the native speakers and the ESL learners
Table 4. Summary of the WM span test results in percent
Table 5. Mean comprehension accuracy in percent in the reading task
Table 6. Descriptive statistics for RTs and first-pass regressions in percent at Region1
Table 7. Summary of the results of preliminary analyses at Region1
Table 8. Descriptive statistics for RTs and first-pass regressions at Region2
Table 9. Summary of the results of preliminary analyses at Region2
Table 10. Summary of the major findings at the initial gap
Table 11. Descriptive statistics for RTs and first-pass regressions at Region3
Table 12. Summary of the results of preliminary analyses at Region3
Table 13. Descriptive statistics for RTs and first-pass regressions at Region4
Table 14. Summary of the results of preliminary analyses at Region4
Table 15. Summary of the findings at the ultimate gap
Table 16. Summary of the WM effect analyses at Region1 and Region2
Table 17. First-pass RT and first-pass regressions by higher- and lower-WM early ESL
Table 18. Summary of the WM effect analyses at Region3 and Region4
Table 19. Summary of the findings: The WM effect

LIST OF FIGURES
Figure 1. A screenshot of the LexTALE Test
Figure 2. Processing and storage component of the operation span test
Figure 3. Processing and storage component of the symmetry span test
Figure 4. An illustration of eye-movements during reading
Figure 5. Fixation map: Reading profiles of the NS English speakers
Figure 6. Fixation map: Reading profiles of the early ESL learners
Figure 7. Fixation map: Reading profiles of the adult ESL learners
Figure 8. Reading patterns of the three groups during early stages of processing at Region1
Figure 9. Reading patterns of the three groups during late stages of processing at Region1
Figure 10. Reading patterns of the three groups during early stages of processing at Region2
Figure 11. Reading patterns of the three groups during late stages of processing at Region2
Figure 12. Reading patterns of the three groups during early stages of processing at Region3
Figure 13. Reading patterns of the three groups during late stages of processing at Region3
Figure 14. Reading patterns of the three groups during early stages of processing at Region4
Figure 15. Reading patterns of the three groups during late stages of processing at Region4

CHAPTER 1. INTRODUCTION
How do people come to understand what others say or what they read? One might
assume that the process of understanding linguistic input (e.g., written texts or utterances) is very
simple and straightforward, when considering how frequently and quickly it happens even
without much conscious effort. However, language comprehension works through a sequence of
highly complex linguistic analyses that map linguistic input onto a variety of different types and
levels of mental representations in real time during comprehension (Clahsen, 2007; Mazuka,
1998). That is, within a limited amount of time, one must deploy different components of
linguistic knowledge (e.g., lexical, syntactic, semantic, pragmatic, discourse, and world
knowledge) to process the input in a linguistically and contextually appropriate way. All these
mapping and application processes must be computed efficiently to achieve comprehension. In
light of this, comprehension requires suitable linguistic knowledge at various levels, as well as
sufficient processing skills that allow the processor to carry out such demanding linguistic
operations efficiently. Of the various types and levels of linguistic representations and
processing, this dissertation specifically focuses on second language (L2) learners' syntactic
processing during L2 comprehension.
A critical part of understanding linguistic input is creating a licit grammatical
representation that can accommodate the processed input strings. In this respect, syntactic
processing (or parsing¹) is mainly responsible for conducting moment-by-moment computations,
making structural inferences from word strings in the input and creating associations between and
among words in a sentence to structure constituents and assign syntactic categories. It has been
well attested that, in doing so, the parser builds a series of representations incrementally on the
basis of syntactic information provided by the grammar. In this sense, parsing can be understood
as a laboratory where the current state of grammar is tested. Parsing is likely to succeed
if learners' existing grammar is mature and adequate to license a representation for the
target language input. On the other hand, if the deployed information is not appropriate, either
because it has not been fully acquired yet or because it is impaired due to deviations from the
target language norms, then parsing may not be successful, resulting in comprehension
breakdown. Such parsing failures may necessitate an update of the current grammar
system and trigger acquisition of the representation in the long term through repeated processing
practice on the structure (e.g., Dekydtspotter & Renaud, 2014; Fodor, 1998; Gregg, 2003; White,
2003; but cf. Klein, 1999).

¹ The term parse or parsing specifically refers to online applications of grammatical
information, namely syntactic processing in real time, whereas processing is used as a more
general term that covers all types of linguistic operations and interfaces between them
(VanPatten & Jegerski, 2010).

In order for parsing to be efficient and successful, it also requires suitable, "least
effort" parsing strategies attuned to specific grammatical properties of the target language (e.g.,
wh-movement, grammatical gender, and relative clause attachment preferences), accompanied
by sufficient processing abilities that allow the parser to integrate needed grammatical
information efficiently in a consistent manner (e.g., Dussias, 2003; Gibson, Pearlmutter,
Canseco-Gonzalez, & Hickok, 1996; Juffs, 2005; Juffs & Harrington, 1995, 1996; Keating,
2009; Marinis, Roberts, Felser, & Clahsen, 2005; Williams, 2006; Williams, Moebius, & Kim,
2001). Note that parsing is a processing component (i.e., linguistic performance) guided by the
grammatical information (i.e., linguistic competence) under pressure. In this regard, Juffs and
Rodriguez (2015) analogized the grammar to the engine at rest while comparing parsing to the
engine in motion, further explaining:

the grammar is the engine at rest, not driving the vehicle, but with the potential to do so.
Parsing is the engine in motion, subject to stresses and possible breakdowns allowable by
the system, and driving production or comprehension in real time ... the operation of
the grammar during processing may be affected by the quality of input, memory
limitations, and interference from outside influences not related to the architecture of the
grammar itself. (p. 15)
What they mean is that, even if one has acquired a relevant representation, it does not necessarily
guarantee that the acquired representation will be fully utilized during comprehension. If the
parser is burdened for some reason (e.g., slower processing speed, complexity of the input, and
limited working memory capacity), and/or if learners' parsing strategies are not proficient
enough to rapidly extract and unload the detailed syntactic information of the target language
structure in real time (e.g., Juffs, 2005; Juffs & Harrington, 1996), it may then result in
misanalyses of the input. Additionally, learners might be led to rely on some other alternative
sources of information to compensate for the lack of parsing abilities (e.g., use of semantic
information, context surrounding the input being currently processed, or world knowledge)
(Clahsen & Felser, 2006a, 2006b). Taken together, as far as syntactic processing is concerned,
successful comprehension requires L2 learners to acquire not only the grammar of the target
language, but also processing heuristics that allow the parser to make use of the acquired target
language grammar efficiently in real time (Marinis, 2003).
Now, the question to ask is, to what extent can learners do this job? It is perhaps needless
to say that native speakers of a language come with a full list of relevant syntactic
representations and fully optimized and proficient parsing strategies to get the job done reliably
well. However, this does not always seem to be the case for adult second language (L2) learners,
especially when taking into account the observation that many adult learners, even those who are
highly proficient in their L2s, tend to be less accurate, less efficient, and more prone to errors and
processing breakdown during their L2 performance, compared to native speakers of the language
they are acquiring. Given that successful parsing in the L2 presupposes adequate knowledge of
the target language grammar as discussed above, one possibility to account for relatively
inconsistent and deficient L2 performance is to assume that the interlanguage grammar acquired
by adult learners is incomplete, thus deviating from the target language norm (e.g.,
Bley-Vroman, 1990, 2009). In this regard, a large body of research, from a formal perspective in
particular, has investigated whether or not learners can ultimately acquire L2 grammar to a
degree that is qualitatively comparable to that of native speakers. Years of accumulated L2
acquisition research conducted to explore this question has provided much information about the
processes of interlanguage grammar development, but it also has shown that even highly
proficient L2 learners often display a wide range of (meta)linguistic performance variability
within and across individuals (e.g., varying accuracy rates on grammaticality judgment tests
(GJTs)), thus making it difficult to conclude what aspects of L2 grammar can or cannot be
learned (e.g., Bley-Vroman, 1990, 2009; Clahsen & Muysken, 1996; Epstein, Flynn, &
Martohardjono, 1996; Hawkins & Chan, 1997; Hawkins & Hattori, 2006; Johnson & Newport,
1989, 1991; Schwartz & Sprouse, 1996; White, 1992; White & Genesee, 1996; White & Juffs,
1998; see White, 2003, for review and discussion).
Another limitation in this line of research is its methodology. That is, with the type of
data obtained from those studies, which mostly consist of intuitional judgment data collected
during offline tasks (e.g., grammaticality judgments, acceptability judgments, or truth-value
judgments; see Gass & Selinker, 2008, for discussion), it is difficult to pinpoint how, at which
point, and on what basis during reading learners come to accept or reject certain sentences in
question. This is a critical piece of information because there may be more than one
factor driving learners to certain judgments. In other words, learners' judgments may not
necessarily be based on their knowledge of the target language grammar that the test attempts to
tap into. Birdsong (1992), for example, used an acceptability judgment test, and found that the
L2 French learners in his study made correct judgments similarly to the native French controls,
but the reasons participants gave for their decisions often exhibited variation within the L2 group
and differed from those of native speakers, raising a potential issue of task validity (see, e.g.,
Ellis, 1991; Tremblay, 2005, for discussion). Therefore, it is important to look into learners'
processing performance in more detail to better understand how they use their interlanguage
grammar to construct syntactic structures in real time. This information may in turn provide
insight into learners' underlying L2 grammar (Jiang, 2007; Juffs & Rodriguez, 2015).
Taking into account learners' performance variability and the methodological limitations
discussed above, there is now a growing body of research dealing with real-time L2 processing,
with a question as to how learners process target language input and what kinds of processing (or
parsing) mechanisms and information resources (e.g., lexical, syntactic, and semantic) L2
learners access during their reading and listening comprehension. Employing online time course
measures such as cross-modal priming (Swinney, 1979), self-paced reading (Just, Carpenter, &
Woolley, 1982), or eye-tracking (for a review of this method, see Clifton, Staub, & Rayner, 2007;
Dussias, 2010; Roberts & Siyanova-Chanturia, 2013), L2 processing research mainly aims at
capturing learners' moment-by-moment parsing decisions to examine exactly how they construct
structural representations. Of particular interest in this line of research has been whether the
ways nonnative speakers and native speakers process (or parse) incoming L2 input online are
qualitatively the same or different. In addressing this big question, a number of studies have also
examined whether the extent to which L2 processing converges on or diverges from L1
processing is modulated by some other variables, such as different L1 morpho-syntactic
properties and related processing strategies (e.g., Aldwayan, Fiorentino, & Gabriele, 2010;
Dussias, 2003; Frenck-Mestre, 1997, 2002; Hopp, 2010; Jegerski, VanPatten, & Keating, 2011;
Juffs, 1998, 2005; Juffs & Harrington, 1995, 1996; Marinis et al., 2005;
Jiang, Novokshanova, Masuda, & Wang, 2011; Keating, 2009; Papadopoulou &
Clahsen, 2003; Sagarra & Ellis, 2013; Omaki & Schulz, 2011; Trenkic, Mirkovic, & Altmann,
2014; White & Juffs, 1998), L2 proficiency (Fernandez, 1999; Frenck-Mestre, 2002; Hopp,
2006; Jackson, 2008; Jackson & Dussias, 2007), L2 exposure (e.g., Cuetos, Mitchell, & Corley,
1996; Dussias & Sagarra, 2007; Ha, 2005; Pliatsikas & Marinis, 2013), or individual working
memory capacity of learners (e.g., Dussias & Piñar, 2010; Felser & Roberts, 2007; Juffs, 2004,
2005; Williams, 2006). In search of answers to these questions, processing of filler-gap
dependency constructions has received much attention in recent L2 processing literature (e.g.,
Cunnings et al., 2010; Dekydtspotter & Miller, 2009, 2013; Dussias & Piñar, 2010; Juffs, 2005;
Kim et al., 2015; Marinis et al., 2005; Miller, 2015; Omaki & Schulz, 2013;
Williams, 2006; Witzel, Witzel, & Nicol, 2002; see Clahsen & Felser, 2006a, 2006b; Juffs &
Rodriguez, 2015).
What makes filler-gap dependency constructions intriguing from a processing perspective
is that for some languages such as English and Spanish, there is a non-canonically positioned
constituent overtly moved out of its original theta position, for example, a fronted wh-phrase in
the wh-question and relative clause in (1) and (2) respectively.

(1) Who_i did the police know t_i the pedestrian killed t_i? (Dussias & Piñar, 2010)

(2) The nurse_i who_i the doctor argued t_i that the rude patient had angered t_i is refusing
to work late. (Marinis et al., 2005)

When processing constructions containing a displaced wh-phrase (called a filler or wh-filler) for
comprehension, the parser needs to track down where the filler was originally positioned (called
the gap), and figure out how it is associated with the other part of the sentence both syntactically
and semantically. This procedure is referred to as filler-gap processing. As will be discussed in
more detail in the next chapter, it is important to note here that there is cross-linguistic variation
with respect to the way the wh-phrases are treated in their grammatical representations. That is,
unlike English or Spanish, languages such as Korean, Chinese, and Japanese do not
require such movements, and a wh-phrase stays in its base-generated position (wh-in-situ), thus
not necessitating such filler-gap processing procedures at least at the level of syntax. With this in
mind, there is ample evidence in the L1 processing literature that for wh-movement languages,
the parser universally constructs grammar structures incrementally (e.g., immediacy hypothesis,
Just & Carpenter, 1980) and actively seeks to fill the gap as early as possible by releasing the
filler to every structurally possible trace position (marked as t_i) until it finally finds its home (i.e.,
gap) (e.g., active filler strategy by Frazier & Clifton, 1989; trace reactivation hypothesis by
Swinney, Ford, Frauenfelder, & Bresnan, 1988; but cf. Pickering & Barry, 1991). However,
relatively less has been uncovered and no converging evidence seems to have been established
yet when it comes to processing such structures in L2 contexts, especially regarding what types
of linguistic resources and representations learners compute to locate the non-canonically
positioned fillerâs place during real-time processing.

In this regard, several different positions have been put forward to account for
characteristics of L2 processing. One position states that adult L2 learners may experience
greater parsing difficulties than native speakers especially when the parser is expected to face
momentarily heavier parsing loads, but that they still may process the target language structures
in a qualitatively similar way to native speakers (e.g., Dussias & Piñar, 2010; Juffs, 2005; Juffs
& Harrington, 1995; 1996; Williams, Mobius, & Kim, 2001). In the same vein, a body of
research has also argued for L1-L2 parsing similarities under certain conditions, such as when
learners have high proficiency (e.g., Hopp, 2006; Omaki & Schulz, 2011; Sagarra &
Herschensohn, 2010; Williams, 2006), extensive L2 exposure (e.g., Dussias & Sagarra, 2007;
Frenck-Mestre, 2005; Pliatsikas & Marinis, 2013), and when there is closeness of L1-L2
syntactic properties (e.g., Zawiszewski, Gutierrez, Fernandez, & Laka, 2011). Omaki and Schulz
(2011), for example, implemented a self-paced reading test to examine how advanced L1
Spanish learners of English process long-distance filler-gap dependency constructions. Omaki
and Schulz found that the nonnative speakers in their self-paced reading study not only searched
for the gap actively in a comparable fashion to native speakers, but also applied relevant
grammatical constraints with "substantial grammatical precision" during filler-gap
processing (p. 585). Zawiszewski et al. (2011) reported the role of
L2 proficiency and L1-L2 distance in syntactic features through their event-related potential
(ERP) study with L1 Spanish learners of Basque. Zawiszewski et al. tested three syntactic
parameters: the head parameter, argument alignment, and verb agreement. The first two had
diverging syntactic features between the L1 and L2 whereas the same verb-agreement feature
was shared by both L1 and L2 systems. They found that divergence in syntactic parameters
between L1 and L2 may yield L2 processing patterns that are different from L1 processing, but
that native-like L2 processing is possible as a function of increased L2 proficiency (see also,
Aldwayan et al., 2010, for similar findings). A more recent study by Pliatsikas and Marinis
(2013) that tested L2 learners' use of intermediate gaps during online filler-gap processing found
that learners with substantial immersion experience in the L2 process the target language
structures similarly to native speakers, employing detailed syntactic representations (see also,
Dussias & Sagarra, 2007, for a similar finding).
On the other hand, there are a number of other researchers who view adult L2 processing
as something essentially different compared to native processing. Based on empirical findings
that mostly come from research on L2 processing of filler-gap dependencies and ambiguous
relative clause constructions, these researchers claim that L2 learners are limited in their use of
grammatical information when processing in the L2, by and large irrespective of learners' L1s,
L2 proficiency, and available cognitive resources such as working memory capacity (e.g.,
Cunnings et al., 2010; Felser et al., 2003, 2012; Felser & Roberts, 2007; Marinis et al., 2005;
Papadopoulou & Clahsen, 2003). This position eventually led Clahsen and Felser (2006a, 2006b)
to propose their shallow structure hypothesis (SSH). According to the SSH, L2 processing by
adult learners is fundamentally different from L1 processing, in that nonnative speakers are much
less likely to be able to utilize rich and fully detailed syntactic representations when processing
the L2 online, because of their deficient and inadequate interlanguage grammar
representations (e.g., Bley-Vroman, 1990) and/or their inability to compute sufficiently detailed
syntactic information in the input stream in real time. This is the case even if learners have
acquired relevant syntactic representations of the target language either via their L1 grammar or
through the development of their L2 learning. On the basis of these assumptions, the SSH
suggests that adult L2 learners, even at a highly proficient level, are led to take the shallow
processing route, which is fed by less detailed and incomplete syntactic representations, and
instead rely heavily on lexical, semantic, pragmatic and discourse information.
In the light of what has been discussed above, the present study aimed to test the validity
of the shallow structure hypothesis by investigating how proficient adult ESL learners with either
L1 Korean or L1 Chinese background process long-distance filler-gap dependency constructions
in English during real time processing. Specifically, the focus of the study was on L2 learners'
use of the relative clause island constraint during online reading. As will be discussed in more
detail in the next chapter, the relative clause island is a type of syntactic structure that does not allow
the formation of filler-gap dependencies inside the island constructions. In other words, when the
parser encounters this relative clause island structure in the input, it should avoid postulating a
gap inside the island as there is no grammatically licit gap for the filler in the grammatical
representations. From an SSH point of view, however, adult L2 learners would not be able to
employ such abstract and hierarchically detailed syntactic constraints in real time processing (cf.
Cunnings et al., 2010). Consequently, L2 learners may attempt to postulate a gap inside the
relative clause island, constructing a representation that lacks sufficiently detailed hierarchical
configurations. Therefore, looking into how adult L2 learners handle the filler when
encountering an island structure in the middle of filler-gap processing may provide an important
piece of evidence as to whether L2 processing is indeed guided by the shallower processing routes
as the SSH would predict. To this end, this study implemented an eye-tracking reading
experiment and analyzed learners' eye-movement patterns to provide more insight into the ways
learners handle non-canonically positioned fillers under an island environment during reading.
The present study also explored the role of age of acquisition (or age of immersion) by
including L2 learners who have been immersed in an ESL environment from early ages
(hereafter, early ESL). As laid out above, the SSH assumes that adult learners are restricted in
making use of detailed syntactic representations during real-time structure building, either
because of the representational problem (i.e., grammatical knowledge) or application problem
(i.e., processing ability). Of the two possibilities, if we assume that adult L2 learners hold
grammatical representations that are qualitatively comparable to those of native speakers of the
target language (e.g., Rothman, 2008; Schwartz & Sprouse, 1996; White, 2003; White & Juffs,
1998, among others), the question to ask is, "Is this applicational limitation of adult L2 learners
on syntactic processing due to age-related issues, for instance delayed ages of acquisition or
exposure to a target language environment past the critical (or sensitive) period for language
learning?" The SSH does not provide a clear reason as to what precisely causes adult learners'
limited ability to access the full parsing route if it is not a representational problem, although
they (adult learners) are capable of computing other types of linguistic resources (e.g., lexical
and semantic information) (Dekydtspotter, Schwartz, & Sprouse, 2006), as well as some other
domains of (morpho-)syntactic processing (e.g., Ojima et al., 2005; Sabourin & Haverkort,
2003). In this regard, comparing how highly advanced adult and early L2 learners process the
target language structures may provide further insight into the nature of adult L2 processing.
There is a general consensus that L2 learners who have been immersed in an L2
environment at earlier ages before puberty are generally more efficient and fluent, and arrive at a
native-like end-state grammar stage in a more consistent way than adult learners, who often
display a wide range of L2 grammar knowledge and/or performance variability. The fact that
adult learners demonstrate more performance variability than early learners makes it particularly
crucial to examine the way these two L2 groups process the target language structures online.
Even if syntactic knowledge of the two distinct groups is comparable, processing could differ,
meaning that there could be a critical or sensitive period that restricts adult learners' application
of the grammar in real time. When such a critical or sensitive period would fall is unknown, but
it is possible that early learners (if early enough) would have an advantage regarding the
acquisition of full, native-like processing strategies. Thus far, however, there has been only a
very small volume of L2 sentence processing research that directly examined the role of age of
acquisition on L2 processing by comparing early and adult L2 learners in the same study (e.g.,
Ha, 2005; Weber-Fox & Neville, 1996), whereas most studies tested only adult L2 learners.
Thus, it has not been clearly revealed yet whether or not, and how, acquisition of proficient
parsing ability to apply detailed parsing mechanisms is affected by the age at which learners were
first exposed to the target language. Taken together, delving into how early-immersed and adult
L2 learners perform filler-gap processing using the same test materials may provide valuable
information not only for evaluating the claims of the SSH, but also for increasing our
understanding of the nature of adult learners' acquisition of L2 grammar and processing, and
how different ages of immersion influence learners' development of grammatical knowledge and
processing abilities.
Lastly, it is another goal of this study to explore the role of individual differences in
working memory capacity (WMC). It has been well attested in the current L2 literature that adult
L2 learners are generally slower and less efficient in their L2 processing than L1 speakers (e.g.,
Juffs, 2005; Segalowitz & Segalowitz, 1993; Ullman, 2004; Williams et al., 2001; 2006). If this
is the case, it may be reasonable to assume that learners' cognitive resources during online
processing are more taxed because the amount of time the processor should hold unanalyzed
constituent information such as wh-fillers will increase due to delayed integrations of the
processed information (e.g., Kaan, Ballantyne, & Wijnen, 2015). As a result, adult L2 learners
may be more prone to processing difficulties due to their shortage of cognitive resources relative
to the amount of processing cost the parser has to pay. This seems more likely especially when
learners process highly demanding target language structures such as long-distance filler-gap
dependencies in real time. As a result, L2 learners, those with lower WMC in particular, may
have less chance to access detailed syntactic information to the degree that native speakers of the
target language would. Note, however, that the SSH does not predict any effect of individual
differences in WMC on L2 processing, presumably because the L2 grammatical representations
computed by the parser are likely to be shallower and less detailed regardless of one's WMC.
While there is growing interest in the role of WMC in L2 processing, the current literature has
not yet provided enough data to support any clear conclusion as to how individual differences in
learners' WMC affect the way learners process the target language structures during online
sentence processing. Therefore, further investigation of the role of WMC is needed.
The rest of this dissertation is organized as follows: Chapter 2 provides the theoretical
background for the filler-gap dependency representations/processing and working memory, and
reviews recent L2 processing research, followed by the research questions that guide the present
study. Chapter 3 describes the participants, research design, and materials, and gives an
overview of the data collection and analysis processes. Chapter 4 reports the results, and Chapter 5
discusses the results in more detail in light of the research questions. Finally, Chapter 6
provides a brief summary of the research findings, addresses some of the limitations of this
study, and makes suggestions for future research.

CHAPTER 2: REVIEW OF LITERATURE
2.1. Grammatical representations of wh-structures
2.1.1 Filler-gap dependency representations in English
According to most recent generative accounts (e.g., Adger, 2003; Chomsky, 1995),
wh-questions or relative clauses in English are the product of movement, as shown in (3).
(3) The manager who the consultant claimed that the new proposal had pleased will
hire five workers tomorrow.

(The sentence was adapted from Gibson & Warren, 2004, p. 61)

In (3), the wh-phrase who, the relative pronoun co-indexed with the antecedent the manager, is
the object argument of pleased in its underlying position. It is fronted to the sentence initial
specifier of the matrix CP, CP1, in order to check and delete the strong uninterpretable wh-feature
in English. The movement operation in this case is to be guided by the grammatical constraints
generally known as subjacency (Chomsky, 1973; 1981), according to which a wh-constituent
may not cross more than one bounding node² at a time, thus restricting movement of the
wh-phrase to be more local (for a review of more recent theoretical accounts and discussions on
subjacency, see Belikova & White, 2009). Taking into account this movement constraint, the
wh-phrase who in (3) needs to be moved up through two separate movement steps: first from the
canonical position to the specifier of CP2, a syntactic gap known as an empty category (e) or
wh-trace (t), and then to the specifier of CP1 successively. This movement, which is referred to
as successive cyclic movement, illustrates not only how the dislocated wh-phrase is syntactically
associated with the other parts of the sentence (specifically with its subcategorizing verb) but
also the role of the mediating site (i.e., the empty category CP2) in the grammatical
representation for establishing legal long-distance movements (cf. Pickering & Barry, 1991; Sag
& Fodor, 1995).

2 While what constitutes a bounding node varies cross-linguistically, it generally refers to a
NP/DP or IP/TP (S) in English, which is circled in the tree structure in (3) (Hawkins, 1999;
Rahman, 2009).

2.1.2. Island Constraints: Violation of the movement constraints
As shown, the movement of wh-phrases in wh-movement languages is strictly limited to
be local, and it needs a mediating site in its representation to go out of more than one bounding
node. The unavailability of a mediating site in the structural representation may result in
movements that are not grammatically licit for long-distance movement, as illustrated in (4).
(4) What did the reporter meet the politician who supported at the congress?

What makes the sentence in (4) ungrammatical is the movement of what, the object argument of
the relative clause verb supported. Considering there is more than one bounding node (i.e., TP1
and TP2 that are circled) between the surface (i.e., sentence initial) and original canonical
position, the only way what can move to the current place without violating the subjacency
principle is to move via the spec of the lower CP (i.e., CP2), just as is the case with who in (3).
However, it is not possible to take this route because the site is already occupied by the relative
pronoun who. As a result, the only option is moving directly to the spec of CP1, but this makes
the movement illicit, resulting in the sentence being ungrammatical as it crosses two bounding
nodes (TP1 and TP2) in a single movement. This phenomenon is generally known as the island
constraints, specifically a relative clause island in the case of (4) (Ross, 1967). Ross identified
that a to-be-raised wh-constituent cannot be placed within certain structure types (called islands,
including relative clauses among others such as complex NP, subject NP, and adjunct island),
because as shown in (4), there is no way for the constituent to be legally extracted out of those
island structures without violating the locality constraint (for a review, see Belikova & White,
2009).

2.1.3. Grammatical representations of wh-structures in wh-in-situ languages
While English, a language with a strong wh-feature, involves syntactic movement
operations under the guidance of the locality constraints as discussed above, some other languages such as
Chinese, Japanese, and Korean do not require such overt movement operations at least at the
level of syntax, because the wh-feature of those languages is weak, and therefore does not
require any further steps for feature checking. See the Japanese wh-question in (5) and its
equivalent in Korean in (6) below.
(5) Japanese
John-wa  [CP Mary-ga  kinou     nani-o   kat-ta-to ]  oboete   imasu-ka?
John-TOP [CP Mary-NOM yesterday what-ACC buy-past-C]  remember is-Q-Part?
What did John remember Mary bought yesterday?
(Hawkins & Hattori, 2006, p. 275)

(6) Korean
John-en  [CP Mary-ka  eoje      mwsett-eul satta-ko ]  kiyeokkako isseumni-ka?
John-TOP [CP Mary-NOM yesterday what-ACC   buy-past-C] remember   is-Q-Part?
What did John remember Mary bought yesterday?

In these two sentences, the wh-phrases nani in (5) and mwsett in (6) are wh-words equivalent to
"what" in English, which remain in their canonical position (i.e., in situ) as the object of katta and
satta "bought," respectively. Thus, neither overt syntactic movement operations for feature
checking nor the subsequent locality movement constraint is instantiated in those languages³.

3 Some researchers have questioned whether languages such as Chinese really lack overt
evidence of the operation of Subjacency. See Lardiere (2008) for discussion.

2.1.4. Summary
As shown above, there is cross-linguistic variation in the strength of the wh-feature and in its
consequences for structural representations. This has been a topic
of considerable interest within the Universal Grammar (UG)-based L2 acquisition research (e.g.,
Bley-Vroman, Felix, & Ioup, 1988; Johnson & Newport, 1991; Schachter, 1989, 1990; Schachter
& Yip, 1990; White, 1992; White & Juffs, 1998). Specifically, the main research question these
studies address is whether adult L2 learners whose L1s lack wh-movement can successfully
acquire the abstract and subtle locality constraints instantiated in their target languages (e.g.,
English). Although most of these studies used offline grammaticality judgment tests (e.g.,
detecting sentences with subjacency violation), the results are mixed. As noted earlier in Chapter
1, the current study tested Chinese and Korean speakers (wh-in-situ) learning English
(wh-movement) to examine how they process filler-gap dependency structures online in their L2
English. Instead of using grammaticality judgment tests, this study analyzed learners'
eye-movement patterns during reading to provide insight into whether they have acquired the
relevant L2 grammar that cannot be acquired from their L1 (i.e., locality constraint, specifically
with relative clause islands), and if so, whether they can employ such knowledge of L2 grammar
during online processing. The next section discusses the nature of language processing in general
followed by a review of literature on L1 and L2 filler-gap processing research.

2.2. Incremental processing: Evidence from processing of garden-path sentences
Models of processing offer different accounts of when and how the different components
of processing come into play during comprehension. Of those, the modular-based accounts (e.g.,
Frazier, 1987, 1998; Frazier & Rayner, 1982) have been predominant both in L1 and L2 sentence
processing research⁴. This model assumes that each operation is computed in its own module
separately due to computational limitations; the syntactic operation occurs at an earlier stage of
processing and builds structural representations so that other types of processing (e.g., semantic
processing) can come into play at later stages of processing. One of the most consistent
observations found across the sentence processing research under these modular accounts is that
comprehension is formed through a series of incremental interpretations. That is, the parser
organizes a representation of the sentence incrementally, word by word, in a bottom-up fashion,
computing applicable syntactic/semantic information immediately as each word comes into the
parse, which is well represented in the parsing principles of "late closure" (Frazier & Rayner,
1982) and "generalized theta attachment" (Pritchett, 1992), as in (7) and (8), respectively.
(7) Late Closure: When possible, attach incoming lexical items into the clause or phrase
currently being processed. (Frazier & Rayner, 1982, p. 180)
(8) Generalized Theta Attachment (GTA): Every principle of the syntax attempts to be
maximally satisfied at every point during processing. (Pritchett, 1992, p. 155)
The incremental nature of language processing has been well attested in research on processing
of garden-path type of sentences (e.g., Frazier, 1987), such as the one in (9):
(9) After Mary ate the pizza arrived from the local restaurant.

(Juffs, 1998, p. 411)

What may lead readers down the so-called garden-path in reading (9) is the likely initial
interpretation of the pizza as the direct object and theme of the preceding verb ate, by means of
incremental VP integrations. This interpretation, however, must be rapidly revised as soon as the
parser reaches arrived, in that the noun phrase (the pizza) should be integrated into the matrix
clause receiving a new case and thematic role from the matrix verb arrived. The structural
computations for such reanalysis are costly and may impose a momentary processing burden on
the parser (e.g., theta reanalysis constraint⁵, see Pritchett, 1992), possibly yielding a slowdown at
arrived. This phenomenon is generally referred to as a garden-path effect. As illustrated,
incremental structure building may result in relatively complex computations at times because it
integrates a word without knowing what will follow next, but it has been well attested that it is
an essential design feature that eventually helps the processor to keep its (working) memory
system manageable for efficient comprehension (e.g., Frazier & Fodor, 1978; Lewis, 1998; Staub
& Clifton, 2006). The next section discusses how the parser carries out gap search processes for
the filler while building the structural representation incrementally.

4 Another model that competes with the modular-based model is the constraints-based interactive
model (e.g., MacDonald, Pearlmutter, & Seidenberg, 1994; Tanenhaus & Trueswell, 1995),
which assumes that all possible sources of syntactic alternatives (e.g., semantics, context, and
frequency of syntactic structure) are processed in parallel, and the analysis receiving the most
support gets higher activation. See van Gompel & Pickering (2007).

5 Theta Reanalysis Constraint (TRC): "Syntactic reanalysis which re-interprets a theta-marked
constituent as outside of a current theta domain is costly." (Pritchett, 1992, p. 15)

2.3. Processing of filler-gap dependencies: Incremental gap search processes
When processing a sentence such as the one in (3) above, which is copied in (10) below
with a slight modification for readers' convenience, the parser must search for the canonical
position of the filler (i.e., the gap) and integrate it with a relevant component (e.g., the verb).
(10) The manager who the consultant claimed that the new proposal had pleased will
hire five workers tomorrow.

This so-called filler-gap process must be completed as quickly as possible given that working
memory, which is responsible for maintaining unanalyzed filler information, has limited
capacity (e.g., Gibson, 1998; Wagers & Phillips, 2014). Linguistic theories and related
processing frameworks diverge as to how the filler-gap dependency relation is formed in the
representation, and consequently what kind of linguistic resources the parser consults to
construct a representation to link the dislocated filler with the canonical gap site in the most
economical way (e.g., Gibson & Warren, 2004; Traxler & Pickering, 1996), but a broad
consensus is that the gap-search process is essentially incremental as well. That is, once a
displaced filler is identified, the parser actively searches for its canonical position by
incrementally testing out syntactic and/or semantic fits of the filler at every grammatically
possible gap position as it moves forward, whether it be of empty categories (e.g., trace-based
active filler hypothesis, Clifton & Frazier, 1989; trace reactivation hypothesis, Swinney et al.,
1988) or subcategorizing verb⁶ (e.g., traceless-based direct/immediate association strategies,
Pickering & Barry, 1991). In the current study, I adopt the generative-based parsing framework
and assume that wh-constructions are formed by means of movement operations through
syntactically postulated wh-traces (i.e., a silent copy of the filler), given that there is a good
amount of literature that shows evidence for the psychological reality of syntactic wh-traces in
processing, as will be discussed below (e.g., Gibson & Warren, 2004; Lee, 2004; Nakano, Felser,
& Clahsen, 2002, as cited in Marinis et al., 2005; see also, Featherston, 2001 for related
discussions).

6 Non-transformational syntactic frameworks such as Generalized Phrase Structure Grammar
(GPSG) or Head-driven Phrase Structure Grammar (HPSG) accept neither the concept of
syntactic wh-movement operations nor the postulation of unpronounced hypothetical wh-traces
(empty categories). Processing frameworks based on those syntactic accounts (e.g., direct
association) assume that the filler is directly associated with the unresolved subcategorizing verb
(i.e., filler as a missing obligatory argument of the verb). Thus, the crucial cue for filler-gap
processing under this system is the verb, not empty categories (cf. Aoshima et al., 2004). Note
that with head-initial languages such as English, it is empirically difficult to dissociate the two
accounts (i.e., trace- vs. traceless-based) because the sites of the potential gaps and verb
subcategorization overlap with each other (cf. Gibson & Warren, 2004; Lee, 2004). Nakano and
colleagues used wh-scrambling and object-topicalization structures in Japanese, a head-final
language where a verb rigidly occurs after its arguments, to test trace-reactivation effects, and
found evidence of trace reactivations that occurred even before the verb is processed.

According to the trace-based parsing accounts, the incremental gap-search process is
mediated by sets of wh-trace positions assigned by the grammar, such as potential argument
positions or cyclic non-argument Spec of CP2 positions at the clausal boundaries as in (8),
through which the parser reactivates the filler information from the left-most possible extraction
site (i.e., structurally defined gap) until it finally confirms the true canonical position of the filler.
Retrieval of the filler in such a manner has been argued to not only reduce memory cost in the
working memory system that otherwise may have been higher especially with increased linear
filler-gap distance (Gibson, 1998), but also to facilitate the ultimate filler-gap integration (e.g.,
Gibson & Warren, 2004; Traxler & Pickering, 1996). In their self-paced reading experiment,
Gibson and Warren (2004) tested a) whether native English speakers make use of the mediating
wh-trace site ("intermediate structure" in the authors' terms) when processing English
long-distance wh-constructions such as the one in (10), specifically a reactivation at the
non-argument Spec CP2 posited at the complementizer that, and if so, b) whether this facilitates the
later filler integration at the final destination, pleased, as a result of a decrease in linear distance
between the filler and the gap. To observe the facilitation effect of the mediating gap site [A] in
(11), the [- intermediate gap] counterpart of the sentence was added, as in (12).
(11) The manager whoi the consultant claimed [A] ti that the new proposal had pleased [B] ti
will hire five workers tomorrow.

(+extraction across VP, + intermediate gap)

(12) The manager whoi the consultant's claim about the new proposal had pleased [C] ti
will hire five workers tomorrow.

(+extraction across NP, - intermediate gap)

Also included were non-extraction counterparts of the two extraction types, as exemplified in
(13) and (14).
(13) The consultant claimed [D] that the new proposal had [E] pleased
the manager who will hire five workers tomorrow.

(-extraction, VP)

(14) The consultant's claim about the new proposal had [F] pleased the manager who will
hire five workers tomorrow.

(-extraction, NP)

Gibson and Warren found that reading times (RTs) at the gap sites (i.e., [A], [B], and [C]) in the
extraction condition were significantly longer than RTs at the corresponding regions in the
non-extraction condition (i.e., [D], [E], and [F], respectively), showing evidence for filler retrieval at
the gap sites. This suggests that in the extraction conditions, the parser spent extra processing
time to postulate a gap for the filler retrieval and run analyses to evaluate its appropriateness as
the potential landing site, whereas there is no need to do so for the sentences in the
non-extraction condition. More crucially, RTs at the ultimate gap (i.e., pleased) were
found to be significantly shorter in the [+ intermediate gap] condition in (11) than in the
[- intermediate gap] condition in (12). On the basis of these findings, Gibson and Warren claimed
that native English speakers incrementally postulate intermediate gaps in accordance with the
syntactic representation in their mental grammar and utilize those gaps for the filler reactivation,
which not only helps the parser to maintain the filler information with less memory burden, but
also facilitates the integration of the filler with its subcategorizing verb (i.e., [B] pleased in (11))
as the linear distance between the reactivated filler and the gap decreases.

Another piece of evidence supporting incremental gap search comes from studies that
investigated the processing of wh-structures manipulated for plausibility, as illustrated in (15).
Plausibility in this case refers to the semantic relationship between the filler and the first verb
that the parser encounters where an early gap-filling analysis can take place under the
assumption of the active filler-gap creations. Thus, the integration of each of the antecedent NPs,
the book and the city, with the verb wrote yields either a plausible (i.e., wrote the book) or an
implausible (i.e., wrote the city) interpretation respectively.
(15) We liked the booki / cityi that the author wrote ti unceasingly and with great dedication
about ti while waiting for a contract.
Traxler and Pickering (1996) examined through their eye-tracking reading study whether L1
English speakers show any plausibility effect at the early gap site. It was observed that the
participants displayed significantly longer RTs in the implausible condition than in the plausible
condition at the verb, wrote. Such mismatched RTs between the two plausibility conditions
suggest that the parser postulates an object gap as soon as it encounters the verb (wrote) rather
than postponing the filler-integration until it identifies the ultimate gap at the preposition about.
The integration yielding implausible interpretations should disrupt readers' processing, simply
because the interpretation does not make sense up until the initial integration point at the least,
and also partly because the parser must prepare for a reanalysis more immediately compared to
the plausible condition (see also Aoshima, Phillips, & Weinberg, 2004; Frazier & Clifton, 1989;
Lee, 2004; Stowe, 1986, for more review of L1 filler-gap processing).

2.4. Processing of filler-gap dependencies by nonnative speakers
2.4.1. A review of early research on L2 processing
One of the early studies that brought issues of L2 processing into focus was Juffs and
Harrington (1995), which examined how advanced Chinese-speaking (wh-in-situ) learners of
English process long-distance wh-constructions such as the ones in (16) and (17). Previous L2
acquisition research had reported that learners performed comparatively worse on GJTs in the
subject wh-extraction condition (e.g., Schachter & Yip, 1990; White & Juffs, 1998⁷). The authors
investigated whether this subject-object asymmetry was associated with learners' processing
problems rather than with a representational deficit in the L2.
(16) Who_i did the police know t_i killed the pedestrian? (subject wh-extraction)
(17) Who_i did the police know t_i the pedestrian killed t_i? (object wh-extraction)
Juffs and Harrington focused on the different levels of parsing complexity in the two
extraction conditions, based on Pritchett's (1992) principle-based parsing account (see the GTA
provided in (8) and the TRC provided in footnote 5 on p. 17), and hypothesized that L2
learners would employ native-like active filler-gap strategies to fill the gap as early as
possible. However, they reasoned that learners might have greater parsing difficulty in the
subject wh-extraction condition, where the parser is expected to deal with momentarily more
demanding linguistic analyses.
Assuming the operation of the active filler-gap strategy, the parser would initially
postulate an object gap right after the verb know in both extraction conditions, assigning case
and a thematic role to the filler (i.e., accusative and theme, respectively, from know). The
relative difference in processing complexity emerges in the next string, killed in (16) and the
(pedestrian) in (17): In (16), the parser must cancel the initial analysis as soon as it
encounters the embedded verb killed and postulate a subject gap, with immediate case/theta
reassignment (i.e., from object/theme of know to subject/agent of killed). This reassignment
occurs across two different theta domains (i.e., the two verbs, know and killed), which
according to Pritchett's (1992) theta reanalysis constraint is more costly for the parser. In
contrast, no such heavy reanalysis of the filler is needed at this point in the object
wh-extraction condition in (17), which makes the momentary parsing relatively easier. The
results from the word-by-word self-paced reading task confirmed the authors' hypothesis: the
Chinese ESL participants showed greater processing difficulty in the subject wh-extraction
condition, as revealed by significantly longer RTs specifically at the second verb killed and
significantly lower GJT performance. The authors argued that despite the cross-linguistic
variation, the Chinese ESL participants processed long-distance wh-constructions in a way
qualitatively similar to native English speakers, but that learners' ability to carry out
moment-by-moment computations in real time might not be as fast and efficient as that of native
speakers, especially when the parsing load is heavy.

⁷ For clarification, the experiment in White and Juffs (1998) preceded the ones carried out in
Juffs and Harrington's (1995) study (Juffs, 2005, p. 123).

Williams et al. (2001) also argued for qualitative similarities between L1 and L2
processing. Their study included a stop-making-sense judgment task during self-paced reading to
investigate how L2 learners deal with plausibility constraints, as in (18) and (19), during
filler-gap processing.
(18) Which girl_i did the man push t_i the bike into t_i late last night? [plausible at V]
(19) Which river_i did the man push t_i the bike into t_i late last night? [implausible at V]
In both (18) and (19), the canonical position of the filler which girl/which river is after the
preposition into in the adjunct phrase, and the plausibility manipulation is on the main verb push,
the earliest possible extraction site for the filler (i.e., push the girl and push the river).
Williams and colleagues found that all advanced ESL learner groups from various L1
backgrounds ([wh-]: L1 Chinese and L1 Korean; [wh+]: L1 German) showed sensitivity to the
plausibility constraints similarly to the English controls, in that they made use of the left-most
possible extraction site (i.e., push) as a possible landing site for the filler, yielding more
stop-making-sense responses at this region in the implausible condition. However, the analysis of
reading time patterns suggested that L2 learners' reanalysis was not as immediate as that of
the native speaker group, and that it was delayed more in the implausible condition (i.e., push
the river) than in the plausible condition (i.e., push the girl), despite the fact that the
implausibility information and the incoming determiner the in "the bike" signal a need for
reanalysis. Furthermore, in a separate stop-making-sense judgment experiment, it was observed
that the L2 learners had problems canceling their initial analysis even in the offline task,
particularly when the initial analysis was plausible (see also Williams, 2006). Taken together,
Williams et al. concluded that the way L2 learners process filler-gap dependencies is
qualitatively the same as that of native speakers, but that learners may be slower in computing
syntactic analyses and more prone to processing difficulty, especially when the parser has to
retract a plausible misanalysis on the basis of the information that follows, a reanalysis
problem similar to the garden-path effect (e.g., Juffs & Harrington, 1996).

2.4.2. Shallow structure hypothesis
As briefly introduced in the previous chapter, the main argument of Clahsen and Felser's
(2006a, 2006b, 2006c) shallow structure hypothesis (SSH) is that there are fundamental
differences between L1 and adult L2 parsing. What distinguishes adult L2 processing from L1
processing under this hypothesis is the limited range of linguistic resources available to the L2
parser, particularly with respect to the parser's access to full-fledged and hierarchically
detailed grammatical representations during online processing. That is, the SSH assumes that the
grammatical representations that feed the L2 parser during online processing contain structural
information that is rather "rudimentary," "shallower," and lacking "hierarchical details," as
compared to those deployed by native speakers (Clahsen & Felser, 2006a, p. 32).
Clahsen and Felser provided two possible reasons for learners' reduced ability to access the
full parsing route. First, the SSH assumes that the adult L2 learners' interlanguage grammar
that feeds the parser is likely deficient and/or inadequate for processing the target
language input (e.g., Bley-Vroman, 1990). Another possibility is that even if learners' L2
grammar is fully detailed and appropriate for parsing, adult L2 learners may not have adequate
parsing mechanisms and sufficiently efficient processing abilities to compute the necessary
information in real time. Shallower representations may thus prevent learners from constructing
a structural representation of the input in a native-like manner because they simply do not have
sufficient tools (for example, abstract features and grammatical constraints such as copies of
movement traces and subjacency) for constructing hierarchically detailed syntactic
representations during online processing. The SSH predicts that learners instead rely
predominantly on the shallow parsing route, which is fed by pragmatic, simple verb-argument, and
lexical information. Evidence that supports the SSH mostly comes from research that tested
learners' processing of either long-distance filler-gap dependency constructions such as (10)
above (e.g., Felser & Roberts, 2007; Marinis et al., 2005) or ambiguous relative clause (RC)
constructions such as the one in (20) below (e.g., Felser et al., 2003; Papadopoulou &
Clahsen, 2003). In the following, I first briefly review the literature on learners'
processing of ambiguous RC structures and then provide a more detailed review of research on
the processing of filler-gap dependency constructions.

2.4.3. A review of the empirical research testing the SSH
2.4.3.1. L2 processing of ambiguous RC constructions
In (20) below, the noun phrase (NP) preceding the relative clause is complex, consisting of
two NPs linked by genitive of.
(20) An armed robber shot [NP1 the sister of [NP2 the actor]] [RC who was on the balcony].
The structural ambiguity arises when the parser must determine where to attach the RC. In other
words, the RC can modify either the head of the complex NP, NP1 the sister (also referred to as
high attachment), or NP2 the actor (also referred to as low attachment), thereby inviting
two possible interpretations regarding "who was on the balcony". Languages differ as to how
the ambiguity in (20) is resolved; speakers of some languages prefer to attach the RC to
NP1 (i.e., the sister was on the balcony; e.g., German, Greek, Spanish, Korean), and speakers of
other languages prefer the NP2 interpretation (i.e., the actor was on the balcony; e.g.,
English, Romanian, Swedish). Such different interpretation preferences can be explained by
cross-linguistically different structure-based parsing strategies derived from the different
syntactic properties of the languages. One explanation involves rigidity of word order: rigid
word order languages such as English prefer a low-attachment interpretation (referred to as the
recency preference), and languages with relatively free word order such as Spanish or Korean
prefer a high-attachment interpretation (referred to as the predicate proximity preference; see
Gibson, Pearlmutter, Canseco-Gonzalez, & Hickok, 1996, for more theoretical accounts). Felser
et al. (2003) investigated how advanced L1 Greek and L1 German learners of English (NP1
preference in both L1s; NP2 preference in English) process sentences like (20) during self-paced
reading and also during offline reading. They found no clear NP attachment preference (i.e., no
more than chance) for either L2 group in either the online or the offline reading test, whereas
the English controls showed a clear NP2 preference as predicted. The authors suggested that the
absence of attachment preferences in the performance of those L2 learners may be due to an
inability to apply any structure-based parsing strategies (i.e., neither L1 transfer nor
native-like), consequently making their attachment decisions random (see Clahsen & Felser,
2006, and Papadopoulou & Clahsen, 2003, for similar findings; but see Dussias, 2003; Dussias &
Sagarra, 2007; Frenck-Mestre, 1997, for counter-evidence).

2.4.3.2. L2 processing of long-distance filler-gap dependencies
Recall the studies discussed in section 2.4.1, which maintained that L2 learners, like
native speakers of the target language, are able to apply relevant syntactic information and
incrementally construct a representation that includes unpronounced syntactic gaps
compatible with the grammar, albeit less efficiently when the parser is loaded with complex
linguistic computations. In this regard, however, Clahsen and Felser (2006a) and Marinis et al.
(2005) pointed out that the types of test materials used in Juffs and Harrington (1995) and
Williams et al. (2001) cannot provide unequivocal evidence that L2 learners indeed
made use of syntactically driven L2 information to postulate a gap. This is because in English, a
potential wh-extraction site is always adjacent to a verb, so that potential gap positions
always coincide with verb subcategorization and argument positions. Consequently, it
is unclear whether the incremental gap search by L2 learners is guided by syntactically driven
trace information (active filler/trace reactivation hypothesis) or lexically driven verb-argument
information (direct or immediate association hypothesis).
In an attempt to resolve this ambiguity as to what types of knowledge resources are
used by L2 learners, Marinis et al. (2005) adopted the test materials of Gibson and Warren
(2004), who observed the filler reactivation effect at non-argument trace positions (specifier
of CP), as shown in (11) and (12), repeated as (21) and (22) below. Slashes indicate how the
sentences were segmented in their self-paced reading task.
(21) The manager who_i / the consultant claimed / [A] t_i that / the new proposal /
had pleased [B] t_i / will hire five workers tomorrow. (+extraction, +intermediate gap)
(22) The manager who_i / the consultant's claim / about / the new proposal /
had pleased [C] t_i / will hire five workers tomorrow. (+extraction, -intermediate gap)
L2 learners from both wh-movement (German & Greek) and wh-in-situ (Chinese & Japanese)
L1 backgrounds, as well as native English speaker controls, performed a segment-by-segment
self-paced reading task in English. The RT profiles of the English controls were similar to the
native speaker data in Gibson and Warren's study: the native speakers displayed elevated RTs
at the intermediate gap at [A] in (21), relative to its non-extraction counterpart (see the
example in (13)), signaling a filler reactivation effect at this non-argument trace position.
In addition, as in Gibson and Warren (2004), filler integration at [B] by the native English
speakers was significantly faster than filler integration at [C], suggesting that the shorter
linear distance between the gap and the filler, a consequence of filler retrieval at the
intermediate gap [A], facilitated filler integration at the ultimate gap site, where the filler
is integrated with its subcategorizing verb. In contrast, neither the filler reactivation effect
at [A] nor the facilitation effect at [B] relative to [C] was found in the L2 learners' RT
profiles regardless of their L1 backgrounds, which according to the authors suggests that
learners did not postulate syntactically driven gaps at the intermediate gap positions,
presumably due to the unavailability of fully detailed syntactic information from their
interlanguage grammar, at least during online processing, where rapid moment-by-moment
computations must take place. Based on those findings, Marinis and colleagues concluded that
although L2 learners might search for the gap in an incremental manner like native speakers,
they tend to rely much more on lexical-semantic information than on syntactic information, thus
attempting to associate the filler directly with incoming verbs.
However, in their later reanalysis of Marinis et al.'s RT data, Dekydtspotter, Schwartz,
and Sprouse (2006) found that the German and Japanese groups displayed spillover effects
at the region right after the intermediate gap region [A], the new proposal, in the
[+intermediate gap] condition. Dekydtspotter et al. argued that these delayed filler
reactivation effects suggest that L2 learners may be slower and less efficient in integrating
syntactic information during online parsing, but do not necessarily indicate that learners'
underlying L2 parsing mechanisms are qualitatively distinct from those of native speakers.
Using the same test design and materials as Marinis et al. (2005), Pliatsikas and
Marinis (2013) probed whether more exposure to naturalistic L2 input influences adult
L2 learners' processing. They compared the reading profiles of native English speakers with
those of two advanced Greek-speaking ESL groups: a naturalistic exposure (NE) group consisting
of ESL learners who had been immersed in English-speaking environments (average LOR of 9.42
years) and a classroom exposure (CE) group whose L2 experience was limited to classroom
exposure in their home country. The authors found that while the CE group showed nonnative-like
processing patterns (no evidence of filler reactivation at the non-argument syntactic gap and no
facilitation effect at the final gap site in the +intermediate gap condition), the reading
profiles of the NE group converged with those of the native speaker group, showing evidence of
gap postulation at a site consistent with the grammar.
Another study by Felser and Roberts (2007) also explored adult L2 learners' use of
intermediate gaps in real-time processing. Felser and Roberts implemented a cross-modal picture
priming task to examine whether advanced Greek-speaking [wh-movement, head-initial] ESL
learners, divided into two groups (low and high WM capacity), demonstrate a picture
priming effect (i.e., for a picture that is identical to the antecedent) as an index of filler
reactivation at structurally defined gap positions, as did the native speakers tested in Roberts,
Marinis, Felser, and Clahsen (2007). Felser and Roberts found that the performance of both the
high and low WM L2 groups differed from that of the native speakers in Roberts et al.'s study:
the learner groups showed a priming effect not only at the structurally positioned gap site, but
also at the control position (a structurally unrelated position), whereas the native speakers,
more precisely only those with high WM spans⁸, showed the priming effect only at the
syntactically relevant gap site. Based on these results, the authors concluded that whereas the
learners in their study activated and maintained the filler information throughout their
listening to the target sentences, they did not make use of grammatical details specifically
for filler reactivation at a gap, but instead relied more on lexical and other non-structural
cues to compensate for their limited grammatical processing.

⁸ Note that in Roberts et al.'s study, the native speaker group with low WM spans showed
priming effects neither at the control position nor at the syntactic gap site.

2.4.3.3. L2 processing of island constraints
More recently, a few studies have begun to explore whether the parser respects syntactic
island constraints in processing long-distance filler-gap dependency structures in the L2. Recall
that a gap cannot be placed within certain structure types (islands) because the
filler cannot move out of those structures without violating locality constraints. From a
processing perspective, this means that the parser should not postulate a gap when it encounters
a syntactic island structure, as there is no syntactically licit gap in the grammatical
representation. Omaki and Schulz (2011), for example, investigated how advanced
Spanish-speaking ESL learners process wh-constructions such as (23) and (24), as compared to
native English speakers (p. 575).
(23) [No island, ±plausible]
The book/city that the author wrote___ regularly about___ was named for an explorer.
(24) [Island, ±plausible]
The book/city that the author who wrote regularly saw___ was named for an explorer.

In their self-paced reading experiment, the authors used a plausibility manipulation as a
diagnostic to examine whether learners avoid gap postulation inside the relative clause
island structure. In the non-island condition, a plausibility effect is expected if the
parser uses the active filler strategy at wrote (i.e., longer reading times for wrote the city
than for wrote the book). In the island condition, on the other hand, no such effect is expected
if the parser brings the detailed syntactic representation of the island constraints into the
parse, given that a plausible or implausible interpretation arises only when the parser
integrates the filler with the potential subcategorizer (wrote). The authors found that both the
native English controls and the Spanish L2 group showed a plausibility effect only in the
non-island condition, suggesting that the Spanish ESL learners applied the island constraints
during reading and did not attempt to create a gap in the island condition. Based on their
results, Omaki and Schulz (2011) challenged the claims of the SSH, arguing that nonnative
speakers, at least those who are advanced learners, can build abstract and detailed syntactic
representations during filler-gap processing (see also Aldwayan, Fiorentino, & Gabriele, 2010
[L1 Najdi Arabic <-wh>]; Cunnings et al., 2010 [L1 German <+wh> & L1 Chinese <-wh>]; Felser
et al., 2012 [L1 German <+wh>], for similar findings).
In a more recent study, Kim, Baek, and Tremblay (2015) investigated how L1 Korean [-wh] and
L1 Spanish [+wh] learners of English process island constraints in English. While the test
materials they used were similar to those in Omaki and Schulz (2011), Kim et al. employed a
stop-making-sense task in the course of segment-by-segment self-paced reading. The target
sentences in their experiment are illustrated in (25) and (26).

(25) I wonder / which book | which city / the author / wrote passionately / about /
while / he / was travelling. [non-island, plausible | implausible]
(26) I wonder / which book | which city / the author / who wrote passionately / saw /
while / he / was travelling. [island, plausible | implausible]

In the stop-making-sense task, the authors found that all groups showed a plausibility effect
only in the non-island condition. However, the L1 Korean group showed different response
patterns from the L1 Spanish group and the English controls at while in the non-island
condition. That is, for the native English and L1 Spanish ESL groups, the stop-making-sense
rate increased as soon as they found that the object argument of the preposition was missing
in (25), reflecting a reanalysis effect (i.e., cancelling the initial plausible interpretation).
The L1 Korean group, however, did not show such an effect in either island condition. The
reading time analysis revealed a further contrast. The reading time profiles of the native
English and L1 Spanish groups showed a significant interaction of plausibility and island,
signaling that these participants did not postulate a gap in the island condition and showed a
plausibility effect only in the non-island condition. The L1 Korean group, on the other hand,
showed a similar reading pattern across the island conditions, with increased reading times in
both, and the statistical analysis revealed no significant interaction of island and
plausibility for this group. Kim et al. interpreted these results as suggesting that although
the L1 Korean participants knew that a gap was not allowed inside the island structure (as
shown by their stop-making-sense judgments and offline grammaticality judgments), their
application of the relevant grammatical constraints might have been delayed at an early stage
of processing (as shown by their reading profiles), presumably due to crosslinguistically
different ways of forming filler-gap dependencies in Korean. As a result, the authors claimed
that unlike the Spanish ESL group, whose L1 shares the same [wh] feature property and overt
wh-movement characteristics with English, the Korean ESL learners, whose L1 is distinct from
English in this respect, might have more difficulty applying the grammatical representations
immediately in real time (i.e., an L1 effect).
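
The plausibility-by-island logic discussed above can be sketched as a difference of
differences. The short Python example below is only an illustration: the reading times are
invented, the variable and function names are hypothetical, and the simple subtraction of cell
means stands in for the inferential statistics that the original studies report (not
reproduced here).

```python
from statistics import mean

# Hypothetical reading times (ms) at the critical verb "wrote".
# These numbers are invented for illustration; they are not data from any study.
reading_times = {
    ("non-island", "plausible"): [412, 398, 430, 405],
    ("non-island", "implausible"): [468, 455, 490, 472],
    ("island", "plausible"): [420, 415, 409, 428],
    ("island", "implausible"): [424, 411, 418, 421],
}


def plausibility_effect(structure):
    """Mean implausible RT minus mean plausible RT for one structure type."""
    implausible = mean(reading_times[(structure, "implausible")])
    plausible = mean(reading_times[(structure, "plausible")])
    return implausible - plausible


# A positive effect means slower reading when the filler-verb integration is implausible.
non_island_effect = plausibility_effect("non-island")
island_effect = plausibility_effect("island")

# The interaction is the difference between the two plausibility effects.
interaction = non_island_effect - island_effect

print(f"Plausibility effect, non-island: {non_island_effect:.1f} ms")
print(f"Plausibility effect, island:     {island_effect:.1f} ms")
print(f"Plausibility x island interaction: {interaction:.1f} ms")
```

Under these assumptions, a large positive interaction value corresponds to the pattern
attributed to the native English and L1 Spanish groups (a plausibility effect only outside the
island), whereas comparable effects in both structures, and hence a value near zero, would
correspond to the pattern attributed to the L1 Korean group.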

2.5. The effect of age of immersion (or acquisition) and the critical period hypothesis
As discussed earlier, the successful application of syntactic representations during online
processing largely depends on the availability of adequate knowledge of the target language
grammar in the first place. Without it, the parser cannot construct fully detailed syntactic
representations, simply because it lacks the necessary tools to work with, regardless of
the availability of sufficient parsing strategies and abilities. The extent to which adult
learners can acquire a target language grammar has been hotly debated over the
years in second language research, especially in relation to the role of age of
acquisition/immersion in adult learners' ultimate attainment (e.g., Birdsong, 2005;
Bley-Vroman, 1990, 2009; DeKeyser, 2000, 2010; Johnson & Newport, 1989, 1991; Juffs &
Harrington, 1995; Rothman, 2008; Schwartz & Sprouse, 1996; Weber-Fox & Neville, 1996;
White & Juffs, 1998). The discussion of age-related effects in adult L2 acquisition starts from
the general observation that adult L2 acquisition is not as reliable and stable as child L1
acquisition. Such differences between L1 and L2 acquisition have often been explained by the
critical or sensitive period hypothesis (CPH) (Penfield & Roberts, 1959; Lenneberg, 1967). The
CPH assumes that there is a limited developmental period during which a language (L1, and
perhaps L2) can be acquired to a normal, native-like level. Once this period is over, the
ability to learn a language declines due to maturational changes in the neurobiological system
responsible for language learning (Birdsong, 1999; see also Singleton, 2005, for a review
of the different ranges proposed for the critical period across studies).

There are reasons to believe, however, that no such critical or sensitive period exists. As
pointed out by Slabakova (2006), L2 acquisition differs from child L1 acquisition in that
L2 learners already have an L1, meaning that the language learning system in the brain has
already been activated early in life. She suggested that it would be more appropriate to
consider age-related effects in L2 learning more generally. In the following, I review a few
crucial empirical studies that directly investigated the effect of age of immersion on the
acquisition of L2 grammars.
First, Johnson and Newport (1989) tested 46 L1 Chinese or Korean speakers learning
English using an audio grammaticality judgment task that included various types of
morphosyntactic structures. The L2 participants varied in their ages of arrival (AOA), ranging
from 3 to 26 years old, based on which they were divided into two groups: early arrivals
(AOA: 3-15) and late arrivals (AOA: 17-35). The two groups were matched for length of
residence. Johnson and Newport found that L2 learners with an AOA of seven and under showed
native-like GJT performance. For those early arrivals whose AOA fell between 7 and 15, there
was a linear decline in GJT scores. The late arrivals generally performed more poorly than the
early arrivals, but their performance showed no further gradual decline with increasing AOA,
and they showed greater performance variability regardless of their AOA. Johnson and
Newport suggested that there is a critical period for second language acquisition (see also
Johnson & Newport, 1991, for similar findings from their test of L2 subjacency with L1
Chinese-speaking learners of English).
Using the same type of audio grammaticality judgment task, adapted from Johnson
and Newport (1989), DeKeyser (2000) tested 57 L1 Hungarian-speaking ESL learners whose
ages ranged from 16 to 81, with a minimum length of residence of 10 years. The participants
were divided into two groups based on their ages of arrival: early learners (AOA between 1 and
16) and late learners (AOA between 17 and 40). In addition to the GJT, DeKeyser also measured
the L2 learners' verbal aptitude using the Hungarian version of the MLAT (Modern Language
Aptitude Test). DeKeyser found a strong negative correlation between learners' AOA and their
GJT performance. However, unlike in Johnson and Newport (1989), neither the late learners nor
the early learners showed a linear decline when the two subgroups were analyzed separately.
DeKeyser also found a positive correlation between GJT performance and verbal aptitude scores
for the late learners, whereas no such correlation was found among the early learners. DeKeyser
used this finding to support Bley-Vroman's (1990) fundamental difference hypothesis, claiming
that whereas early learners reach native-like levels of proficiency independently of their
language aptitude, late learners cannot acquire native-like L2 competence unless they have
above-average language aptitude, which reflects more explicit and analytic language analysis
and general problem-solving skills.
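
The contrast between a strong overall AOA-GJT correlation and the absence of a clear decline
within each subgroup can be sketched numerically. The Python example below is a minimal
illustration only: the (AOA, GJT score) pairs are invented and are not DeKeyser's data, and
the helper `pearson_r` and all variable names are hypothetical.

```python
from math import sqrt


def pearson_r(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical (age of arrival, GJT score) pairs, invented for illustration only.
early_learners = [(2, 196), (5, 193), (8, 197), (11, 192), (14, 195)]
late_learners = [(18, 172), (23, 168), (28, 175), (33, 166), (38, 171)]

r_all = pearson_r(early_learners + late_learners)
r_early = pearson_r(early_learners)
r_late = pearson_r(late_learners)

print(f"All learners:   r = {r_all:.2f}")
print(f"Early learners: r = {r_early:.2f}")
print(f"Late learners:  r = {r_late:.2f}")
```

With these invented values, the pooled correlation is strongly negative because the two groups
differ in their overall score levels, while the correlation within each group is weak. This is
only one way such a pattern can arise, and it is not a claim about how the original data were
distributed.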
Birdsong (2014) later conducted an additional correlation analysis of the data in
DeKeyser (2000; see Appendix A in DeKeyser's paper). Interestingly, he found that for all AOAs
together, learners' years of schooling were significantly correlated with their grammatical
proficiency. Birdsong also found that learners' levels of education were positively correlated
with their GJT scores, not only for the late learners but also for the early learners in
DeKeyser's study, indicating that the "education effect is systemic: significant correlations
are not restricted to certain AOA spans or certain aptitude levels" (p. 48; see also Hakuta,
Bialystok, & Wiley, 2003, for a similar finding regarding the role of the amount of formal
education).
As shown above, age of acquisition and/or age of immersion appears to play a role in the
acquisition of L2 grammar, at least to a certain degree. However, the question of how strictly
critical periods constrain the degree to which adult learners can develop their target
language grammar remains open.

2.6. The role of working memory in L2 parsing
The role of individual differences in working memory (WM) has been receiving more
attention in L2 processing research recently (e.g., Dussias & Piñar, 2010; Felser & Roberts,
2007; Juffs, 2004, 2005; Juffs & Harrington, 2011; Sagarra & Herschensohn, 2010). WM is a
"multicomponent system responsible for active maintenance of information in the face of
ongoing processing and/or distraction" (Conway et al., 2005, p. 770). According to Baddeley's
(2003) most recent WM model, WM is made up of three subcomponents, namely, the central
executive, the short-term storage system (subdivided into the visuospatial sketchpad and
phonological loop), and the episodic buffer. Under this system, the central executive supervises
the processing of perceived information (e.g., auditory/reading input) and controls the flow of
this information to the other subcomponents. These subcomponents include a) the episodic
buffer, which links to the long-term memory system, and b) the phonological loop, which
temporarily stores information (specifically, auditory input) in the phonological short-term
store⁹ and maintains it through the rehearsal process while other information is processed. In
sum, WM involves a store that can maintain a limited amount of information (e.g., retaining a
filler) in the face of simultaneous processing (e.g., concurrently processing incoming input
and integrating it into meaningful units).

⁹ WM differs from phonological short-term memory (PSTM). One example of a PSTM measure is
the Non-Word Repetition (NWR) test, which has often been used in SLA research (e.g.,
Hummel, 2010), especially in relation to L2 vocabulary development. Since PSTM captures
memory capacity within the phonological loop only, without any simultaneous processing
component, it is different from WM.

In regard to the role of working memory in the L2 domain, a few studies reported a positive
correlation between WM span measures (particularly, reading-span tests) and L2 reading skills
(e.g., the grammar and reading sections in the TOEFL test; Harrington & Sawyer, 1992), and
grammaticality judgment tests (e.g., Robinson, 2002). However, how WM influences
L2 online processing, especially processing of filler-gap dependencies and ambiguous relative
clause constructions, has been investigated in only a limited number of studies thus far, and the
results are mixed. For example, in his replication study of Juffs and Harrington (1995), Juffs
(2005) used a series of working memory capacity measures to see if individual differences in
WM affect learners’ reading patterns during filler-gap processing. Specifically, Juffs asked
whether individual differences in WM influence L2 learners’ processing, especially
when they are under greater processing pressure to compute a filler-gap reanalysis at killed
in reading Who_i did the police know t_i killed the pedestrian? (see section 2.4.1). The measures
included an L2 English reading-span test (Daneman & Carpenter, 1980), an L1
Spanish/Chinese/Japanese reading test (e.g., Osaka & Osaka, 1992), and a word-span test in L1
and in L2 (Baddeley et al., 1998). Juffs found no correlations between individual learners’
reading patterns at the critical region and any of the WM measures. However, Juffs and
Rodriguez (2015) later noted that the non-significant associations between processing and WM in
that study might have been due to the older methods employed in the tests (e.g., manual
administration with cards).
Felser and Roberts’s (2007) study, which used a reading span test (Harrington & Sawyer,
1992) in the learners’ L2, also found no WM effect for L2 learners, in that participants with both
high and low working memory failed to show a position-specific antecedent reactivation effect. In
other words, although the presentation of the antecedent (filler)-matched picture is supposed to
facilitate participants’ reaction (i.e., a priming effect) only at a structurally possible gap position,
those L2 learners showed a reactivation effect not only at the gap site, but also at the non-gap
site, making it difficult to interpret the reactivation effect as a result of the use of structural
information. Note, however, that the low WM native English group in their study did not show any
reactivation effect either, even in the no-gap control condition, making it difficult to interpret the
differences between the L2 group and the NS English group with low WM. Thus, it could be that
the cross-modal (listening and visual) picture priming task they implemented was
too difficult even for some of the native English speakers, which might have masked a potential
role of individual differences in WMC (see also Nakano, Felser, & Clahsen, 2002, for similar
native speaker results from the same experiment type).
Another study that found no WM effect on L2 processing is Felser et al. (2012).
Although they did not include the WM results in their article, Felser et al. noted that they
administered a reading span test (L1: Daneman & Carpenter, 1980; L2: Harrington &
Sawyer, 1992) in their eye-tracking research on L2 processing of island constraints with
proficient German-speaking learners of English. However, they dropped the WM results because no
WM-related effects were found in their analysis.
On the other hand, Dussias and Piñar (2010) found some reliable effects of WM in their
L2 English long-distance filler-gap processing study with proficient L1 Chinese learners of
English. They used a reading span test adopted from the reading span WM measures of Waters et al.
(1987) and Waters and Caplan (1996), which has a plausibility judgment of sentences as a
processing component and recall of the last word of each sentence as a memory component. The
test was given in the L2 (i.e., English). In the analysis, Dussias and Piñar divided each L1 group
into two subgroups (high and low WM) by using the median WM score as a splitting point. They
found that the high WM learners, but not the low WM learners, showed evidence of filler-gap
reanalysis in a similar way to the native English controls (see O’Rourke, 2013; Sturt et al., 1999,
for similar findings in L1 processing research on garden-path and filler-gap
dependencies).
In a more recent study, Hopp (2014) investigated the effect of individual
differences in working memory and lexical decoding skills on processing of globally (offline)
and (temporarily) ambiguous relative clause constructions in L2 English by German speakers.
The author used a reading span test developed by Ariji, Omaki, and Tatsuta (2003), in which
participants were asked to perform segment-by-segment self-paced reading followed by an
acceptability judgment about each sentence. The target to recall was one of the words in each
sentence that was printed in capitals. Along with other results, Hopp found that higher WM L2
learners tended to prefer to attach the relative clause to a lower NP in their offline judgment
test, suggesting that they employed phrase-structure-based parsing strategies more, whereas the
lower WM L2 learners adopted chunking strategies and preferentially attached the relative clause
to a more discourse-prominent higher NP. However, in his online eye-tracking reading experiment,
Hopp found that learners’ lexical decoding skills (as measured by a lexical decision task) were a
better predictor of L2 learners’ behaviors.

2.7. Research Questions
The main goal of the present study is to investigate how advanced early and adult ESL learners
process structurally complex filler-gap dependency constructions in the L2 during online
processing. Using an eye-tracking method, the study focuses primarily on 1)
whether these learners are sensitive to structural cues and make use of relevant
grammatical knowledge of island constraints in an appropriate way, and 2) whether individual
differences in working memory capacity influence learners’ application of the
grammatical information in real time. This dissertation is guided by the following research
questions:
1. The effect of age of acquisition/immersion on L2 processing
Do early ESL learners, adult ESL learners, and native English speakers show any different
processing behaviors across the experimental conditions while processing filler-gap
constructions in English?
1.1. Early gap: Use of active filler strategy and online application of island constraints
At the earliest possible gap site (Region1), do native speakers, advanced early and adult
ESL learners show evidence for active gap creation in the non-island condition, but not in
the island condition, thereby showing a reliable interaction of plausibility and island
constraints? Specifically,
(A) Do all three groups attempt to postulate a gap and integrate the filler, thereby presenting
a sensitivity to the plausibility manipulation in the non-island condition, as measured by
longer reading times and more regressions in the implausible than in the plausible
condition?
(B) Do all three groups avoid postulating a gap when encountering the verb inside the
embedded relative clause island, thereby presenting no plausibility effect, thus showing
evidence that they integrate detailed grammatical information of the relative clause island
constraint into the parse?
1.2. Ultimate gap: The effect of filler-gap (re)analysis
At the ultimate gap site (Region3), do native speakers, advanced early and adult ESL
learners show evidence for filler-gap reanalysis in the non-island condition, but not in the
island condition? Specifically,
(A) Do all three groups show a reanalysis effect in the non-island condition, displaying more
difficulties in cancelling and revising their misanalysis in the plausible than in the
implausible counterpart, as measured by longer RTs and more regressions in the plausible
than in the implausible condition?
(B) Do all three groups show no reanalysis effect in the island condition, in consequence of
no gap-postulation inside the relative clause island?
2. The effect of individual differences in WMC on L2 processing
How do individual differences in working memory capacity (WMC) influence the way native
English speakers, early and adult ESL learners process filler-gap dependencies in L2 English?
2.1. Do differences in WMC of the native English speakers, early and adult ESL learners
influence the way they respond to the plausibility manipulation in the non-island condition?
Specifically, do lower WMC readers show any evidence that they are less sensitive to the
plausibility manipulation in the non-island condition?
2.2. Do differences in WMC of the native English speakers, early and adult ESL learners
influence the way they integrate the knowledge of island constraints into the parse? That is,
do lower WMC readers show any evidence that they are more likely to postulate an illicit gap
inside the island structure?
2.3. Do differences in WMC of the native English speakers, early and adult ESL learners
influence the way they perform a reanalysis in the non-island condition at Region3?
Specifically, do lower WMC readers show any evidence that they are less sensitive to the
need for a reanalysis in the non-island condition?

CHAPTER 3: METHOD
3.1. Participants
A total of 52 advanced learners of ESL took part in the current study. They varied in
terms of their age of arrival to the United States from two to thirty-one years. A group of 25
native English speakers also participated as controls. Data from three ESL learners had to be
excluded from the analyses due to their overall lack of comprehension of the target sentences in
the eye-tracking experiment, details of which are provided later in this chapter. In addition, data
from one native English speaker displayed a high percentage of track loss consistently across the
trials during eye-movement recording, and his/her data were removed from the analyses. As a
result of these exclusions, the sample size was adjusted to 49 for the L2 learners and 24 for the
English controls.
Most of the native English speaker controls (N = 24: 14 female & 10 male, mean age:
23.42, SD = 7.79, range: 18 - 51) were either undergraduate (n = 15) or graduate (n = 7) students
studying at Michigan State University. The remaining two participants were recent MA
graduates who were working as ESL instructors at the time of testing. The L2 participants had
either L1 Chinese (n = 13) or L1 Korean background (n = 36), but the two L1s were collapsed to
represent the L2 learners in this study. Crucially, Chinese and Korean are both wh-in-situ
languages, thus providing a good testing ground to evaluate whether these ESL learners have
acquired relevant L2 grammatical knowledge of wh-movement constraints that is not instantiated
in their L1 syntactic representations, and if so, whether they can make use of the knowledge in
order to construct detailed English filler-gap dependency constructions in real time.
To explore whether there is an age-related effect on L2 processing, the ESL learners were
assigned to one of two groups based on their age of arrival in an English-speaking
environment, an early ESL group (N = 21) and an adult ESL group (N = 28). As discussed
earlier, adult ESL learners in the current study were operationalized as those whose ages of
arrival were after the age of 16 (i.e., AOA ≥ 17; e.g., Johnson & Newport, 1989, 1991). The early
ESL learners were operationalized as those who were immersed in an English-speaking
environment before the age of 12. The ESL learners’ biodata and English learning
background are provided in Table 1.

Table 1. Biodata and English learning background of the ESL learners
                             Early ESL (N = 21)        Adult ESL (N = 28)
Age                          23.67 (5.75)              31.11 (4.98)
Gender                       7 female & 14 male        18 female & 10 male
L1 background                16 Korean & 5 Chinese     20 Korean & 8 Chinese
Age of Arrival               7.43 (2.25)               26.29 (3.05)
Length of Residence (yrs)    16.44 (5.39)              4.76 (4.65)
Note. Age, Age of Arrival, and Length of Residence are given as M (SD).

The adult ESL learners varied in their ages (range: 25 - 46 years old, M = 31.11, SD =
4.98), AOA (range: 18 - 31 years old, M = 26.29, SD = 3.05), and length of residence (LOR)
(range: 4 months - 20 years, M = 4.76 years, SD = 4.65). They were mostly graduate students
from a variety of majors enrolled in either master’s (n = 6) or doctoral (n = 18) degree
programs at Michigan State University, with the exception of four participants: Three
participants were academic faculty teaching at the same institution, and one participant was a
recent MA graduate working at an American corporation at the time of her participation. They
had started learning English between ages seven and thirteen either as part of formal education at
school or in the form of tutoring at private institutes in their home countries (M = 11.61, SD =
1.59). However, none of them had extensive English immersion experiences prior to their current
residence in the United States. All the adult learners received their primary and secondary
education in their home countries. As one way to estimate their overall level of English
proficiency, the adult learners were asked through a language background questionnaire to report
any standardized English proficiency test scores, if available (see Appendix A). Twenty-four
of the 28 participants responded, providing their TOEFL scores. With individual scores
ranging from 94 to 116, the mean self-reported iBT TOEFL score of the adult learners was
104.54 (SD = 7.05), indicating that the adult ESL learners had, by and large, high levels of
English proficiency.
The early ESL learners’ ages of arrival ranged from two to eleven years old (M = 7.43,
SD = 2.25), presenting a significant difference from the adult ESL learners in this respect, t (47)
= 23.84, p < .001, d = 7.033. In contrast to the adult ESL learners, the early learners all received
their primary and secondary education in the United States. Their ages ranged from 18 to 39 (M
= 23.67, SD = 5.75). Fourteen participants were undergraduate students, and five participants
were graduate students enrolled in either master’s (n = 4) or doctoral (n = 1) programs at MSU.
The remaining two participants were college graduates who were working at American
corporations at the time of testing.
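
As a rough check on the age-of-arrival comparison, the test statistic can be approximated from the summary statistics alone. The sketch below is illustrative rather than a reproduction of the original analysis: it assumes a pooled-variance independent-samples t-test, and it assumes that Cohen's d was standardized by the root mean square of the two group SDs (the formula actually used is not stated here). The function names are hypothetical. Because the inputs are rounded means and SDs, the output can only be expected to approximate the reported values.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's t for two independent groups, computed from summary statistics."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se, df


def cohens_d_rms(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d with the root mean square of the two SDs as the standardizer
    (an assumption; other standardizers give slightly different values)."""
    return (mean_a - mean_b) / math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)


# Age of arrival: adult ESL (M = 26.29, SD = 3.05, N = 28) vs. early ESL (M = 7.43, SD = 2.25, N = 21)
t_value, df = pooled_t(26.29, 3.05, 28, 7.43, 2.25, 21)
d_value = cohens_d_rms(26.29, 3.05, 7.43, 2.25)
print(f"t({df}) = {t_value:.2f}, d = {d_value:.2f}")
```

Under these assumptions, both values come out close to the reported t (47) = 23.84 and d = 7.033.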

3.2. Materials
3.2.1. English proficiency measures
Two different types of proficiency measures were used to evaluate the ESL
learners’ L2 proficiency: self-rated English proficiency ratings obtained from individual learners
and a web-based measure called LexTALE (Lexical Test for Advanced Learners of English,
available from http://www.lextale.com). The details of each measure are discussed below.
Self-rated English proficiency. The L2 participants were asked to self-assess and
indicate their levels of English proficiency for each language skill, on a scale from zero (not
proficient at all) to 10 (near native-like) in the background questionnaire. The results of the two
groups’ self-rated proficiency are summarized in Table 2.
Table 2. Self-rated English proficiency of the ESL learners for each language skill
             Early ESL (N = 21)   Adult ESL (N = 28)
             M (SD)               M (SD)               df       t       p          d
Listening    9.14 (.66)           7.89 (.69)           47       6.440   p < .001   1.865
Speaking     8.95 (.59)           7.43 (.88)           46.514   7.523   p < .001   2.031
Reading      8.76 (.83)           8.46 (.57)           47       1.481   p = .145   .421
Writing      8.52 (.60)           8.21 (.79)           47       1.502   p = .140   .442
Grammar      8.24 (.70)           8.46 (.58)           47       1.240   p = .221   .342
Overall      8.76 (.77)           8.14 (.71)           47       2.926   p = .005   .832

Overall, both groups showed fairly high proficiency ratings across different areas of English
skills. The early ESL learners tended to assess their proficiency higher than the adult ESL
learners in all language skills with the exception of grammar. A set of independent-samples
t-tests was carried out for each language skill and for overall proficiency to examine
whether there was a reliable difference between the two groups. As shown in Table 2, there were
significant differences between the two groups in Listening (p < .001), Speaking (p < .001), as
well as in overall proficiency ratings (p = .005), in that the early ESL learners’ proficiency
ratings were significantly higher than those of the adult ESL learners. However, the two groups
did not differ in reading (p = .145), writing (p = .140), or grammar (p = .221).
LexTALE measure. In addition to obtaining learners’ self-rated English proficiencies
discussed above, individual participants’ general English proficiency was also measured
independently using the LexTALE measure (Lemhöfer & Broersma, 2012). The LexTALE is an
untimed lexical decision task designed primarily to evaluate vocabulary knowledge of highly
advanced ESL learners. However, the test has also been found to be a good predictor of learners’
general English proficiency (Lemhöfer & Broersma, 2012), thus allowing researchers to
use the test result as an indication of learners’ general L2 proficiency (e.g., Declerck, Lemhöfer,
& Grainger, 2016; Mirdamadi & De Jong, 2015; Zufferey, Mak, & Degand, 2015). The test
consisted of 3 practice and 60 vocabulary items adapted from Meara (1996): 40 items were
low-frequency English words and 20 items were non-words (see Appendix B for the list of the items).
The participants were instructed to indicate, using the computer mouse, whether each word on
the screen is an existing word in English (by clicking a “yes” button) or not a word in English (by
clicking a “no” button). See Figure 1.

Figure 1. A screenshot of the LexTALE Test

The NS English group also took this test in order to estimate how close the ESL
learners’ English proficiency was to that of native English speakers. The summary of the
LexTALE results of the three groups is presented in Table 3.

Table 3. LexTALE scores (in percent) of the native speakers and the ESL learners
                       M        SD      Range
NS English (N = 24)    91.95    5.80    78.75 - 100
Early ESL (N = 21)     88.69    6.79    75 - 100
Adult ESL (N = 28)     83.90    7.18    70 - 97.5
Note. Score in % = [(No. of English words correct/40 × 100) + (No. of nonwords
correct/20 × 100)]/2 (see Lemhöfer & Broersma, 2012, for the scoring method)
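
The scoring rule in the note to Table 3 can be written out as a short function. This is a minimal sketch of that formula only; the function and parameter names are illustrative and are not taken from the LexTALE materials.

```python
def lextale_score(words_correct, nonwords_correct, n_words=40, n_nonwords=20):
    """Average of the percent-correct scores for words and for nonwords,
    so that the two item types count equally despite their unequal numbers."""
    if not 0 <= words_correct <= n_words:
        raise ValueError("words_correct is out of range")
    if not 0 <= nonwords_correct <= n_nonwords:
        raise ValueError("nonwords_correct is out of range")
    word_pct = words_correct / n_words * 100
    nonword_pct = nonwords_correct / n_nonwords * 100
    return (word_pct + nonword_pct) / 2


# Hypothetical participant: 36 of 40 words and 15 of 20 nonwords answered correctly
print(lextale_score(36, 15))
```

Because the two percentages are averaged, a participant who accepted every item as a word would score 50 percent rather than 67 percent, which is the point of weighting the two item types equally.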

According to the test developers, the average LexTALE score of a large group of
advanced ESL learners in Lemhöfer & Broersma’s (2012) study was 70.7 (in percent).10 Given
this information, the two ESL groups in the current study showed fairly high LexTALE scores,
with mean scores of 88.69 and 83.90 for the early and adult ESL groups, respectively.
A one-way analysis of variance (ANOVA) was performed to examine whether there were any
proficiency differences between the three groups, including the NS English controls. The result
showed a reliable difference among the groups, F (2, 70) = 9.700, p < .001, η² = .217. The
follow-up Bonferroni post-hoc comparisons revealed that the scores of the adult ESL learner
group were significantly lower than those of both the English control group (p < .001) and the
early ESL group (p = .044). The scores of the early ESL group were slightly lower than those of
the NS English group, but the difference was not significant (p = .313).
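
The relation between the reported effect size and the F statistic can be sketched as follows. The code assumes a one-way between-subjects design, in which the error degrees of freedom equal the total number of participants minus the number of groups; the function name is hypothetical, and the computation is offered only as a consistency check, not as the original analysis.

```python
def f_from_eta_squared(eta_squared, group_sizes):
    """Recover F and its degrees of freedom from eta squared in a one-way ANOVA."""
    n_groups = len(group_sizes)
    df_effect = n_groups - 1
    df_error = sum(group_sizes) - n_groups
    f_value = (eta_squared / df_effect) / ((1 - eta_squared) / df_error)
    return f_value, df_effect, df_error


# NS English (N = 24), early ESL (N = 21), adult ESL (N = 28)
f_value, df_effect, df_error = f_from_eta_squared(0.217, [24, 21, 28])
print(f"F({df_effect}, {df_error}) = {f_value:.3f}")
```

With 73 participants in three groups, the error term has 70 degrees of freedom, and an effect size of .217 then corresponds to an F value of approximately 9.70.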

3.2.2. Working memory capacity measures
Participants’ working memory capacity (WMC) was measured using two subsets of a
battery of automated complex WM span tests developed by Oswald, McAbee, Redick, &
Hambrick (2015). Of the three processing modalities in the test set (operation
span, symmetry span, and reading span), the participants took the operation span (O-Span) and
the symmetry span (S-Span) tests. As discussed in the previous chapter, it has been a common
practice to implement a reading span test (e.g., Daneman & Carpenter, 1980) in both L1 and L2
sentence processing research, mainly because the reading span test shares the same type of
processing component (e.g., reading plausible or implausible sentences) with tasks in
reading-based language processing research. The current study, however, used the two non-verbal
WM span tests above for the following reasons. First, although the L2 participants in this
study were arguably highly advanced ESL learners, administering a reading span test in their L2
could still present a proficiency confound, at least for some learners (see Gass & Lee, 2011, for
related discussion). In other words, a lower proficiency level in English would add an additional
burden to the reading, consequently affecting the size of the memory span. In this case, the
observed WM span size could be the result not only of participants’ WMC, but also of their
lower English proficiency. Alternatively, the reading span test may be administered in
participants’ L1s. An attempt to create word-for-word translations into different languages,
however, might inevitably cause some divergence among the three different language versions.
For example, it would be difficult to match the length of each sentence or the location of the
critical area of the sentence that is directly related to the given task11. Consequently, it may be
difficult to maintain the test reliability across the different language formats. In addition, a
confound with language proficiency could feasibly work in the other direction when giving a
reading span test in the L1; that is, for some early ESL learners the proficiency level of their L1
may be too low to take the reading span test in the L1.

10 The current study set a LexTALE score of 70 percent as the prerequisite for participation
to ensure learners’ high L2 proficiency. The scores of four adult L2 learners did not meet this
requirement. Those participants received a small partial payment and did not participate in
the rest of the tasks.

11 Note that depending on where the critical region that determines “plausible or implausible” or
“making sense or not” is located in a sentence, the amount of pressure during reading may be
different. Matching the location of this spot in two different languages would be very challenging,
especially considering the different word orders between English/Chinese and Korean.

Taking into account the concerns discussed above, the current study used the non-verbal
O-span and S-span tests. Crucially, these tests were reported to be not only highly compatible
with the reading span test in the same test set, but also reliable and valid as measures of WMC
(Oswald et al., 2015; see also Conway et al., 2005; Redick et al., 2012, for discussion on
reliability and validity of the WMC measures). In this respect, Conway et al. (2005) noted that
although different measures of WMC are assumed to measure the same underlying construct
(i.e., WMC) reliably well, it may be dangerous to rely only on a single WM measure, given
that different measures (with different processing modalities) would unavoidably tap into
different areas of test-takers’ abilities (e.g., mathematical ability for the O-span test). To
overcome such shortcomings, Conway and colleagues suggested running more than one WM span
test and then using the composite scores on all the tasks as the measure of WMC. With this in
mind, this study used these two tests and calculated the average scores of the two to obtain more
reliable WMC measures (see also Barrouillet & Lepine, 2005; Leeser, 2007, for empirical
studies that used composite scores as a WMC measure).
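
A composite of the two span scores can be sketched as below. The text states only that the two scores were averaged; whether the scores were standardized first is not specified, so the `standardize` option (z-scoring each test across participants before averaging) is an assumption added for illustration, as are all function and variable names and the example scores.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize scores across participants (sample SD)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite_wmc(ospan_scores, sspan_scores, standardize=False):
    """Per-participant average of the O-span and S-span scores.
    With standardize=True, each test is z-scored first so that the two
    tests contribute equally despite their different maximum scores."""
    if len(ospan_scores) != len(sspan_scores):
        raise ValueError("Both tests need one score per participant")
    if standardize:
        ospan_scores = z_scores(ospan_scores)
        sspan_scores = z_scores(sspan_scores)
    return [(o + s) / 2 for o, s in zip(ospan_scores, sspan_scores)]


# Hypothetical span scores for three participants
print(composite_wmc([20, 26, 30], [12, 18, 24]))
print(composite_wmc([20, 26, 30], [12, 18, 24], standardize=True))
```

Standardizing is worth considering because the O-span test has more to-be-recalled items than the S-span test, so a raw average weights the O-span score more heavily.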
As discussed earlier, WMC is the capacity of the memory store that maintains a
limited amount of information in the face of ongoing processing. Thus, each WM test consists of
two parts: the processing component and the storage component. The details of each span test
are provided in turn.
O-span test. In the O-span test, the processing component was judging whether a given
arithmetic operation is correct or incorrect, and the storage component (i.e., target to recall) was
remembering an English letter. Figure 2 illustrates a sequence of the O-span test. Participants
first received a simple math problem (left), and then they were asked to judge within a limited
amount of time whether the given answer was true or false (middle). Upon their response, they
were presented with a to-be-recalled item, which remained on the screen for 800 milliseconds
until the screen advanced to the next math problem. The test included a total of 30
operation-storage pairs, divided into six trials with two trials for each set size of four, five,
and six. At the end of each set, participants were presented with a response screen, on which they
were asked to provide the recalled items in the same order they were presented. For example, when
the set size was four, they had to report four English letters in the same presentation order.

Figure 2. Processing and storage component of the operation span test

S-span test. In the S-span test, the processing component was determining whether the two sides
(left and right) of a given picture are symmetrical or asymmetrical, and the storage component was
recalling the location of a red square in a 4 x 4 matrix, as illustrated in Figure 3.

Figure 3. Processing and storage component of the symmetry span test

The test procedure was the same as in the O-span test. The S-span test included a total of 24
symmetry-storage pairs, divided into six trials with two trials for each set size of three, four,
and five. The order of the two tests was counterbalanced so as to avoid any potential test fatigue
and/or familiarity confounds. Half of the participants took the operation span test first, and the
other half took the symmetry span test first.
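
The item totals reported for the two span tests follow directly from the set sizes, which the short sketch below makes explicit. The alternating assignment of test order is only one hypothetical way to counterbalance; how participants were actually assigned to the two orders is not described here, and all names in the code are illustrative.

```python
OSPAN_SET_SIZES = (4, 5, 6)
SSPAN_SET_SIZES = (3, 4, 5)


def total_pairs(set_sizes, trials_per_size=2):
    """Number of processing-storage pairs summed over all trials."""
    return trials_per_size * sum(set_sizes)


def counterbalanced_orders(participant_ids):
    """Alternate which test comes first (hypothetical assignment rule),
    so that each order is used for about half of the participants."""
    orders = {}
    for index, pid in enumerate(participant_ids):
        if index % 2 == 0:
            orders[pid] = ("O-span", "S-span")
        else:
            orders[pid] = ("S-span", "O-span")
    return orders


print(total_pairs(OSPAN_SET_SIZES), total_pairs(SSPAN_SET_SIZES))
print(counterbalanced_orders(["P01", "P02", "P03", "P04"]))
```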
For both tests, participants were informed during the practice session that in order for the
researcher to be able to use their WM span scores (i.e., the number of recalled items), it was
important that they score at least 85 percent on the processing part (i.e., math problems and
symmetry judgments, respectively). This was to ensure that they were indeed engaged in both the
processing and recall parts, rather than rehearsing the to-be-recalled items (i.e., phonological
short-term memory) without much effort on processing. By design, participants were
shown their current processing score on the screen after each set so that they could keep a
balance between the two parts as the test progressed. All participants showed acceptable
processing performance. The processing scores ranged from 80 to 100 percent accuracy in the
O-span test and from 75 to 100 percent in the S-span test12.

12 The test development team stated in the FAQ section of their website
(http://englelab.gatech.edu/faq.html) that they generally remove participant data when accuracy
on the processing part (e.g., math problems) is below 85 percent, although they also
acknowledged that the 85 percent threshold is an “arbitrary rule of thumb”. They commented,
“it is not so much the actual accuracy that matters but more so if the participant was attending to
the processing trials.” In this study, there were eight participants whose math accuracy ranged
between 80.00 and 83.33 percent in the O-span test, and six participants whose
symmetry accuracy ranged between 75 and 83.33 percent in the S-span test. Those
participants’ WM span scores were considerably lower than the averages of the groups
to which they belonged, suggesting that they did not obtain high WM span
scores at the cost of paying less attention to the processing part. Rather, it is more likely that
their processing abilities (e.g., math ability) were slightly lower, making the
test(s) more demanding for both parts. For this reason, I decided not to exclude their data.

3.2.3. Main experiment: Eye-tracking reading
3.2.3.1. Reading materials
The eye-tracking reading materials consisted of 7 practice, 28 target, and 54 filler items.
The target sentences were developed based on the materials used in Omaki and Schulz’s (2011)
self-paced reading experiment, primarily to investigate whether the early and adult ESL learners
with wh-in-situ L1 backgrounds (i.e., Chinese and Korean) in this study can make use of relevant
syntactic island constraints in L2 English, and thus avoid postulating a gap within the relative
clause island structures. Each target sentence had four experimental conditions in a 2 x 2 Latin
square design, with plausibility (plausible and implausible) and island (non-island and island)
manipulations (see Appendix C for the target sentences). Four different experiment subsets were
then created, with each subset including only one of the four versions of each trial, such that
individual participants received only one version of each trial. The two plausibility conditions
and the two island conditions were counterbalanced across the items in each subset, so that each
subset included seven target trials for each experimental condition. Examples of the four
experimental conditions are illustrated in (27) through (30), in which the two regions in bold
indicate the two critical regions (Region1 and Region3), and the two underlined regions are the
two spillover regions (Region2 and Region4).

(27) [plausible, non-island]
The book that the journalist wrote t_i fairly regularly about t_i was named for an explorer.
(28) [implausible, non-island]
The city that the journalist wrote t_i fairly regularly about t_i was named for an explorer.
(29) [plausible, island]
The book that the journalist who wrote fairly regularly mentioned t_i was named for an explorer.
(30) [implausible, island]
The city that the journalist who wrote fairly regularly mentioned t_i was named for an explorer.

As shown in the examples, the sentences in the two plausibility conditions were all
identical except for the filler nouns (e.g., the book and the city), which differed for the
plausibility manipulation at the first verb wrote (i.e., wrote the book and wrote the city). The
length and word frequency of the filler nouns were matched between the plausible and implausible
conditions. The length of those nouns ranged from 4 (e.g., book) to 9 (e.g., crocodile) characters,
and the mean length was matched at 6.214 characters for both plausibility conditions
(SD_plausible = 1.49, SD_implausible = 1.52). The word form frequency was checked using the
American English Subtitles (SUBTLEXus) corpus data (Brysbaert & New, 2009): the mean frequency
of the plausible nouns was 76.58 per million (SD = 107.44), and the mean frequency of the
implausible nouns was 73.76 per million (SD = 106.08), which were not statistically different
from one another, t (54) = .099, p = .922. The sentences in the island condition, as in (29) and
(30), included an additional relative clause embedded in another relative clause, which is
introduced by the relative pronoun who (e.g., the book [RC that the journalist [RC who wrote…]…]).
The relative pronoun who in this case forms a relative clause island because the position that the
relative pronoun occupies is the only position through which a to-be-raised constituent can
legally move. The presence of who, however, blocks such movement, as illustrated in (4) in
Chapter 2. As a result, the syntactic representations of the sentences in the island condition do
not have a silent copy of the wh-trace or empty category within the embedded relative clause. In
other words, the parser must not postulate a gap at the verb wrote in (29) and (30).
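
The construction of the four experiment subsets described in this section can be sketched as a standard Latin square rotation. This is an illustrative sketch only: the rotation rule, the function names, and the use of zero-based item indices are assumptions, and the actual assignment of items to subsets in the experiment may have differed.

```python
from collections import Counter

CONDITIONS = [
    ("plausible", "non-island"),
    ("implausible", "non-island"),
    ("plausible", "island"),
    ("implausible", "island"),
]


def build_subsets(n_items=28, conditions=CONDITIONS):
    """Rotate conditions across items so that every subset contains each item
    exactly once and each item appears in a different condition in each subset."""
    n_conditions = len(conditions)
    subsets = []
    for subset_index in range(n_conditions):
        subset = {
            item: conditions[(item + subset_index) % n_conditions]
            for item in range(n_items)
        }
        subsets.append(subset)
    return subsets


def condition_counts(subset):
    """Number of target trials per experimental condition within one subset."""
    return Counter(subset.values())


subsets = build_subsets()
for index, subset in enumerate(subsets, start=1):
    print(index, dict(condition_counts(subset)))
```

With 28 target items and four conditions, this rotation yields seven target trials per condition in each subset, matching the counts given above.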

3.2.3.2. Areas of Interest for analyses
First critical region (Region1) The first critical region (Region1) includes the first verb
that the parser encounters in the sentence in all four experimental conditions, which, according to
the active filler hypothesis, is the earliest structurally possible gap site where the parser can
(temporarily) retrieve the filler from the WM and make syntactic and semantic analyses of its
goodness of fit in the non-island condition in (23) and (24). In this respect, Region1 is the site
where a plausibility effect can take place by virtue of integrating the filler into the grammatical
gap and analyzing it as the object of the verb wrote at the moment of processing: One yields a
plausible interpretation (i.e., the journalist wrote the book), and the other yields an implausible
interpretation (i.e., the journalist wrote the city). The implausible interpretation obtained in (24)
may challenge the interpretive processes of the parser at the moment, likely resulting in a
plausibility effect with elevated RTs and/or more regressive eye-movement patterns at the verb
wrote in (24), compared to its counterpart in (23).
On the other hand, no such plausibility effect should be expected in the island condition
in (25) and (26), under the assumption that the parsing is guided by fully detailed syntactic
information. The reason is that the verb (i.e., wrote) is located inside another relative clause (i.e.,
[the journalist [who wrote fairly regularly]…]) in the island condition, and the filler cannot be
moved out of the relative clause island by the grammar, as shown earlier in (4). In other words,
postulating an object gap on the verb wrote inside the relative clause island is not possible
because there is no empty category or object trace posited within the island in its structural
representations. For this very reason, the verbs located in Region1 were always optionally
transitive verbs (e.g., wrote, read, advise, perform), so that the target sentences in the island
condition can eventually be grammatical. The absence of gap postulation at the verb wrote in
(25) and (26) would consequently yield no plausibility effect, with relatively
comparable reading patterns between the plausible and implausible sentences. In contrast, if the
ESL learners’ knowledge of the movement constraint in English is deficient, and/or if they are
not capable of deploying the syntactic information in real time for some reason (e.g.,
lower WMC), thus relying on lexical-thematic and semantic information instead as the SSH
would predict, then they may display the plausibility effect in the island condition as well,
forming filler-gap dependencies between the filler and the verb inside the relative clause island.
Taken together, a significant 2-way interaction (i.e., plausibility x island) may be observed at
Region1 only if the participants utilize the relevant movement constraints of English at the right
moment during reading.
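
The predicted plausibility x island interaction at Region1 amounts to a difference between two plausibility effects. The sketch below illustrates that contrast; the condition means are hypothetical values chosen for illustration, not data from the study, and the function names are invented for this example.

```python
def plausibility_effect(means, island):
    """Implausible minus plausible mean reading time (ms) within one island condition."""
    return means[(island, "implausible")] - means[(island, "plausible")]

def interaction_contrast(means):
    """Non-island plausibility effect minus island plausibility effect."""
    return plausibility_effect(means, "non-island") - plausibility_effect(means, "island")

# Hypothetical Region1 condition means in ms (illustrative only).
region1_means = {
    ("non-island", "plausible"): 300.0,
    ("non-island", "implausible"): 360.0,
    ("island", "plausible"): 310.0,
    ("island", "implausible"): 315.0,
}

print(plausibility_effect(region1_means, "non-island"))  # 60.0
print(plausibility_effect(region1_means, "island"))      # 5.0
print(interaction_contrast(region1_means))               # 55.0
```

A contrast clearly above zero corresponds to the pattern expected when the island constraint is respected; a contrast near zero with a plausibility effect in both conditions corresponds to the pattern the SSH would predict. Whether a given contrast is reliable is a matter for the inferential tests, not for this arithmetic.
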
Second critical region (Region3) The second critical region (Region3) is where 1) the parser is
supposed to cancel its previous analysis computed in the first critical region (Region1), followed
by a filler-gap reanalysis in the case of non-island sentences, and 2) the parser creates a gap for
the first time (assuming no gap postulation inside the relative clause island) on the verb
mentioned in the case of island sentences. As shown above, Region3 consists of two words: the
preposition about and the adjacent auxiliary verb was in the non-island sentences, and the verb
mentioned and the same auxiliary verb was in the island sentences, in that the fillers (the book &
the city) are the object of the preposition about in the non-island condition, and are the object of
the second verb mentioned in the island condition. In the non-island condition, the parser must
initiate the reanalysis as soon as it recognizes that the preposition about lacks an overt argument
(i.e., about t_i was). In doing so, the parser first needs to cancel its previous analysis of the
filler as the object of wrote at Region1, and then take the filler as the object of about. In this
regard, Pickering and Traxler (1998) claimed, based on their L1 eye-tracking studies on
processing of garden-path sentences, that the parser may be more taxed when it has to withdraw
an earlier analysis that is more plausible, because the level of (L1) readers’ commitment to a
semantically plausible interpretation is relatively deeper. With this in mind, the
participants may display a type of garden-path effect at Region3 about was in the plausible
condition in (23) with longer RTs and/or more regressive eye-movement patterns, compared to
its counterpart in (24). On the other hand, no such difference in reading patterns should be found
between the plausible and implausible sentences on mentioned was in the island condition, when
taking into account the assumption that there is no gap postulation up until Region3 (i.e., no
reanalysis).
Spillover Regions (Region2 and Region4) Previous research has shown that L2 learners
may display processing patterns that are qualitatively comparable to those of native
speakers of the target language, but that some expected effects such as the plausibility effect or
filler integration effect may occur with some delay in L2 processing, possibly due to learners’
slower and less efficient processing abilities (e.g., Dekydtspotter et al., 2006; Williams et al.,
2001). In addition, it is possible that the participants, especially the ESL learners with
lower WMCs, are slower or less efficient in releasing the filler information at the gap sites,
displaying a somewhat delayed plausibility effect at a later region. Taking these into account, the
two regions that come right after the two critical regions (Region1 and Region3) were also
analyzed to examine any potential spillover effects. The spillover region (Region2) that comes
immediately after the first critical region (Region1) always included two adverbs (e.g., fairly
regularly) in all four experimental conditions. Another spillover region (Region4) next to
Region3 was always [passive participle + preposition, e.g., named for] for all four experimental
conditions.

3.2.3.3. Eye-tracking reading task design and procedures
The eye-tracking reading task was programmed using Experiment Builder (version
1.10.1630), experiment-programming software for the EyeLink 1000 Desktop-mounted system (SR
Research Ltd. http://www.sr-research.com) used in the current study. The experiment consisted
of four pseudorandomized blocks with each block including seven to twelve sentences
intermixed with the target and filler trials. The participants were able to take a short break
between the blocks, so as to minimize task fatigue. At the outset of each block, the participants
went through the calibration and validation process with the researcher to set up the camera. In
addition, drift correction was performed before each trial appeared on the screen to ensure the
accuracy of the eye-movement recording. The participants read each sentence on a 19-inch
computer screen while the eye-tracker in front of the screen recorded their eye movements on
the sentence. All sentences fit on a single line. The font type of the text was Serif and the size of
the text was 19 for all trials. The participants all had normal or corrected-to-normal vision at the
time of participation.
The sentences (e.g., The wall that the soldier threw quite forcefully toward was covered
with moss.) were followed by a comprehension check in the form of a true or false question, in
which the participants indicated by pressing one of the two designated buttons whether a
statement on the question screen (e.g., The wall was covered with moss) was true [green button]
or not [red button] based on their comprehension of the sentence. The comprehension questions
were included to ensure that the participants paid attention to the reading (for
meaning), and also to monitor whether participants comprehended the complex target sentences
well. As mentioned earlier, the data from three participants were excluded from the analyses due
to their lack of comprehension of the target sentences. A comprehension accuracy score of 70
percent on the target sentences was set as a cutoff. Two adult ESL participants who scored 57.14
and 60.71 percent respectively were thus removed from the analyses. In addition to checking the
comprehension scores, the researcher conducted a brief interview with each participant after they
completed all required tasks. The participants were presented with four target sentences they had
read during the reading task, with one sentence for each experimental condition. They were
asked to paraphrase those sentences verbally and explain to the researcher how they
comprehended those sentences during the task. Two adult ESL participants (one was the same
participant whose comprehension score was below 70%) reported that they could not understand
the sentences well most of the time, especially those in the island condition. Both participants
mentioned that they considered the sentences ungrammatical or as containing typos and just tried
to somehow catch the meaning to answer the questions. The data from those participants were also excluded
from the analyses.

3.2.3.4. Eye-tracking dependent variables
The eye-movement measures examined in the current study are as follows: first fixation
duration, first-pass reading time, first-pass regression, regression path duration, and total
reading time. An illustration of eye-movements during reading is provided in Figure 4.

Figure 4. An illustration of eye-movements during reading
Each circle represents a fixation and the numbers inside indicate the order of the fixation
occurrences. Note that in an actual recording, the fixations are marked on the text in normal
circumstances. The fixations in the example above were placed below the text intentionally for
demonstration purposes.

First fixation duration First fixation duration refers to the duration of the first
fixation in an interest area (or word), provided that there is no fixation in later regions
prior to the current first fixation (i.e., the first fixation during first-pass). For example, the first
fixation at Region3 in Figure 4 is ②, but the first fixation duration at Region2 is not the duration
of ④, but zero (also referred to as “skip”), because there are fixations in a later region before ④
is fixated¹³ (i.e., ② & ③ at Region3).
First-pass reading time (First-pass RT) First-pass reading time (also referred to as gaze
duration, especially when an interest area consists of a single word; Roberts & Siyanova-Chanturia,
2013) is the sum of all eye fixations in an interest area, from its first entrance until the
eye leaves the interest area in any direction either to the left (regressive) or right (progressive)
boundary of the area, provided that, like first fixation duration, there is no fixation in later
regions recorded before the first entering fixation in the current interest area (i.e., the first
fixation during first-pass). Thus, the first-pass reading time at Region3 includes the fixations of
② and ③.
First-pass regression (probability) First-pass regression is defined as the percentage of
regressive eye movements from the interest area to a preceding area that occur during the
first-pass reading. Unlike the other eye-tracking measures that provide processing time course
measures (in milliseconds), the first-pass regression provides binary data, in that it is coded “1” if
there was a regressive movement out of the area during first-pass, and “0” if there was no
regression during first-pass. At Region3, there is a backward movement from ③ to ④ at
Region2; consequently, the first-pass regression is coded as 1 in this area.
Regressive eye-movements during reading often indicate some processing difficulty at the
moment, for example, in associating the currently processed unit with previous parts of the sentence
(Clifton et al., 2007; Vasishth & Drenhaus, 2011).
Regression path duration Regression path duration refers to the sum of all fixations recorded
from the first entrance to an interest area up until the eye exits the right boundary of the interest
area (i.e., progressive eye-movements passing the interest area). When there is a regressive eye
movement out of the current interest area during the first-pass reading (i.e., first-pass regression
= 1), regression path duration also includes the time spent at earlier regions on the left side (for
left to right reading as in English) after the regression is initiated. Consequently, the regression
path duration at Region3 is the sum of the fixations of ②, ③, ④, ⑤, ⑥, and ⑦. When the
region involves no regression, the regression path duration is the same as the first-pass reading
time.

¹³ In this case, the first fixation duration at Region2 is zero, and the sum of ④ and ⑤ will be
recorded as a second pass reading time, not first-pass reading time at this region.

Total reading time (Total RT) Total reading time (also referred to as total duration) is the
sum of all fixations recorded within an interest area, indicating how much total time a reader
spent at the region during the entire course of reading (e.g., ② + ③ + ⑥ + ⑦ at Region3).
It is generally considered a late measure that may reflect readers’ later processes related
to text comprehension and information reanalysis during later stages of processing, and recovery
from misanalysis and/or reanalysis (Clifton et al., 2007; Roberts et al., 2012).
Of the five eye-tracking measures discussed above, the first three measures, namely first
fixation duration, first-pass reading time, and first-pass regression, are generally considered to
reflect readers’ early stage of processing, likely at the level of morphology (e.g., lexical access)
and syntax (e.g., integration of words into phrases). The regression path duration and total
reading time are known to index readers’ processes at later stages of processing, related to text
comprehension and information reanalysis, and a recovery from misanalysis/reanalysis.
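
The five measures defined above can be expressed as simple operations over an ordered fixation sequence. The following is a minimal sketch rather than the procedure used by the eye-tracking software: the region assignments follow the description of Figure 4 (② and ③ at Region3, ④ and ⑤ at Region2, ⑥ and ⑦ at Region3), while the placement of ① at Region1, the closing fixation at Region4, and all durations are hypothetical. A first-pass skip is returned as None here instead of zero.

```python
from typing import List, Optional, Tuple

Fixation = Tuple[int, int]  # (region number, fixation duration in ms)

def first_pass_span(fixations: List[Fixation], region: int) -> Optional[Tuple[int, int]]:
    """Return [start, end) indices of the first-pass fixations on `region`,
    or None if a later region was fixated first (a first-pass skip)."""
    for start, (reg, _) in enumerate(fixations):
        if reg > region:
            return None
        if reg == region:
            end = start
            while end < len(fixations) and fixations[end][0] == region:
                end += 1
            return start, end
    return None

def first_fixation_duration(fixations, region):
    span = first_pass_span(fixations, region)
    return None if span is None else fixations[span[0]][1]

def first_pass_rt(fixations, region):
    span = first_pass_span(fixations, region)
    return None if span is None else sum(d for _, d in fixations[span[0]:span[1]])

def first_pass_regression(fixations, region):
    """1 if the eye leaves the region to the left at the end of first pass, else 0."""
    span = first_pass_span(fixations, region)
    if span is None:
        return None
    end = span[1]
    return int(end < len(fixations) and fixations[end][0] < region)

def regression_path_duration(fixations, region):
    """Sum fixations from first entry until the eye first moves past the region."""
    span = first_pass_span(fixations, region)
    if span is None:
        return None
    total = 0
    for reg, dur in fixations[span[0]:]:
        if reg > region:
            break
        total += dur
    return total

def total_rt(fixations, region):
    return sum(d for reg, d in fixations if reg == region)

# Fixations 1-7 as in Figure 4 plus a hypothetical final fixation at Region4.
# All durations are invented for illustration.
trial = [(1, 200), (3, 250), (3, 180), (2, 220), (2, 190), (3, 210), (3, 170), (4, 230)]

print(first_fixation_duration(trial, 3))   # 250
print(first_pass_rt(trial, 3))             # 430
print(first_pass_regression(trial, 3))     # 1
print(regression_path_duration(trial, 3))  # 1220
print(total_rt(trial, 3))                  # 810
print(first_fixation_duration(trial, 2))   # None (skipped during first pass)
```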

3.3. Overall procedures
All data were collected in a laboratory equipped with the EyeLink 1000 Desktop-mounted
system. Individual participants attended a single 40-60 minute session. Upon arrival,
participants completed the following tasks in this order: [1] LexTALE proficiency test, [2]
eye-tracking reading task, [3] one set of the WM span test, [4] background questionnaire, [5] the
remaining WM span test set, and [6] a brief oral interview with the researcher for a
comprehension check. All participants were paid 20 US dollars for their participation.

3.4. Data Analysis
3.4.1. Preparation of the data for analyses
Eye-tracking data trimming In preparation of the online reading data for analyses, any
fixations that were shorter than 80 milliseconds were automatically filtered out before extracting
the data set from the eye-tracking system¹⁴. The data where the participants skipped the area
either during the first-pass reading or during the entire reading process (i.e., no fixation recorded
on an area of interest) were replaced with missing values. Additionally, for each measure (except
the first-pass regression), RTs that were beyond 2.5 SDs from individual participants’ mean
RTs on the same region were also replaced with missing values, which altogether affected 2.44
percent of the entire data (approximately 3.37% of the control group, 2.09% of the early ESL,
and 1.90% of the adult ESL group data).

¹⁴ Note that it is a common practice in eye-tracking reading research to eliminate extremely short
(generally < 80 ms) or long (generally > 800 ms) fixations as they are considered noise rather
than a reflection of readers’ cognitive processes (see Rayner and Pollatsek, 1989). These
thresholds have been applied in much L1 reading research, and recently carried over to L2
reading research. However, applying the same thresholds used in L1-based research to an L2
reading study may be potentially problematic, especially the threshold for the longer fixation,
considering L2 learners’ generally slower reading speed. Furthermore, the target sentences used
in this study were all structurally complex even for native speakers. For these reasons, the
removal of fixations longer than 800 ms was not performed in this study.

After those data trimming processes, individual
participants’ mean RTs and the first-pass regression probability ratios (for by-subjects analysis:
F1), and mean RTs and first-pass regression ratios on each target sentence (for by-items analysis:
F2) were calculated for each measure across the interest areas for the main analyses.
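
The two trimming steps described above (dropping fixations under 80 ms, then replacing RTs beyond 2.5 SDs of a participant's mean on a region with missing values) can be sketched as follows. The function names, the use of None for missing values, and the use of the sample (n - 1) standard deviation are assumptions of this sketch; the example data are hypothetical.

```python
from statistics import mean, stdev

MIN_FIXATION_MS = 80   # lower fixation threshold stated in the text
SD_CUTOFF = 2.5        # outlier criterion stated in the text

def drop_short_fixations(durations, minimum=MIN_FIXATION_MS):
    """Remove fixations shorter than the minimum duration."""
    return [d for d in durations if d >= minimum]

def trim_outliers(rts, cutoff=SD_CUTOFF):
    """Replace RTs more than `cutoff` SDs from the mean with None.
    None entries (e.g., skipped regions) are treated as already missing."""
    observed = [rt for rt in rts if rt is not None]
    if len(observed) < 2:
        return list(rts)
    centre, spread = mean(observed), stdev(observed)
    return [rt if rt is not None and abs(rt - centre) <= cutoff * spread else None
            for rt in rts]

# Hypothetical RTs (ms) from one participant on one region across twelve trials.
participant_rts = [300, 310, 290, 305, 295, 300, 310, 290, 305, 295, 300, 1500]
trimmed = trim_outliers(participant_rts)

print(drop_short_fixations([45, 120, 79, 80, 310]))  # [120, 80, 310]
print(trimmed[-1])                                    # None
```
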
An initial inspection of the raw eye-tracking reading data showed that they were not
normally distributed for most measures across the interest areas, displaying a range of
(mostly positive) skewness across the experimental conditions and groups. Therefore, a log
transformation was performed on each measure to correct this issue. The transformed data
largely met the normality and the homogeneity of variance assumptions for ANOVA¹⁵.
However, the first-pass regression data (both F1 and F2) showed a wide range of violations
across the regions. The regression path duration and total reading time data at Region3 were also
found to violate both the normality and equal variances assumptions across the experimental
conditions and groups. To resolve this problem, the following alternative analyses were used:
For the analysis of the first-pass regression data, a logistic random effects regression model with
the option of the robust covariance matrix estimation was performed using the raw binary
regression data (1 = regression, 0 = no regression). In this model, both subjects and items were
taken as random effects. For the analysis of the regression path duration and total reading time
data at Region3, a set of nonparametric Wilcoxon signed ranks tests was used instead for each
group separately, for both by-subjects and by-items analyses. Lastly, the Greenhouse-Geisser
correction was applied when the sphericity test showed a violation (Field, 2009).
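
To illustrate why a log transformation helps with positively skewed reading times, the sketch below compares the skewness of a small hypothetical sample before and after a base-10 log transformation. The data are invented, and the moment-based skewness estimator is one common choice, not necessarily the one used in the study.

```python
import math

def skewness(values):
    """Moment-based sample skewness: m3 / m2 ** 1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical reading times (ms) with a long right tail.
raw_rts = [200, 220, 250, 300, 400, 900]
log_rts = [math.log10(rt) for rt in raw_rts]

print(round(skewness(raw_rts), 2))
print(round(skewness(log_rts), 2))
```

For this sample the transformed values remain positively skewed, but less so than the raw values, which is the typical effect on right-tailed RT distributions.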

¹⁵ Normality was tested using the Shapiro-Wilk goodness-of-fit test (Larson-Hall, 2010)
supplemented with the normal Q-Q plots. Homogeneity of variance was checked using Levene’s
test.

Working memory span scores For the scoring of the WM capacity of individual
participants, the current study adopted a partial-credit loading scoring method, which is one of
the most widely used scoring methods for span measures (Conway et al., 2005; see also Juffs &
Harrington, 2011 for some other scoring methods used in L2 research). A partial-credit loading
score is the sum of all items correctly recalled in the right order across the trials.
Consequently, the maximum raw score was 30 for the O-span and 24 for the S-span test. An
initial correlation analysis was performed to examine how reliable participants’ performance was
in the two tests. The result showed a significant moderate to strong correlation between the two
tests, r (71) = .558, p < .01. Individual participants’ two span test scores were inspected before
calculating the composite scores for the main analyses. There were two participants (one early
ESL and one adult ESL) who showed extremely contrasting results between the two tests
(100% in the O-span but 33.33% in the S-span, and similarly 100% in the O-span and 54.17%
in the S-span), thus making the results rather unreliable. The WM data of those learners were
replaced with missing values. A subsequent correlation analysis was performed again after
excluding the two participants. The results revealed a strong positive correlation between the two
tests, r (69) = .727, p < .01, reflecting a high reliability between the two tests (Cronbach’s alpha
= .840). Such high reliability also provided a reasonable basis for using the composite WM span
scores. The summary of the results of the two WM-span tests is provided in Table 4.
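
The partial-credit scoring rule described above can be sketched as a position-by-position comparison of presented and recalled items, summed over trials. The trial data and function name below are hypothetical, and the sketch assumes that "in the right order" means a match at the same serial position.

```python
def partial_credit_load(trials):
    """Sum of items recalled in their correct serial position across all trials.
    `trials` is a list of (presented, recalled) sequences."""
    score = 0
    for presented, recalled in trials:
        score += sum(1 for p, r in zip(presented, recalled) if p == r)
    return score

# Two hypothetical trials: one recalled perfectly, one with two items transposed.
example_trials = [
    (["F", "K", "P"], ["F", "K", "P"]),
    (["Q", "J", "H", "L"], ["Q", "H", "J", "L"]),
]

raw_score = partial_credit_load(example_trials)
max_score = sum(len(presented) for presented, _ in example_trials)
print(raw_score, max_score)                 # 5 7
print(round(100 * raw_score / max_score, 2))  # 71.43
```

Under this rule a participant still earns credit for the correctly placed items in an imperfectly recalled trial, which is what distinguishes partial-credit from all-or-nothing scoring.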

Table 4. Summary of the WM span test results in percent
                       O-span                          S-span
                       M (SD)          Range           M (SD)          Range
NS English (N = 24)    75.97 (15.42)   43.33 – 100     71.18 (14.32)   41.67 – 100
Early ESL (N = 20)     74.83 (13.40)   50 – 93.33      73.75 (12.32)   54.17 – 91.67
Adult ESL (N = 27)     78.27 (14.86)   50 – 100        77.47 (12.62)   41.67 – 100

Overall, the adult ESL group presented slightly higher WM span scores than the other
two groups on both tests. The comparison of the three groups with a pair of ANOVAs, one for each
measure, however, showed no significant differences among the groups, F (2, 70) = .342, p
= .712, ηp² = .010, in the O-span, and F (2, 70) = 1.479, p = .235, ηp² = .042, in the S-span test,
indicating that the three groups did not statistically differ from one another with respect to their
level of working memory capacities (WMCs). To minimize multicollinearity (Marquardt, 1980),
each span score of individual participants was standardized (i.e., Z-score), and then the two
Z-scores were averaged to obtain the composite WM span scores (hereafter, WM span scores) for
each participant (e.g., Barrouillet & Lepine, 2005; Leeser, 2007).
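
The composite score described above (standardize each span score, then average the two Z-scores per participant) can be sketched as follows. The scores are hypothetical, and the use of the sample (n - 1) standard deviation for standardization is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev

def z_scores(scores):
    """Standardize scores to mean 0 and (sample) SD 1."""
    m, sd = mean(scores), stdev(scores)
    return [(s - m) / sd for s in scores]

def composite_wm(ospan, sspan):
    """Average each participant's O-span and S-span Z-scores."""
    return [(zo + zs) / 2 for zo, zs in zip(z_scores(ospan), z_scores(sspan))]

# Hypothetical percent scores for four participants.
ospan_scores = [60.0, 70.0, 80.0, 90.0]
sspan_scores = [50.0, 75.0, 70.0, 85.0]

composite = composite_wm(ospan_scores, sspan_scores)
print([round(c, 2) for c in composite])
```

Because both inputs are standardized first, the composite is centered on zero and each test contributes equally regardless of its raw score range.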

3.4.2. Main Statistical analyses
Effect of age of immersion To investigate the extent to which the reading behaviors of
the early and adult ESL groups converge on or diverge from those of the native English
speakers, statistical analyses of participants’ eye-movements were conducted on the four regions
of interest, namely the earliest gap (Region1) and the following spillover region (Region2), and
the ultimate gap (Region3) and the following spillover region (Region4), respectively. The
analyses included both by-subjects (F1) and by-items (F2) analyses. As a preliminary step, a
3-way (3 x 2 x 2) mixed design ANOVA was carried out for each time-course eye-tracking
dependent measure (first fixation duration, first-pass RT, regression path duration, and Total RT)
at each interest area, with group as the between-subject variable and the two item conditions
(plausibility and island constraints) as the within-subject variables. As addressed above,
participants’ mean first-pass regression data were not appropriate for ANOVA. Thus, a
mixed effects logistic regression analysis was used to model the raw binary outcome variable
(1 = regression, 0 = no regression), with group, plausibility, and island constraint as fixed effects,
and subjects and items as random effects. When this preliminary analysis presented any
significant group-related interactions, a 2 (plausibility) x 2 (island constraints) repeated
measures ANOVA for reading time measures, and a mixed effects logistic regression model for
the first-pass regression measure, were carried out separately for each group, in order to better
interpret the significant interactions and get a clearer picture of different reading patterns among
the three groups. Lastly, when the follow-up analysis of each group displayed a significant
interaction between the two factors (i.e., island constraints and plausibility), subsequent planned
paired sample t-tests were performed for each island condition to further examine how
plausibility functioned differently across the two island conditions.
WMC effect To examine the effect of different working memory capacities of
individuals on their processing of filler-gap dependency constructions, a series of repeated
measures analyses of covariance (ANCOVAs) was performed for each time course eye-tracking
dependent variable, separately for each group, with the two item conditions (plausibility and
island constraints) as the within-subject variables, and the WM span scores as a continuous
covariate. For the analysis of first-pass regression probability, the logistic random effects
regression model was applied as before, with the two item conditions (plausibility and island
constraints) and the WM span scores as fixed effects, and subjects as a random effect.
The results were reported when there was a significant WM main effect and/or a
significant WM-related interaction. In interpreting the results of the reading time data analysis,
the parameter estimates (beta coefficients, β̂) for the WM span scores were examined across the
four experimental conditions to identify the direction of the relationship between the WM span
scores and the dependent measures. When the value of the beta coefficient is positive (i.e., β̂ > 0),
this means that the outcome variable increases by the amount of β̂ as the standardized
composite WM span score increases by one unit. On the other hand, a negative coefficient value
(i.e., β̂ < 0) indicates that the reading time decreases by the amount of β̂ as a function of a
one-unit increase in the WM span score. Generally, the beta coefficient indicates an approximate
amount of a predicted change in the unit of the dependent measure (e.g., milliseconds in the case of
reading times). However, because all the dependent measures in the current study were
log-transformed as discussed earlier, the obtained beta coefficients presented the approximate
amount of change in percentage (10^β̂ × 100%). As a result, a back-transformed beta coefficient
value that is larger than 1 (β̂ > 1) indicates a positive relationship and a value that is less than 1 (β̂ <
1) indicates a negative relationship.
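
The back-transformation described above can be made concrete with a short sketch. It assumes a base-10 log transformation (as implied by the 10^β̂ expression); the coefficient values are hypothetical and the function names are illustrative.

```python
import math

def back_transform(beta_log10):
    """Multiplicative change in the raw measure per one-unit increase in the predictor."""
    return 10 ** beta_log10

def percent_change(beta_log10):
    """The same change expressed as a percentage increase or decrease."""
    return (10 ** beta_log10 - 1) * 100

# Hypothetical coefficient on the log10 scale, chosen so that the ratio is 1.5.
beta = math.log10(1.5)

print(round(back_transform(beta), 3))   # 1.5 (i.e., 150% of the baseline)
print(round(percent_change(beta), 1))   # 50.0
print(back_transform(0.0))              # 1.0 (no change)
print(back_transform(-0.1) < 1)         # True (values below 1 mean shorter RTs)
```

On this reading, a back-transformed value of 1 is the neutral point: values above 1 correspond to longer reading times with higher WM span, and values below 1 to shorter reading times.
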
In interpreting the results of the first-pass regression analysis, an odds ratio (OR) was
calculated to diagnose the relationship between WM span and first-pass regression. An OR is an
“indicator of the change in odds resulting from a unit change in the predictor” (Field, 2009, p.
270). Thus, it indicates how the odds of making a first-pass regression change when
the WM span score decreases or increases by one unit. Similar to the beta coefficient, a positive
OR (i.e., OR > 1) signals an increase in the odds of making first-pass regressions by a factor of the
OR value as the WM span score increases by one unit, and a negative OR
(i.e., OR < 1) indicates a decrease in the odds of making first-pass regressions as the WM span
score increases. Note, however, that the information that the beta coefficients and the ORs provide
(i.e., the relationships between the dependent measures and the WM span as the predictor) is
limited to each experimental condition, thus making it rather difficult to interpret
interactions of the WM span scores and the other factors in some cases. For example, if the beta
coefficient for the WM span scores in the [non-island, implausible] condition is positive (e.g.,
β̂ = 1.5), and if it is larger than the positive β̂ value found in the [non-island, plausible]
condition (e.g., β̂ = 1.25), it does not necessarily entail that higher WM participants’ reading
time in the [non-island, implausible] condition was longer than their own reading time in the
plausible counterpart, because what β̂ and OR indicate is the relative amount of change between
higher WM participants and lower WM participants in the same experimental condition.
Consequently, depending on what the lower WM participants’ reading times in the two
experimental conditions were, reading patterns of the higher WM participants in this example
could go in either direction. Taking into account this shortcoming in interpreting interactions
among the factors, in cases when the analysis of the parameter estimates and ORs for the WM
span scores did not provide enough information to interpret a significant WM-associated
interaction, the group was divided into two subgroups based on their WM span scores, namely
the higher WM and lower WM subgroups, and the descriptive statistics for those two subgroups
were created with their raw data to supplement the analysis.
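
The odds-ratio interpretation described above can be illustrated with a small sketch. It assumes a standard logistic regression in which the OR is the exponential of the coefficient; the baseline probability and OR values are hypothetical and the function names are illustrative.

```python
import math

def odds_ratio(logit_coefficient):
    """Odds ratio for a one-unit increase in the predictor."""
    return math.exp(logit_coefficient)

def probability_after_unit_increase(baseline_probability, ratio):
    """Regression probability after the predictor increases by one unit."""
    baseline_odds = baseline_probability / (1 - baseline_probability)
    new_odds = baseline_odds * ratio
    return new_odds / (1 + new_odds)

# Hypothetical case: baseline first-pass regression probability of .20.
print(round(probability_after_unit_increase(0.20, 2.0), 3))  # 0.333 (OR > 1: more regressions)
print(round(probability_after_unit_increase(0.20, 0.5), 3))  # 0.111 (OR < 1: fewer regressions)
print(odds_ratio(0.0))                                       # 1.0 (no relationship)
```

Note that the OR scales the odds, not the probability directly, so the resulting change in probability depends on the baseline rate.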

CHAPTER 4. RESULTS
4.1. Comprehension Accuracy
The comprehension accuracy of the NS of English and the ESL learners is summarized
in Table 5. The total comprehension score includes participants’ responses on both the filler and
target sentences, and the target comprehension score includes only the responses on the target
sentences.
Table 5. Mean comprehension accuracy in percent in the reading task
                       Total comprehension              Target comprehension
                       M (SD)          Range            M (SD)          Range
NS English (N = 24)    92.21 (3.80)    84.21 – 98.68    90.62 (6.39)    75.75 – 100.00
Early ESL (N = 21)     90.65 (3.85)    80.00 – 96.05    88.77 (5.09)    82.14 – 100.00
Adult ESL (N = 28)     87.04 (5.39)    73.08 – 96.05    86.35 (7.65)    71.43 – 100.00
Note. The accuracy scores were all rounded off to two decimal digits.

Overall, both the NS English group and the two ESL groups showed high rates of accuracy on
both accuracy measures, with the mean accuracy scores of the three groups ranging from 87.04
(Adult ESL) to 92.21 (NS English) percent in the total comprehension, and from 86.35 (Adult
ESL) to 90.62 (NS English) percent in the target comprehension. This indicated that the
participants paid attention to the reading, and they were able to understand the structurally
complex target sentences correctly. A one-way ANOVA was run on each comprehension
measure to examine if there was any significant difference among the groups. The analysis of the
total comprehension scores showed a reliable difference among the three groups, F (2, 70) =
9.133, p < .001, η² = .208. The following Bonferroni post hoc analysis revealed that the total
comprehension score of the adult ESL group was significantly lower than that of both the early ESL
group (p < .021), and the NS English control group (p < .001). The early ESL learners’ overall
accuracy was slightly lower than that of the native English speakers, but the two groups were not
statistically different from one another (p = .745). The target comprehension scores of the adult
ESL learners were also slightly lower than those of the other two groups, but the result showed no
significant difference among the groups, F (2, 70) = 2.752, p = .071, η² = .073.
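
As a rough consistency check, the one-way ANOVA on total comprehension can be approximated from the group means, SDs, and sample sizes in Table 5. This is only a sketch from rounded summary statistics, not a reanalysis of the raw data, so small deviations from the reported values are expected; the function name is illustrative.

```python
def one_way_anova_from_summary(groups):
    """One-way ANOVA from (mean, sd, n) triples.
    Returns (F, df_between, df_within, eta_squared)."""
    total_n = sum(n for _, _, n in groups)
    grand_mean = sum(m * n for m, _, n in groups) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for m, _, n in groups)
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_value, df_between, df_within, eta_squared

# Total comprehension (mean, SD, N) for NS English, early ESL, adult ESL (Table 5).
total_comprehension = [(92.21, 3.80, 24), (90.65, 3.85, 21), (87.04, 5.39, 28)]

f_value, df_b, df_w, eta_sq = one_way_anova_from_summary(total_comprehension)
print(f"F({df_b}, {df_w}) = {f_value:.2f}, eta squared = {eta_sq:.3f}")
```

The result lands close to the reported F (2, 70) = 9.133 and η² = .208, with the remaining gap attributable to rounding in the tabled means and SDs.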

4.2. Overview of reading profiles
Prior to the main analysis, a series of fixation duration-based heatmaps were created with
the raw fixation data¹⁶ for each target structure and for each group, in order to review overall
reading profiles of the three groups on the target sentences. Figure 5, Figure 6, and Figure 7
provide a set of heatmaps of the NS English group, early ESL group, and the adult ESL group,
respectively. Each map reflects participants’ aggregated fixations recorded on the same target
structure type across the trials (e.g., all trials in the non-island, plausible condition), and the text
included in the map is one of the trials selected from a group of the same structure. In order to
make comparisons possible across the experimental conditions and across the groups, the same
maximum fixation value of 1200 milliseconds was applied across the maps (see the legend on
the bottom right), so that the same color schemes could be applied to reflect the same fixation
durations¹⁷. In the heatmaps, a more reddish color represents more aggregated fixation durations
on a spot, which may indicate more processing burden during reading¹⁸.

¹⁶ The trimmed reading time data, specifically the participants’ raw reading times that were over
2.5 SDs from their mean RTs, are not included in the maps.
¹⁷ When the peak fixation values (i.e., the largest single fixation) between maps differ, different
colors are used to present the fixation duration, making it difficult to make a direct comparison
between maps across the experimental conditions (see EyeLink Data Viewer User’s manual for
more information).
¹⁸ Note that the number of participants differed between the groups (24 NS English, 21 early
ESL, and 28 adult ESL), meaning that the number of trials reflected in the maps was different
among the groups, with the adult ESL group including the most (n = 196 trials per structure) and
the early ESL group including the fewest trials (n = 148 trials per structure). The different sample sizes
of the groups therefore must be taken into account when comparing the maps between the
groups, although this should not be the case when comparing the maps of different structures
within the group.

Figure 5. Fixation map: Reading profiles of the NS English speakers

As shown in Figure 5, the NS English group displayed slightly more aggregated fixations
at the early gap Region1, the verb wrote in the [non-island, implausible] condition, compared to
its counterpart in the [non-island, plausible] condition. Such a reading pattern was more clearly
shown in the two ESL groups in Figure 6 and Figure 7, in that both the early and adult ESL
learners tended to spend more time when encountering implausible interpretations (i.e., wrote the
city), indicating greater processing difficulties at this point over the course of reading.

Figure 6. Fixation map: Reading profiles of the early ESL learners

Figure 7. Fixation map: Reading profiles of the adult ESL learners


At Region3 (about was), a reverse reading pattern was observed for all groups in the non-island
condition. The native English speakers seemed to have spent slightly, but visibly, more
time at this region in the plausible than in the implausible sentences. The two ESL groups
showed a similar pattern, but the differences between the two plausibility conditions were
much greater for both learner groups. As discussed, Region3 is the ultimate gap position
where, in the non-island condition, the parser must withdraw the dependency analysis it initially
established at Region1 (i.e., wrote the book for plausible, and wrote the city for implausible) and
immediately reanalyze the filler as the object of the preposition about. This reanalysis can be
more taxing when the initial analysis bears a more plausible interpretation (i.e., wrote the book as
opposed to wrote the city). The overall reading profiles of the three groups at Region3 appeared
to be in line with this account. In the island condition, more reddish colors in both the plausible and
implausible conditions suggest that all groups tended to have more difficulty processing
structurally more complex sentences, as shown by more aggregated fixation durations across the
regions, compared to their reading profiles in the non-island condition. However, the comparison
of the two fixation maps in the island condition suggested that the plausibility effects
observed at Region1 and Region3 in the non-island condition were absent, or
at least relatively weaker, for all groups. Note that the plausibility effects in the non-island
condition are byproducts of the parser's attempt to fill the gap as soon as possible. At
the same verb, wrote, however, the parser should not postulate a gap in the island condition (provided
that it respects the island constraints), thereby yielding neither a plausible nor an implausible
interpretation. In the same vein, there should be no reanalysis effect at Region3 in the island condition,
because the verb mentioned is the structurally earliest gap available for the parser in this case.
Consequently, Region3 should be the place where an initial gap postulation occurs, not a
reanalysis. Given that integration of the filler (either the book or the city) as the object of
mentioned does not involve any semantically diverging manipulation at this region (i.e., no
plausibility effect, as both should be plausible), reading profiles in the two plausibility
conditions at this region should be comparable to one another in the island condition.
The observation of the overall reading profiles across the experimental conditions
indicated that the reading patterns of the advanced early and adult ESL learners in the current
study were similar to those of the NSs of English. However, this does not necessarily mean that the
learners utilized the same types of linguistic information as the native English speakers for
comprehension. Note that the maps included all fixations recorded over the course of the entire
reading. For that reason, although the heatmaps can help identify the spots that
differ across the experimental conditions, it is still not clear whether some effects
(e.g., the plausibility effect) observed in the maps are lingering effects that started at an early
stage of processing, or effects that occurred at later stages of processing. As discussed earlier,
operations involving syntactic information such as the island constraints are generally considered to
occur at an earlier stage of processing. Therefore, it is necessary to examine the time course of
reading in more detail, from an earlier point to later stages of processing, to better understand
what kinds of parsing mechanisms and processing strategies learners used to comprehend
the L2 input. The next sections look into this aspect in more detail by analyzing
multiple fine-grained eye-movement measures, specifically at the aforementioned two critical regions
(Region1 and Region3) and the two spillover regions (Region2 and Region4).


4.3. The effect of age of immersion on L2 processing of filler-gap dependencies
4.3.1. Active filler strategy and application of island constraints: Initial gap
4.3.1.1. Analysis of reading patterns at the first critical region (Region1)
Descriptive statistics of the three groups' RTs and first-pass regression probabilities at
Region1 across the four experimental conditions are summarized in Table 6.

Table 6. Descriptive statistics for RTs (in ms) and first-pass regressions (as proportions) at Region1

Group        Island factor  Plausibility factor  FFD        F-pass RT   RPD          Total RT     REGR
                                                 M (SD)     M (SD)      M (SD)       M (SD)       M (SD)
NS English   non-island     plausible            240 (42)   300 (90)    337 (105)    684 (235)    .08 (.09)
             non-island     implausible          281 (56)   379 (143)   487 (215)    822 (344)    .21 (.12)
             island         plausible            240 (50)   309 (138)   475 (247)    888 (257)    .21 (.15)
             island         implausible          231 (43)   300 (108)   442 (230)    844 (260)    .20 (.18)
Early ESL    non-island     plausible            258 (44)   443 (199)   593 (229)    944 (389)    .15 (.14)
             non-island     implausible          307 (67)   554 (248)   816 (331)    1268 (582)   .30 (.16)
             island         plausible            275 (46)   420 (142)   829 (391)    1122 (319)   .33 (.18)
             island         implausible          273 (51)   423 (161)   784 (344)    1095 (270)   .31 (.16)
Adult ESL    non-island     plausible            285 (52)   513 (196)   680 (288)    1064 (458)   .15 (.12)
             non-island     implausible          314 (78)   629 (306)   1160 (716)   1601 (664)   .30 (.26)
             island         plausible            280 (41)   481 (174)   1079 (709)   1326 (488)   .33 (.24)
             island         implausible          294 (50)   514 (217)   1145 (602)   1385 (432)   .37 (.26)

Note. FFD = first fixation duration, F-pass RT = first-pass RT, REGR = first-pass regression,
RPD = regression path duration.
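For readers less familiar with these measures, the sketch below shows how they are conventionally derived from an ordered fixation sequence for one trial. The data layout (a list of region index and duration pairs) and the function name are assumptions made for illustration, and the definitions follow common usage in the eye-movement reading literature; they may differ in detail from the software settings used in this study.

```python
from typing import Dict, List, Optional, Tuple

Fixation = Tuple[int, int]  # (region index, fixation duration in ms)


def region_measures(fixations: List[Fixation], region: int) -> Dict[str, Optional[float]]:
    """Compute conventional reading measures for one region of one trial."""
    # Total RT: every fixation that ever landed on the region.
    total = sum(d for r, d in fixations if r == region)

    # First pass starts at the first fixation on the region, provided that
    # no later region has been fixated beforehand (otherwise it was skipped).
    start = None
    for i, (r, _) in enumerate(fixations):
        if r > region:
            break
        if r == region:
            start = i
            break
    if start is None:
        return {"ffd": None, "first_pass": None, "rpd": None,
                "total": total, "regressed": None}

    ffd = fixations[start][1]  # first fixation duration

    # First-pass RT: fixations until the eyes leave the region in either direction.
    i, first_pass = start, 0
    while i < len(fixations) and fixations[i][0] == region:
        first_pass += fixations[i][1]
        i += 1
    # First-pass regression: the first pass ended with a move to an earlier region.
    regressed = i < len(fixations) and fixations[i][0] < region

    # Regression path duration: everything until the eyes move past the region
    # to the right, including fixations on earlier regions.
    j, rpd = start, 0
    while j < len(fixations) and fixations[j][0] <= region:
        rpd += fixations[j][1]
        j += 1

    return {"ffd": ffd, "first_pass": first_pass, "rpd": rpd,
            "total": total, "regressed": regressed}


# Hypothetical trial: the reader regresses from region 1 to region 0, returns,
# moves on to region 2, and later rereads region 1.
trial = [(0, 200), (1, 250), (1, 100), (0, 180), (1, 220), (2, 300), (1, 150)]
measures = region_measures(trial, region=1)
```

In this made-up trial, the regression path duration exceeds the first-pass RT because it also counts the time spent rereading the earlier region before the eyes finally moved on.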


As shown in Table 6, all groups showed increased RTs in the [non-island, implausible] condition on all
eye-tracking measures, compared to its plausible counterpart. This suggests that the participants
attempted to fill the gap at this early region, thereby experiencing difficulty in dealing with
implausible interpretations. The adult ESL group showed the largest plausibility effect,
as measured by regression path duration (plausible: 680ms; implausible: 1160ms) and Total RT
(plausible: 1064ms; implausible: 1601ms). The early ESL learners also exhibited strong
plausibility effects on those same measures, although the differences between the plausibility
conditions were not as large as those of the adult ESL group. First-pass regressions patterned
similarly when there was no island structure (i.e., the non-island condition), in that all three groups
made more regressions as soon as they encountered implausible interpretations. Somewhat different patterns
emerged between the adult ESL group and the other two groups in the island condition: the NS
English and the early ESL groups showed slightly longer RTs and more regressions in the
plausible condition, whereas the adult ESL group showed marginally longer
RTs and more regressions in the implausible condition, similar to their reading patterns in the
non-island condition. Preliminary 3 (group) x 2 (island) x 2 (plausibility) mixed design
ANOVAs were carried out separately for each dependent measure to examine whether there were any
differences in reading patterns among the groups across the experimental conditions. A summary
of the inferential statistics is provided in Table 7.
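The size of the plausibility effect in the non-island condition can be read off Table 6 as the difference between the implausible and plausible means. The short sketch below works through that arithmetic for regression path duration and Total RT; the dictionary layout and names are illustrative only, and the values are the group means reported in Table 6.

```python
# (plausible mean, implausible mean) in ms, non-island condition, Region1 (Table 6)
NON_ISLAND_MEANS = {
    "NS English": {"RPD": (337, 487), "Total RT": (684, 822)},
    "Early ESL": {"RPD": (593, 816), "Total RT": (944, 1268)},
    "Adult ESL": {"RPD": (680, 1160), "Total RT": (1064, 1601)},
}


def plausibility_effect(plausible_ms: int, implausible_ms: int) -> int:
    """Raw plausibility effect: implausible minus plausible mean (ms)."""
    return implausible_ms - plausible_ms


effects = {
    group: {measure: plausibility_effect(*means) for measure, means in by_measure.items()}
    for group, by_measure in NON_ISLAND_MEANS.items()
}

for group, by_measure in effects.items():
    print(group, by_measure)
```

These are descriptive differences between means only; whether they are statistically reliable is addressed by the ANOVAs and t-tests reported in this section.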


Table 7. Summary of the results of preliminary analyses at Region1

                                     by-subject (F1)                     by-item (F2)
Measure                    Effect    df      F        p       ηp2       df          F         p       ηp2
first fixation duration    I         1, 70   5.877    .018    .078      1, 27       8.510     .017    .238
                           P         1, 70   25.119   .001    .264      1, 27       9.827     .004    .267
                           G         2, 70   10.160   .001*   .225      1.6, 44.5   22.106    .001*   .450
                           IxG       2, 70   1.595    .210    .044      2, 54       .996      .376    .036
                           PxG       2, 70   .191     .826    .005      2, 54       .215      .807    .008
                           IxP       1, 70   14.905   .001*   .176      1, 27       14.872    .001    .355
                           IxPxG     2, 70   3.744    .029    .097      2, 54       2.432     .097    .083
first-pass RT              I         1, 70   19.665   .001*   .219      1, 27       14.257    .001    .346
                           P         1, 70   16.834   .001*   .194      1, 27       13.256    .001    .329
                           G         2, 70   13.879   .001*   .284      2, 54       65.768    .001*   .709
                           IxG       2, 70   .875     .888    .024      2, 54       1.881     .162    .065
                           PxG       2, 70   .003     .997    .001      2, 54       .615      .544    .022
                           IxP       1, 70   14.837   .001*   .175      1, 27       13.185    .001    .328
                           IxPxG     2, 70   .980     .380    .027      2, 54       .567      .571    .021
regression path duration   I         1, 70   9.087    .004    .115      1, 27       12.092    .002    .309
                           P         1, 70   27.431   .001*   .282      1, 27       27.034    .001*   .500
                           G         2, 70   17.076   .001*   .328      2, 54       188.302   .001*   .875
                           IxG       2, 70   .717     .492    .024      2, 54       1.248     .295    .044
                           PxG       2, 70   1.602    .209    .044      2, 54       1.836     .169    .064
                           IxP       1, 70   31.716   .001*   .312      1, 27       33.658    .001*   .555
                           IxPxG     2, 70   .709     .495    .020      2, 54       .012      .404    .069
Total RT                   I         1, 70   10.409   .002    .129      1, 27       9.058     .006    .251
                           P         1, 70   30.188   .000*   .301      1, 27       19.196    .001*   .416
                           G         2, 70   12.687   .000*   .266      2, 54       130.396   .001*   .828
                           IxG       2, 70   .997     .374    .028      2, 54       2.066     .075    .102
                           PxG       2, 70   2.906    .061    .077      2, 54       3.027     .057    .101
                           IxP       1, 70   34.063   .000*   .327      1, 27       23.557    .001*   .466
                           IxPxG     2, 70   1.433    .245    .039      2, 54       .817      .447    .029

Measure                    Effect    df        F        p
first-pass regression      I         1, 2032   29.077   .001*
                           P         1, 2032   28.077   .001*
                           G         2, 2032   6.323    .002
                           IxG       2, 2032   .255     .775
                           PxG       2, 2032   .320     .726
                           IxP       1, 2032   33.764   .001*
                           IxPxG     2, 2032   .629     .533

Notes. 1. I = island constraints factor, P = plausibility factor, G = group. .001* = p < .001.
2. The first-pass regression analyses took into account both the subjects and items as
random factors.
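The by-subject (F1) and by-item (F2) analyses reported in Table 7 differ in how trial-level data are aggregated before the ANOVA is run: F1 analyses operate on condition means computed per participant, F2 analyses on condition means computed per item. The following is a minimal sketch of that aggregation step only; the record layout, names, and values are hypothetical and are not taken from the study's data.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def condition_means(trials: List[dict], unit: str) -> Dict[Tuple[str, str], float]:
    """Average RTs per (unit, condition) cell; unit is 'subject' or 'item'."""
    cells = defaultdict(list)
    for trial in trials:
        cells[(trial[unit], trial["condition"])].append(trial["rt"])
    return {cell: mean(rts) for cell, rts in cells.items()}


# Hypothetical trial-level records (RT in ms).
trials = [
    {"subject": "s1", "item": "i1", "condition": "plausible", "rt": 300},
    {"subject": "s1", "item": "i2", "condition": "implausible", "rt": 400},
    {"subject": "s1", "item": "i3", "condition": "plausible", "rt": 340},
    {"subject": "s2", "item": "i1", "condition": "implausible", "rt": 500},
    {"subject": "s2", "item": "i2", "condition": "plausible", "rt": 350},
]

by_subject = condition_means(trials, unit="subject")  # input to an F1 analysis
by_item = condition_means(trials, unit="item")        # input to an F2 analysis
```

Reporting both analyses is a way of checking that an effect generalizes across participants as well as across the sentences used as materials.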

The results of the preliminary analyses showed that for all dependent measures in both by-subjects
(F1) and by-items (F2) analyses, there was a significant main effect of island condition,
likely due to elevated RTs (first fixation duration & first-pass RT) and increased regression ratios
when reading in the non-island condition, and because of the generally longer RTs in the island
condition (both plausible and implausible) for regression path duration and Total RT. There was
also a significant main effect of plausibility, presumably because of longer RTs in the
[non-island, implausible] condition, and a significant group effect, arguably due to the significantly
faster reading speed of the NS English group relative to the ESL learners¹⁹. A significant plausibility
by island interaction was also observed in both F1 and F2 analyses across the measures, reflecting
a strong plausibility effect (i.e., longer RTs in the implausible than in the plausible condition)
that was restricted to the non-island condition. However, no group-related interaction was found,
except in the first fixation duration, and marginally in the Total RTs (p1 = .061, p2 = .076). For
the first fixation duration, a significant 3-way interaction was found in the F1 analysis, F1 (2, 70)
= 3.744, p = .029, ηp2 = .097, and a marginal one in the F2 analysis, F2 (2, 54) = 2.432, p = .097,
ηp2 = .083. The reading patterns of the three groups at Region1 are plotted in Figure 8 (first
fixation duration & first-pass RT) and Figure 9 (regression path duration & Total RT).
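The partial eta squared values reported alongside each F statistic can be recovered from the F value and its degrees of freedom using the standard identity ηp2 = (F x df_effect) / (F x df_effect + df_error). The sketch below applies that identity to the two 3-way interaction results for first fixation duration reported above; the function name is illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# 3-way interaction on first fixation duration at Region1
eta_f1 = round(partial_eta_squared(3.744, 2, 70), 3)  # by-subject analysis
eta_f2 = round(partial_eta_squared(2.432, 2, 54), 3)  # by-item analysis
print(eta_f1, eta_f2)
```

Both values agree with the effect sizes reported in the text (.097 and .083), which offers a quick consistency check on the reported statistics.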

19 Because native speakers generally read much faster than nonnative speakers (see
e.g., Juffs, 2005), a significant main effect of group is not surprising and is less
informative. Therefore, the main effect of group will not be further addressed in the discussion of the
results, but all the results are provided in the summary of the preliminary analyses in the tables.

Figure 8. Reading patterns of the three groups during early stages of processing at Region1


Figure 9. Reading patterns of the three groups during late stages of processing at Region1


As the preliminary analyses revealed a group-related interaction on the first fixation
duration, a series of follow-up 2 x 2 repeated measures ANOVAs was performed for each group
separately to better identify the source of the significant interaction. First, the NS English group
showed a significant island effect, F1 (1, 23) = 7.753, p = .011, ηp2 = .252; F2 (1, 27) = 10.146, p
= .004, ηp2 = .273, a significant plausibility effect in the F1 analysis, F1 (1, 23) = 7.784, p = .010,
ηp2 = .253, and a marginal one in the F2 analysis, F2 (1, 27) = 3.698, p = .065, ηp2 = .120, and,
crucially, a reliable island by plausibility interaction, F1 (1, 23) = 11.631, p = .002, ηp2 = .336,
power = .904; F2 (1, 27) = 15.572, p = .001, ηp2 = .366. These significant main
effects and the interaction for the NS English group can be attributed to the plausibility effect
(i.e., RT discrepancies between the plausible and implausible conditions), which was restricted
to the non-island condition. Subsequent planned paired-samples t-tests confirmed this account. The
mean RTs of the NS English group in the implausible condition were significantly longer than
those in the plausible condition only within the non-island condition: non-island: [t1 (23) =
5.484, p < .001, d = .845; t2 (27) = 4.663, p < .001, d = 1.872]; island: [t1 (23) = .785, p = .440, d
= .161; t2 (27) = .796, p = .433, d = .229].
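As a reminder of what the planned paired t-tests compute, the sketch below derives the t statistic and its degrees of freedom from per-participant condition means. The numbers are invented for illustration (four hypothetical participants) and are unrelated to the study's data; the function name is likewise an assumption.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(implausible: Sequence[float], plausible: Sequence[float]) -> Tuple[float, int]:
    """Paired-samples t statistic and degrees of freedom (n - 1)."""
    diffs = [a - b for a, b in zip(implausible, plausible)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # SE of the mean difference
    return mean(diffs) / standard_error, n - 1


# Hypothetical by-subject mean first fixation durations (ms), one pair per participant.
implausible_rt = [280, 300, 260, 310]
plausible_rt = [240, 250, 245, 260]

t_value, df = paired_t(implausible_rt, plausible_rt)
print(f"t({df}) = {t_value:.3f}")
```

Because each participant contributes a value to both conditions, the test operates on within-participant differences, so between-participant variation in overall reading speed does not inflate the error term.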
The early ESL group showed no reliable RT differences across the island conditions, F1
(1, 20) = .243, p = .627, ηp2 = .012; F2 (1, 27) = .941, p = .341, ηp2 = .034; although they also
showed a plausibility effect in the non-island condition, their RTs in the island condition were
high overall, offsetting the longer RTs in the [non-island, implausible] condition. A main
plausibility effect was found only in the F1 analysis, F1 (1, 20) = 10.526, p = .004, ηp2 = .345; F2
(1, 27) = 2.776, p = .107, ηp2 = .093. Importantly, however, the early ESL group also displayed a
significant plausibility by island interaction in both F1 and F2 analyses, F1 (1, 20) = 11.194, p
= .003, ηp2 = .359; F2 (1, 27) = 5.743, p = .024, ηp2 = .175. Subsequent planned paired t-tests
revealed, as for the NS English group, a clear plausibility effect only in the non-island condition;
non-island: [t1 (20) = 4.264, p < .001, d = .866; t2 (27) = 2.478, p = .020, d = .807]; island: [t1
(23) = .280, p = .782, d = .030; t2 (27) = .269, p = .790, d = .069].
The adult ESL group, like the early ESL group, also showed no significant main effect of
island constraints, F1 (1, 23) = .975, p = .332, ηp2 = .035; F2 (1, 27) = 2.935, p = .098, ηp2 = .098.
A significant plausibility effect was found in both F1 and F2 analyses, F1 (1, 23) = 7.964, p
= .009, ηp2 = .235; F2 (1, 27) = 8.610, p = .007, ηp2 = .242, presumably due to increased RTs on
the implausible sentences not only in the non-island condition, but also marginally in the island
condition. This rendered no significant interaction of the two factors, F1 (1, 23) = .231, p = .634,
ηp2 = .008; F2 (1, 27) = .039, p = .846, ηp2 = .001, in contrast to the other two groups (see Figure
8).


4.3.1.2. Analysis of reading patterns at the spillover region (Region2)
Descriptive statistics of the three groups' RTs and first-pass regression probabilities across
the four experimental conditions at the spillover region (Region2) are given in Table 8.

Table 8. Descriptive statistics for RTs and first-pass regressions at Region2

Group        Island Cond.   Plausibility Cond.   FFD        F-pass RT   RPD          Total RT     REGR
                                                 M (SD)     M (SD)      M (SD)       M (SD)       M (SD)
NS English   non-island     plausible            253 (48)   520 (159)   646 (221)    1117 (418)   .15 (.14)
             non-island     implausible          277 (52)   495 (122)   714 (179)    1126 (385)   .21 (.12)
             island         plausible            248 (43)   473 (146)   729 (204)    1589 (541)   .23 (.18)
             island         implausible          255 (51)   442 (152)   694 (236)    1445 (516)   .20 (.16)
Early ESL    non-island     plausible            266 (49)   589 (131)   808 (286)    1703 (832)   .14 (.14)
             non-island     implausible          289 (46)   657 (145)   1198 (425)   1610 (633)   .29 (.16)
             island         plausible            280 (46)   668 (218)   983 (365)    1576 (472)   .22 (.19)
             island         implausible          263 (36)   625 (184)   966 (316)    1563 (389)   .24 (.13)
Adult ESL    non-island     plausible            272 (39)   712 (147)   908 (292)    1839 (690)   .13 (.15)
             non-island     implausible          300 (65)   802 (222)   1519 (631)   1934 (661)   .42 (.25)
             island         plausible            279 (51)   753 (199)   1115 (407)   1850 (500)   .21 (.17)
             island         implausible          277 (44)   764 (215)   1156 (405)   1827 (390)   .24 (.17)

Overall, the RT patterns of the three groups across the four experimental conditions at Region2
were somewhat similar to those at the critical region (Region1), but some differences emerged among the
groups, especially in the non-island condition. The NS English group tended to spend
slightly more time reading implausible sentences than plausible sentences, as
measured by first fixation duration and regression path duration, indicating a plausibility effect at
least to a certain degree. However, the size of the effect appeared to be slightly smaller on
those measures (i.e., smaller reading time differences between the two plausibility conditions)
than the effects they exhibited at the previous region. The first-pass RT, first-pass
regression, and Total RT of the NS group showed neither a plausibility effect nor a reliable
interaction of plausibility and island constraints; that is, there was no plausibility effect in either island
condition. The reading patterns of the NS English group in the island condition remained the
same as those at Region1, without much difference between the two plausibility conditions.
The early ESL group, by and large, patterned similarly to the NS English group across the
measures, but their RT patterns in the non-island condition displayed relatively clearer spillover
effects for regression path duration and first-pass regression, as shown by longer RTs and more
regressions in the implausible than in the plausible condition. This trend was also shown by the
adult ESL group on those measures (i.e., regression path duration & first-pass regression), in that
the plausibility effect they displayed in the non-island condition appeared to be even greater than
the effects they showed at the critical region, with greater slowdowns in RT and more regressions
in dealing with implausible interpretations. However, neither ESL group, like the NS
English group, showed such a plausibility effect in the island condition on any measure.
Recall that the adult ESL group showed similar reading patterns in the two island
conditions at Region1, with slightly longer RTs when reading implausible sentences in the island
condition. At Region2, however, the RT differences the adult ESL learners showed in the island
condition appeared to be much smaller than their RT differences at the previous region.
The preliminary analyses on each measure revealed the following significant group-associated
2-way interactions: group by plausibility [regression path duration; p1 = .003, p2
= .009, and first-pass regression; p = .002], and group by island [Total RT; p1 = .018, p2 < .001].
See Table 9 for a complete summary of the preliminary analyses at Region2. The reading
patterns of the three groups at Region2 are plotted in Figure 10 (first fixation duration &
first-pass RT) and Figure 11 (regression path duration & Total RT).


Table 9. Summary of the results of preliminary analyses at Region2

                                     by-subject (F1)                     by-item (F2)
Measure                    Effect    df      F        p       ηp2       df          F         p       ηp2
first fixation duration    I         1, 70   2.492    .119    .034      1, 27       2.091     .160    .072
                           P         1, 70   6.514    .016    .081      1, 27       5.895     .022    .179
                           G         2, 70   3.007    .056    .079      1.6, 44.5   5.741     .005    .175
                           IxG       2, 70   .192     .826    .005      2, 54       .305      .731    .011
                           PxG       2, 70   .657     .522    .018      2, 54       .733      .485    .026
                           IxP       1, 70   14.318   .001*   .170      1, 27       4.994     .035    .155
                           IxPxG     2, 70   .659     .521    .018      2, 54       .178      .831    .007
first-pass RT              I         1, 70   2.068    .155    .029      1, 27       .100      .754    .004
                           P         1, 70   .372     .544    .005      1, 27       .015      .903    .001
                           G         2, 70   31.56    .001*   .474      2, 54       120.77    .001*   .817
                           IxG       2, 70   2.718    .073    .072      2, 54       3.158     .050    .105
                           PxG       2, 70   3.416    .049    .082      2, 54       1.228     .301    .043
                           IxP       1, 70   3.488    .068    .047      1, 27       1.970     .172    .068
                           IxPxG     2, 70   .456     .636    .013      2, 54       .221      .802    .008
regression path duration   I         1, 70   .005     .944    .001      1, 27       .736      .399    .027
                           P         1, 70   31.869   .001*   .313      1, 27       17.695    .001*   .396
                           G         2, 70   23.287   .001*   .400      2, 54       99.266    .001*   .786
                           IxG       2, 70   .321     .726    .009      1.4, 38.0   .781      .424    .028
                           PxG       2, 70   6.277    .003    .152      2, 54       5.112     .009    .159
                           IxP       1, 70   32.903   .001*   .320      1, 27       18.867    .001    .411
                           IxPxG     2, 70   1.464    .238    .040      2, 54       2.346     .106    .080
Total RT                   I         1, 70   5.835    .018    .077      1, 27       6.057     .021    .183
                           P         1, 70   .453     .504    .006      1, 27       .652      .427    .024
                           G         2, 70   10.911   .001*   .238      2, 54       133.594   .001*   .832
                           IxG       2, 70   4.246    .018    .108      2, 54       15.433    .001*   .364
                           PxG       2, 70   1.263    .289    .035      2, 54       2.646     .061    .098
                           IxP       1, 70   .604     .440    .009      1, 27       .042      .840    .002
                           IxPxG     2, 70   .042     .840    .002      2, 54       .929      .401    .033

Measure                    Effect    df        F        p
first-pass regression      I         1, 2032   .466     .495
                           P         1, 2032   29.291   .001*
                           G         2, 2032   .782     .458
                           IxG       2, 2032   .602     .548
                           PxG       2, 2032   6.436    .002
                           IxP       1, 2032   20.756   .001*
                           IxPxG     2, 2032   2.521    .081

Note. I = island constraints factor, P = plausibility factor, G = group. .001* = p < .001.

Figure 10. Reading patterns of the three groups during early stages of processing at Region2


Figure 11. Reading patterns of the three groups during late stages of processing at Region2


Regression path duration. It appeared that the significant group by plausibility
interaction on regression path duration stemmed from the non-significant main effect of plausibility for
the NS English group, F1 (1, 23) = .222, p = .642, ηp2 = .010; F2 (1, 27) = .098, p = .757, ηp2
= .004, in contrast to the two ESL groups, which showed significant plausibility effects, apparently
because of significantly longer RTs in the [non-island, implausible] condition for both learner
groups: the early ESL group, F1 (1, 23) = 14.880, p = .001, ηp2 = .427; F2 (1, 27) = 4.943, p = .035, ηp2
= .155, and the adult ESL group, F1 (1, 23) = 44.733, p < .001, ηp2 = .624; F2 (1, 27) = 45.189, p < .001,
ηp2 = .626. The longer regression path durations in the [non-island, implausible] condition for the
two ESL groups also appeared to contribute to a significant interaction of island and plausibility,
F1 (1, 23) = 10.739, p = .004, ηp2 = .349; F2 (1, 27) = 13.666, p = .001, ηp2 = .336, for the early
ESL group, and F1 (1, 23) = 14.648, p < .001, ηp2 = .352; F2 (1, 27) = 15.391, p = .001, ηp2
= .363, for the adult ESL group. Subsequent planned paired t-tests confirmed that a significant
plausibility effect was present only in the non-island condition for both learner groups; early
ESL group: [non-island: t1 (20) = 3.974, p = .001, d = 1.072; t2 (27) = 3.865, p = .001, d = 1.061;
and island: t1 (20) = .307, p = .762, d = .003; t2 (27) = .173, p = .864, d = .047], and adult ESL
group: [non-island: t1 (27) = 8.092, p < .001, d = 1.234; t2 (27) = 8.262, p < .001, d = 2.358; and
island: t1 (27) = .582, p = .565, d = .108; t2 (27) = 1.032, p = .311, d = .290]. The NS English
group showed a significant island by plausibility interaction only in the F1 analysis, F1 (1, 23) =
4.433, p = .046, ηp2 = .162; F2 (1, 27) = 1.112, p = .301, ηp2 = .040. However, the subsequent
t-tests showed only a marginal plausibility effect in the by-subjects analysis; [non-island: t1 (23) =
1.957, p = .063, d = .419; t2 (27) = 1.125, p = .270, d = .268; and island: t1 (23) = .953, p = .351,
d = .231; t2 (27) = .477, p = .637, d = .130]. Of the three groups, only the adult ESL group showed
a main island effect, F1 (1, 23) = 6.853, p = .014, ηp2 = .202; F2 (1, 27) = 4.435, p = .045, ηp2
= .141, likely due to their much slower reading in the [non-island, implausible] condition. For the
other two groups, their generally slower RTs in the island condition tended to balance out against
the relatively faster RTs in the plausible sentences and relatively slower RTs in the
implausible sentences in the non-island condition, leading to no island effect: the NS English group, F1
(1, 23) = .543, p = .469, ηp2 = .023; F2 (1, 27) = .001, p = .975, ηp2 = .001; the early ESL group, F1 (1,
23) = .005, p = .945, ηp2 = .001; F2 (1, 27) = .020, p = .890, ηp2 = .001.
First-pass regression. The results of the follow-up analyses on first-pass regression
patterned very similarly to those of regression path duration. First, the adult ESL group showed a
drastic increase in their regression ratio in the [non-island, implausible] condition (approximately
42%), compared to their regression ratio in the plausible counterpart (approx. 13%), indicating a
greater plausibility effect than they previously showed at the critical region (Region1, approx. 15%
and 30% in reading plausible and implausible sentences, respectively). This resulted in a
significant main effect of plausibility, F (1, 780) = 56.882, p < .001, as well as a significant
interaction of island and plausibility, F (1, 780) = 27.221, p < .001. The early ESL group also
showed a significant effect of plausibility, F (1, 584) = 10.491, p = .001, with increased
regression ratios in the [non-island, implausible] condition. However, neither a significant
plausibility effect, F (1, 668) = .231, p = .631, nor a significant interaction, F (1, 668) = .231, p
= .099, was found for the NS English group. Lastly, no group showed a main effect of island: the
NS English speakers, F (1, 668) = 1.426, p = .233, the early ESL learners, F (1, 584) = .592, p
= .442, and the adult ESL learners, F (1, 780) = .226, p = .636, presumably because their lower
regression ratios in the [non-island, plausible] condition and higher regression ratios in the [non-island,
implausible] condition offset their generally higher regression ratios in the island condition.


Total RT. As reported above, the preliminary analysis on Total RT showed a significant
group by island interaction in both F1 (p1 = .018) and F2 (p2 < .001) analyses. The follow-up
analyses for each group indicated that the significant interaction stemmed from the
relatively longer reading times that the early and adult ESL groups spent in the non-island
condition than in the island condition. In contrast, the NS English group showed the opposite
pattern, spending more time in the island than in the non-island condition. This rendered a
significant island effect for the NS English group, F1 (1, 23) = 24.920, p < .001, ηp2 = .520; F2 (1,
27) = 27.764, p < .001, ηp2 = .507, but not for the early ESL group, F1 (1, 23) = .011, p = .917,
ηp2 = .001; F2 (1, 27) = 1.569, p = .221, ηp2 = .055, or the adult ESL group, F1 (1, 23) = .089, p
= .767, ηp2 = .003; F2 (1, 27) = .991, p = .328, ηp2 = .035. No group showed a significant
plausibility effect; NS English, F1 (1, 23) = 2.019, p = .169, ηp2 = .081; F2 (1, 27) = 1.701, p
= .203, ηp2 = .059; early ESL, F1 (1, 23) = .015, p = .904, ηp2 = .001; F2 (1, 27) = 1.520, p = .228,
ηp2 = .053; and the adult ESL group, F1 (1, 23) = .515, p = .478, ηp2 = .019; F2 (1, 27) = 3.405, p
= .076, ηp2 = .112. Finally, no group showed a significant island by plausibility interaction at
Region2, as measured by Total RT; NS English, F1 (1, 23) = .618, p = .440, ηp2 = .026; F2 (1,
27) = .831, p = .370, ηp2 = .030; early ESL, F1 (1, 23) = .055, p = .816, ηp2 = .003; F2 (1, 27)
= .523, p = .476, ηp2 = .019; and the adult ESL group, F1 (1, 23) = .693, p = .412, ηp2 = .025; F2
(1, 27) = .238, p = .630, ηp2 = .009.


4.3.1.3. Interim summary of the results: Initial gap

Table 10. Summary of the major findings at the initial gap

Significant island x plausibility interaction?

Group        (Critical) Region 1                               (Spillover) Region 2
NS English   YES: FFD, first-pass RT, REGR, RPD, Total RT      YES: FFD
                                                               NO:  first-pass RT, REGR, RPD, Total RT
Early ESL    YES: FFD, first-pass RT, REGR, RPD, Total RT      YES: FFD, REGR, RPD
                                                               NO:  first-pass RT, Total RT
Adult ESL    YES: first-pass RT, REGR, RPD, Total RT           YES: FFD, REGR, RPD
             NO:  FFD                                          NO:  first-pass RT, Total RT

Major implications
• All three groups appeared to have employed the active filler gap strategy in the non-island
  condition, demonstrating a plausibility effect with longer RTs and more regressions in reading
  implausible sentences.
• All three groups seemed to have applied the relevant relative clause island constraint from early
  stages of processing, avoiding gap postulations in the island environment.
• The adult ESL group showed slightly delayed application of island constraints, compared to the
  other two groups, showing similar reading patterns in both island conditions (i.e., longer RTs
  in reading implausible sentences), as measured by FFD.
• Both ESL groups displayed clearer spillover effects until the later stages of processing,
  compared to the NS English controls.

Note. FFD = first fixation duration, REGR = first-pass regression, RPD = regression path
duration


4.3.2. Filler-gap reanalysis: Ultimate gap
4.3.2.1. Analysis of reading patterns at the second critical region (Region3)
Table 11 provides descriptive statistics of the three groups' RTs and regression
probabilities at the second critical region (Region3), the region that includes the ultimate gap for
the filler in all experimental conditions. In the non-island condition, Region3 serves as a spot
where the parser needs to withdraw its initial analysis at Region1 and perform an immediate
reanalysis by reassigning the filler as the object of the preposition about.

Table 11. Descriptive statistics for RTs and first-pass regression at Region3

Group        Island Cond.   Plausibility Cond.   FFD        F-pass RT   REGR        RPD           Total RT
                                                 M (SD)     M (SD)      M (SD)      M (SD)        M (SD)
NS English   non-island     plausible            293 (49)   442 (186)   .23 (.14)   697 (331)     980 (398)
             non-island     implausible          243 (48)   366 (126)   .13 (.10)   521 (243)     779 (315)
             island         plausible            285 (63)   507 (233)   .28 (.23)   961 (519)     1802 (855)
             island         implausible          298 (80)   530 (212)   .33 (.19)   1086 (560)    1718 (800)
Early ESL    non-island     plausible            325 (72)   652 (299)   .35 (.27)   1267 (857)    1685 (911)
             non-island     implausible          277 (56)   575 (218)   .23 (.22)   958 (559)     1057 (410)
             island         plausible            316 (57)   814 (314)   .31 (.20)   1413 (712)    1774 (494)
             island         implausible          312 (64)   850 (378)   .29 (.22)   1374 (644)    1785 (421)
Adult ESL    non-island     plausible            314 (88)   691 (295)   .35 (.25)   1535 (976)    1833 (1057)
             non-island     implausible          288 (64)   660 (295)   .37 (.25)   1301 (790)    1233 (457)
             island         plausible            345 (65)   902 (333)   .44 (.26)   2174 (1131)   2284 (551)
             island         implausible          325 (73)   896 (370)   .46 (.27)   2300 (1287)   2187 (669)

As a result, Region3 is the place where a reanalysis effect is expected, in the form of a
plausibility effect (but in the reverse direction compared to the plausibility effect found at the
previous regions), as signaled by higher RTs and regression probabilities for sentences in the
[non-island, plausible] condition than in the [non-island, implausible] condition. In the
island condition, on the other hand, Region3 serves as an initial gap for the filler, given that there
is no grammatically licit gap in the island structures. The parser therefore should be free from the
plausibility manipulation (i.e., no plausibility effect) not only at the first critical region, but also
at this region. As a result, the integration of the filler with a verb such as mentioned should yield
about the same amount of processing load in the two plausibility conditions.
Bearing that in mind, the reading profiles of the three groups tended to show the expected
patterns in the non-island condition, in that the NS English and the two ESL groups all
exhibited longer reading times and made more first-pass regressions in the plausible than in the
implausible condition across the measures. An exception was the first-pass regression of the adult
ESL group, which had slightly more regressions in the implausible (approx. 37%) than in the
plausible condition (approx. 35%). In addition, compared to the NS English and early ESL
groups, the reading time differences of the adult ESL group between the two plausibility
conditions appeared slightly smaller, especially on first fixation duration and first-pass RT. On the
other hand, the Total RT of the adult learners seemed to reflect the largest processing difficulties
in the plausible condition. Interestingly, regression path durations of the adult ESL group were
longer in the plausible condition, despite their slightly lower regression ratio (approx. 2%
less). This might suggest that although the adult learners made more regressions when reading
implausible sentences at this region (Region3), their recovery from the initial misanalysis made
at Region1 took longer in the plausible condition. In the island condition, the reading patterns
between the plausibility conditions were somewhat mixed across the groups and measures, but,
crucially, the differences between the two plausibility conditions were generally smaller than the
differences in the non-island condition, regardless of their direction. The reading patterns
of the three groups at the second critical region (Region3) are plotted in Figure 12 (first fixation
duration & first-pass RT) and Figure 13 (regression path duration & Total RT).
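
The descriptive comparison above can be illustrated with a minimal sketch that computes the plausible minus implausible difference for each group and island condition. The means are the first-pass RT values (in ms) reported in Table 11; the names FIRST_PASS_RT and plausibility_difference are hypothetical and introduced only for illustration. The sketch covers descriptive arithmetic only and is not part of the inferential analyses reported in this chapter.

```python
# First-pass RT means (ms) at Region3, copied from Table 11.
# Each tuple is (plausible, implausible).
FIRST_PASS_RT = {
    "NS English": {"non-island": (442, 366), "island": (507, 530)},
    "Early ESL": {"non-island": (652, 575), "island": (814, 850)},
    "Adult ESL": {"non-island": (691, 660), "island": (902, 896)},
}


def plausibility_difference(means):
    """Return plausible minus implausible (ms) for each island condition."""
    return {
        condition: plausible - implausible
        for condition, (plausible, implausible) in means.items()
    }


# A positive value means longer RTs in the plausible condition,
# i.e., the reversed plausibility effect expected at Region3.
differences = {
    group: plausibility_difference(means)
    for group, means in FIRST_PASS_RT.items()
}

for group, diff in differences.items():
    print(group, diff)
```

On these means, the non-island difference is positive for every group, and the island difference is smaller in absolute size than the non-island difference for every group.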

Figure 12. Reading patterns of the three groups during early stages of processing at Region3

Figure 13. Reading patterns of the three groups during late stages of processing at Region3

A summary of the preliminary analyses at Region3 is given in Table 12. The preliminary
analyses showed a main effect of island constraints for all measures, with significantly longer
RTs and more regressions in the island condition, indicating generally more processing burden in
the island condition.

Table 12. Summary of the results of preliminary analyses at Region3

Reading times
                                by-subject (F1)                   by-item (F2)
Measure         Effect   df      F         p      ηp²     df          F         p      ηp²
first fixation  I        1, 70   14.620    .001*  .173    1, 27       20.011    .001*  .426
duration        P        1, 70   22.932    .001*  .247    1, 27       11.217    .002   .294
                G        2, 70   3.743     .029   .097    2, 54       9.433     .001*  .259
                IxG      2, 70   .927      .400   .026    2, 54       1.050     .357   .037
                PxG      2, 70   .090      .914   .003    1.4, 38.8   .624      .490   .023
                IxP      1, 70   11.469    .001   .141    1, 27       10.093    .004   .272
                IxPxG    2, 70   3.311     .042   .086    2, 54       3.472     .038   .114
first-pass RT   I        1, 70   106.213   .001*  .603    1, 27       40.504    .001*  .600
                P        1, 70   3.318     .073   .045    1, 27       1.540     .225   .054
                G        2, 70   12.954    .001*  .270    2, 54       136.208   .001*  .835
                IxG      2, 70   .610      .546   .017    2, 54       .028      .972   .001
                PxG      2, 70   .031      .969   .001    2, 54       .253      .777   .009
                IxP      1, 70   8.234     .005   .105    1, 27       5.524     .026   .170
                IxPxG    2, 70   2.071     .134   .056    2, 54       .841      .437   .030

First-pass regression probability
Measure         Effect   df        F        p
first-pass      I        1, 2032   19.325   .001*
regression      P        1, 2032   5.010    .025
                G        2, 2032   4.823    .008
                IxG      2, 2032   4.440    .012
                PxG      2, 2032   2.922    .054
                IxP      1, 2032   9.034    .003
                IxPxG    2, 2032   3.339    .036

Notes. 1. I = island constraints factor, P = plausibility factor, G = group; .001* = p < .001.
2. As noted earlier, preliminary mixed ANOVA analyses on regression path duration and Total RT
were not performed because the data did not meet the assumptions for ANOVA.

As shown, a significant main effect of plausibility was found on first fixation duration and first-pass
regression, and marginally on first-pass RT in the F1 analysis, which was most likely due to
increased reading times and regression ratios in the [non-island, plausible] condition for all
groups. This pattern also contributed to a significant island by plausibility interaction for all
measures, reflecting plausibility effects in the opposite direction to those observed at
Region1. With regard to group-related interactions, the following results were found. First, a
significant 2-way interaction of group and island was found on first-pass regression (p = .012).
Second, there was a marginally significant interaction of group and plausibility on first-pass
regression (p = .054). Lastly, there was a significant 3-way (group x island x plausibility)
interaction on first fixation duration (p1 = .042, p2 = .038) and first-pass regression (p = .036).
The results of the follow-up analyses on each of those measures, and the results of the
nonparametric Wilcoxon signed ranks tests on regression path duration and Total RT, are reported
below:
First fixation duration The NS English group showed a main effect of island, F1 (1, 23)
= 4.360, p = .048, ηp² = .159; F2 (1, 27) = 10.493, p = .003, ηp² = .280, and a main effect of
plausibility, F1 (1, 23) = 10.584, p = .004, ηp² = .315; F2 (1, 27) = 4.760, p = .038, ηp² = .150.
Crucially, there was also a significant island by plausibility interaction for the NS English group,
F1 (1, 23) = 24.626, p < .001, ηp² = .517; F2 (1, 27) = 19.124, p < .001, ηp² = .415, suggesting a
reliable plausibility effect only in the non-island condition. The subsequent planned paired t-tests
confirmed this trend, in that RTs in the [non-island, plausible] condition were
significantly longer than RTs in the [non-island, implausible] condition, t1 (23) = 6.095, p < .001,
d = 1.050; t2 (27) = 5.134, p < .001, d = 1.300, whereas there was no significant plausibility effect
in the island condition, t1 (23) = .993, p = .331, d = .115; t2 (27) = 1.487, p = .149, d = .442. The
results of the early ESL group were similar to those of the NS English group, in that they
showed a significant plausibility effect, F1 (1, 20) = 7.006, p = .015, ηp² = .259; F2 (1, 27) =
6.493, p = .017, ηp² = .194, which was modulated by the island constraints factor, leading to a
2-way interaction that was significant in the F1 analysis, F1 (1, 20) = 5.855, p = .025, ηp² = .226;
F2 (1, 27) = 2.278, p = .143, ηp² = .078. There was no reliable difference between the two island
conditions when the reading times in the two plausibility conditions in each island condition were
combined, F1 (1, 20) = 2.450, p = .133, ηp² = .109; F2 (1, 27) = 2.205, p = .149, ηp² = .076. The
follow-up t-tests for each island condition confirmed that, like the NS English group, the early
ESL group displayed a significant plausibility effect only in the non-island condition (i.e., RT
plausible > RT implausible): non-island condition, t1 (20) = 3.163, p = .005, d = .752; t2 (27) =
2.901, p = .007, d = .744; island condition, t1 (20) = .538, p = .596, d = .097; t2 (27) = .593, p
= .558, d = .163. The adult ESL group showed a significant main effect of island, with
significantly longer first fixation durations in the island condition, F1 (1, 27) = 7.657, p = .010,
ηp² = .221; F2 (1, 27) = 17.358, p < .001, ηp² = .391. A significant plausibility effect was found
only in the F1 analysis, F1 (1, 27) = 6.877, p = .014, ηp² = .203; F2 (1, 27) = 1.540, p = .225, ηp²
= .054, as their reading times in the plausible condition were slightly longer in both island
conditions. In contrast to the NS English and early ESL groups, the adult ESL group did not
display a significant island by plausibility interaction on first fixation duration, F1 (1, 27)
= .077, p = .783, ηp² = .003; F2 (1, 27) = .267, p = .610, ηp² = .010. Although there was no
significant interaction of the two factors for the adult ESL group, planned paired t-tests were
carried out for each island condition to identify whether the non-significance was due to the lack
of a plausibility effect or to a significant plausibility effect in both island conditions. The results
showed that the non-significant interaction was due to the lack of plausibility effects in both island
conditions: non-island condition, t1 (27) = 1.540, p = .135, d = .289; t2 (27) = 1.466, p = .154, d
= .361; island condition, t1 (27) = 1.467, p = .154, d = .311; t2 (27) = .429, p = .672, d = .125.
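
As a reading aid for the effect sizes reported above, the following minimal sketch recovers partial eta squared from an F ratio and its degrees of freedom, using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The function name partial_eta_squared is hypothetical, and the sketch is offered only as an arithmetic check under that identity; it is not a description of the software procedure used to produce the reported values.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom.

    Uses the standard identity
        eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    """
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# NS English, first fixation duration at Region3 (values reported in the text).
island_f1 = partial_eta_squared(4.360, 1, 23)        # by-subject island effect
island_f2 = partial_eta_squared(10.493, 1, 27)       # by-item island effect
interaction_f1 = partial_eta_squared(24.626, 1, 23)  # island x plausibility

print(round(island_f1, 3), round(island_f2, 3), round(interaction_f1, 3))
```

Because the identity depends on the error degrees of freedom, the same calculation can also serve as a quick consistency check between a reported F ratio, its degrees of freedom, and its effect size.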

First-pass regression The source of the significant interaction of group and island
constraints was the non-significant island effect of the early ESL group, F (1, 584) = .040, p
= .841, whereas the other two groups showed a significant island effect, with higher
overall regression ratios in the island condition, for both the NS English group, F (1, 668) =
22.265, p < .001, and the adult ESL group, F (1, 780) = 9.732, p = .002. On the other hand, a
significant island by plausibility interaction was found only in the NS English group, F (1, 668) =
13.326, p < .001, in that the NS English speakers made significantly more frequent regressions
(approx. 10% more) in the [non-island, plausible] condition than in the [non-island, implausible]
condition, whereas they showed, albeit to a lesser degree, the reverse pattern in the island
condition, with approximately 5 percent more regressions in the implausible condition. The early
ESL group, similarly to the NS English controls, showed about 10 percent more regressions in the
[non-island, plausible] than in the [non-island, implausible] condition. This pattern was the same in
the island condition, although the difference between the two plausibility conditions was much
smaller (approx. 2%, compared to the 10% difference in the non-island condition). However, the
results showed no significant interaction of the two factors, F (1, 584) = 1.931, p = .165. The
adult ESL group did not show a significant plausibility effect in either island
condition, as they showed only about a 2% difference between plausible and implausible sentences
in both island conditions, which resulted in no main effect of plausibility, F (1, 780) = .431, p
= .512, and no significant interaction of island and plausibility, F (1, 780) = .095, p = .758.
Regression path duration & Total RT As reported earlier, the regression path duration
and Total RT data at Region3 clearly violated the assumptions for ANOVA. Therefore, sets of
nonparametric Wilcoxon signed ranks tests were performed instead on each island condition
for each group. Overall, the results were in line with the prediction of the garden-path effect in
the [non-island, plausible] condition for both measures. That is, both the regression path
durations and Total RTs in the [non-island, plausible] condition were significantly longer than
the RTs in the implausible counterpart for all groups; NS English: [regression path duration, Z1 =
3.457, p = .001, d = 1.152; Z2 = 2.207, p = .043, d = .563; Total RT, Z1 = 2.857, p = .004, d
= .905; Z2 = 2.619, p = .009, d = .747]; the early ESL group: [regression path duration, Z1 =
3.901, p < .001, d = 1.508; Z2 = 2.163, p = .031, d = .604; Total RT, Z1 = 2.902, p = .004, d =
1.002; Z2 = 4.349, p < .001, d = 1.474]; and the adult ESL group: [regression path duration, Z1 =
3.256, p = .001, d = .966; Z2 = 2.095, p = .036, d = .584; Total RT, Z1 = 4.509, p < .001, d =
1.510; Z2 = 4.440, p < .001, d = 1.474]. In contrast to these reversed plausibility effects, which were
significant in the non-island condition, no significant differences were found within the island
condition for any group; NS English: [regression path duration, Z1 = 1.400, p = .162, d
= .413; Z2 = 1.275, p = .202, d = .346; Total RT, Z1 = 1.057, p = .290, d = .309; Z2 = .387, p
= .699, d = .104]; early ESL: [regression path duration, Z1 = .174, p = .862, d = .054; Z2 = .979, p
= .327, d = .264; Total RT, Z1 = .330, p = .741, d = .102; Z2 = .023, p = .982, d = .006]; and adult
ESL group: [regression path duration, Z1 = 1.116, p = .256, d = .302; Z2 = .979, p = .327, d
= .264; Total RT, Z1 = .638, p = .524, d = .171; Z2 = .911, p = .362, d = .245].

4.3.2.2. Analysis of reading patterns at the spillover region (Region4)
Table 13 provides descriptive statistics for RTs and first-pass regression (%) across the
conditions at Region4.

Table 13. Descriptive statistics for RTs and first-pass regressions at Region4

Group       Island      Plausibility  FFD        F-pass RT   REGR        RPD           Total RT
            Cond.       Cond.         M (SD)     M (SD)      M (SD)      M (SD)        M (SD)
NS English  non-island  plausible     238 (57)   356 (153)   .35 (.18)   965 (430)     905 (276)
                        implausible   263 (45)   390 (121)   .23 (.17)   678 (338)     886 (292)
            island      plausible     250 (56)   440 (135)   .37 (.13)   976 (267)     1131 (338)
                        implausible   249 (51)   401 (137)   .36 (.18)   1023 (497)    1064 (363)
Early ESL   non-island  plausible     291 (52)   534 (172)   .41 (.20)   1425 (1044)   1163 (411)
                        implausible   271 (44)   519 (174)   .22 (.17)   743 (245)     900 (305)
            island      plausible     287 (43)   515 (119)   .35 (.24)   1080 (502)    1044 (444)
                        implausible   277 (45)   530 (130)   .32 (.21)   1035 (441)    1041 (305)
Adult ESL   non-island  plausible     310 (49)   580 (122)   .46 (.27)   1761 (1123)   1393 (424)
                        implausible   286 (41)   556 (131)   .27 (.18)   1030 (501)    944 (322)
            island      plausible     259 (36)   536 (177)   .34 (.21)   1253 (624)    1058 (292)
                        implausible   298 (54)   591 (156)   .30 (.20)   1168 (536)    1120 (385)

Some different reading patterns were found between the NS English and
the two ESL groups in the non-island condition. Most notably, for first fixation duration and
first-pass RT, the NS English group spent more time in the [non-island, implausible] than in the
[non-island, plausible] condition, contrary to the pattern they showed at the critical region
(Region3), where reading times were significantly longer in the plausible condition. They, however,
showed the same patterns for the other measures, with longer reading times and more regressions
in the plausible condition. On the other hand, the early and adult ESL groups continued to exhibit
reading patterns similar to those they showed at Region3 for all measures, with longer RTs
and more regressions in the [non-island, plausible] condition. The first-pass regression patterns
of the adult ESL group changed notably across the experimental
conditions, compared to the patterns they showed at the previous region.
Recall that the adult ESL learners had slightly higher regression ratios in the [non-island,
implausible] condition at Region3, compared to its plausible pair (i.e., approx. 35% in the
plausible and 37% in the implausible condition). At Region4, however, they made nearly 20
percent more regressions in the plausible condition (approx. 46% as opposed to 27%), possibly
reflecting relatively delayed reanalysis, which was also reflected in their greatly increased
regression path durations and Total RTs in the non-island condition. In the island condition, the
patterns between the plausible and implausible conditions were somewhat mixed across the
measures. However, the differences between the two plausibility conditions were generally not
large across the measures. The results of the preliminary analyses at Region4 are summarized in
Table 14, followed by Figure 14 (first fixation duration & first-pass RT) and Figure 15
(regression path duration & Total RT), which show the reading patterns of the three groups at this
region.

Table 14. Summary of the results of preliminary analyses at Region4

Reading times
                                by-subject (F1)                 by-item (F2)
Measure          Effect   df      F        p      ηp²     df      F        p      ηp²
first fixation   I        1, 70   1.906    .172   .027    1, 27   .895     .352   .032
duration         P        1, 70   .436     .511   .006    1, 27   .060     .808   .002
                 G        2, 70   9.583    .001*  .215    2, 54   31.025   .001*  .535
                 IxG      2, 70   2.300    .108   .062    2, 54   2.336    .106   .080
                 PxG      2, 70   3.780    .028   .097    2, 54   4.353    .018   .139
                 IxP      1, 70   1.645    .204   .023    1, 27   1.962    .173   .068
                 IxPxG    2, 70   7.879    .001   .184    2, 54   5.161    .009   .160
first-pass RT    I        1, 70   1.365    .247   .019    1, 27   .431     .517   .016
                 P        1, 70   .178     .674   .003    1, 27   .466     .500   .017
                 G        2, 70   20.319   .001*  .367    2, 54   76.771   .001*  .740
                 IxG      2, 70   2.081    .132   .056    2, 54   2.569    .086   .087
                 PxG      2, 70   .164     .849   .005    2, 54   .301     .741   .011
                 IxP      1, 70   .004     .948   .001    1, 27   .237     .631   .009
                 IxPxG    2, 70   6.427    .003   .155    2, 54   3.825    .028   .124
regression path  I        1, 70   1.585    .212   .022    1, 27   .961     .336   .034
duration         P        1, 70   30.472   .001*  .303    1, 27   19.543   .001*  .420
                 G        2, 70   9.401    .001*  .212    2, 54   19.077   .001*  .414
                 IxG      2, 70   1.852    .165   .050    2, 54   6.572    .016   .196
                 PxG      2, 70   .246     .783   .067    2, 54   .662     .445   .022
                 IxP      1, 70   15.978   .001*  .186    1, 27   11.118   .002   .292
                 IxPxG    2, 70   .119     .888   .003    2, 54   .624     .539   .023
Total RT         I        1, 70   1.932    .169   .027    1, 27   3.974    .056   .128
                 P        1, 70   20.475   .001*  .226    1, 27   19.823   .001*  .423
                 G        2, 70   1.762    .179   .048    2, 54   8.607    .001   .242
                 IxG      2, 70   3.853    .026   .099    2, 54   9.111    .001*  .252
                 PxG      2, 70   2.449    .094   .065    2, 54   2.713    .075   .091
                 IxP      1, 70   13.123   .001   .158    1, 27   9.042    .006   .251
                 IxPxG    2, 70   5.528    .006   .136    2, 54   6.659    .003   .195

First-pass regression probability
Measure          Effect   df        F        p
first-pass       I        1, 2032   .652     .420
regression       P        1, 2032   26.308   .001*
                 G        2, 2032   .238     .788
                 IxG      2, 2032   2.093    .124
                 PxG      2, 2032   .504     .604
                 IxP      1, 2032   13.515   .001*
                 IxPxG    2, 2032   .198     .821

Note. In the first-pass regression analysis, both the subject and item factors were entered as random
factors. I = island constraints factor, P = plausibility factor, G = group, * = p < .001.

Figure 14. Reading patterns of the three groups during early stages of processing at Region4

Figure 15. Reading patterns of the three groups during late stages of processing at Region4

First, a significant interaction of island and plausibility was found for regression path
duration, first-pass regression, and Total RT, likely indicating the expected (reversed)
plausibility effect only in the non-island condition for those measures. Group-related
interactions were found for all measures except first-pass regression. First, a significant
group by plausibility interaction was found on first fixation duration (p1 = .028, p2 = .018), and
marginally on Total RT (p1 = .094, p2 = .075) and on first-pass RT in the by-items analysis (p2
= .086). A significant interaction of group and island was found on Total RT (p1 = .026, p2
< .001) and on regression path duration in the by-items analysis (p2 = .016). Lastly, there was a
significant 3-way interaction (group x island x plausibility) on first fixation duration, first-pass
RT, and Total RT. The results of the follow-up analyses for each group on those four measures
are provided below:
First fixation duration & first-pass RT First of all, the early ESL group showed neither
main effects nor an interaction effect in either measure, suggesting that their reading profiles were
similar to one another across the experimental conditions for these two measures: first fixation
duration, Island (I) (p1 = .927, p2 = .592); Plausibility (P) (p1 = .179, p2 = .187); Interaction (I
x P) (p1 = .630, p2 = .525); and first-pass RT, I (p1 = .777, p2 = .959); P (p1 = .823, p2 = .939); I
x P (p1 = .504, p2 = .518). The NS English group showed a main island effect on first-pass RT,
F1 (1, 23) = 4.680, p = .041, ηp² = .169; F2 (1, 27) = 3.999, p = .056, ηp² = .129, in that
their RTs in the island condition were significantly longer than those in the non-island condition.
The adult ESL group also showed a main island effect on first fixation duration, F1 (1, 27) =
6.232, p = .019, ηp² = .188; F2 (1, 27) = 5.112, p = .032, ηp² = .159, in that they spent more time
in the non-island (particularly for plausible reading) than in the island condition. In regard to the
interaction effect, both the NS English and the adult ESL group showed a significant island by
plausibility effect for both measures, except in the first fixation duration of the NS English group
in the by-items analysis: NS English, [F1 (1, 23) = 4.376, p = .048, ηp² = .160; F2 (1, 27) = 1.772,
p = .200, ηp² = .060] for first fixation duration, and [F1 (1, 23) = 6.266, p = .020, ηp² = .214; F2 (1,
27) = 5.870, p = .022, ηp² = .179] for first-pass RT; the adult ESL group, [F1 (1, 27) = 18.369, p
< .001, ηp² = .405; F2 (1, 27) = 10.656, p = .003, ηp² = .283] for first fixation duration, and [F1 (1,
27) = 5.754, p = .024, ηp² = .176; F2 (1, 27) = 3.609, p = .068, ηp² = .118] for first-pass RT.
However, subsequent paired t-tests for each island condition found that the ways the two factors
(i.e., island & plausibility) interacted differed between the two groups. First, the NS English group
continued to show the same pattern in the island condition, with no significant differences
between the two plausibility conditions for either measure: first fixation duration (p1 = .962, p2
= .877); first-pass RT (p1 = .177, p2 = .221). In the non-island condition, the NS English group
displayed significant RT differences between the two plausibility conditions for both measures:
first fixation duration, [t1 (23) = 2.954, p = .007, d = .503; t2 (27) = -2.187, p = .038, d = .608],
and first-pass RT, [t1 (23) = 2.279, p = .032, d = .351; t2 (27) = 1.613, p = .118, d = .405].
However, as noted above, the direction of the effect was opposite to the pattern they showed at
Region3, with significantly longer RTs in the implausible rather than in the plausible condition
for both measures. On the other hand, the adult ESL group showed a significant difference in the
island condition, rather than in the non-island condition, for first fixation duration, [t1 (27) =
3.878, p = .001, d = .854; t2 (27) = -3.941, p = .002, d = .868], and marginally for first-pass RT,
[t1 (27) = -1.861, p = .074, d = .366; t2 (27) = -1.897, p = .069, d = .493], in that their RTs in the
implausible condition were found to be longer than the RTs in the plausible
counterpart. In the non-island condition, the adult learners spent more time reading plausible
sentences for both measures; however, only first fixation duration showed a significant
difference: first fixation duration, [t1 (27) = 2.297, p = .030, d = .530; t2 (27) = 1.736, p = .094,
d = .500]; and first-pass RT (p1 = .318, p2 = .418).
Regression path duration & Total RT The follow-up analysis for each group found that
the cause of the significant interaction of group and island for regression path duration appeared
to be the increased RTs of the two ESL groups in the [non-island, plausible] condition (see Table
13), which increased the overall RTs of the two groups in the non-island condition and thereby
offset their generally slower RTs in the island condition. Consequently, the two learner groups did
not present a significant island effect for regression path duration: the early ESL (p1 = .582, p2
= .878); and the adult ESL (p1 = .597, p2 = .123). On the other hand, a significant island effect
was found for the NS English group, F1 (1, 23) = 5.888, p = .023, ηp² = .204; F2 (1, 27) = 9.735, p
= .004, ηp² = .265, with significantly longer RTs in the island than in the non-island condition.
There was a significant plausibility effect for all groups, most likely due to increased RTs in the
[non-island, plausible] condition; NS English, F1 (1, 23) = 6.583, p = .017, ηp² = .223; F2 (1,
27) = 2.985, p = .095, ηp² = .100; early ESL, F1 (1, 20) = 13.312, p = .002, ηp² = .400; F2 (1, 27) =
6.738, p = .015, ηp² = .200; and the adult ESL, F1 (1, 27) = 11.819, p = .002, ηp² = .304; F2 (1, 27)
= 23.328, p < .001, ηp² = .464. There was also a significant interaction of island and plausibility
across the groups; NS English, F1 (1, 23) = 4.527, p = .044, ηp² = .164; F2 (1, 27) = 1.384, p
= .250, ηp² = .049; early ESL, F1 (1, 20) = 7.164, p = .015, ηp² = .264; F2 (1, 27) = 9.041, p
= .006, ηp² = .251; and the adult ESL, F1 (1, 27) = 4.951, p = .035, ηp² = .155; F2 (1, 27) = 6.077,
p = .020, ηp² = .184. The subsequent paired t-tests confirmed that the source of this significant
interaction was the significant RT difference between the plausible and implausible conditions
that was found only in the non-island condition for all three groups: NS English, [non-island: t1
(23) = 3.043, p = .006, d = .769; t2 (27) = 2.077, p = .047, d = .497; island: t1 (23) = .094, p
= .926, d = .025; t2 (27) = .251, p = .804, d = .073]; early ESL, [non-island: t1 (20) = 4.406, p
< .001, d = .989; t2 (27) = 4.217, p < .001, d = 1.212; island: t1 (20) = .144, p = .887, d = .040; t2
(27) = .179, p = .859, d = .051]; and the adult ESL, [non-island: t1 (27) = 3.575, p = .001, d
= .726; t2 (27) = 4.538, p < .001, d = 1.092; island: t1 (27) = .164, p = .871, d = .038; t2 (27)
= .408, p = .686, d = .094].
Analysis of Total RT also found patterns similar to those in regression path duration, in
that it was only the NS English group that showed a main island effect, F1 (1, 23) = 9.854, p
= .005, ηp² = .300; F2 (1, 27) = 27.514, p < .001, ηp² = .505. On the other hand, a significant
plausibility effect was found only in the two learner groups, arguably due to their longer Total
RTs in the [non-island, plausible] condition; early ESL, F1 (1, 20) = 5.495, p
= .030, ηp² = .216; F2 (1, 27) = 21.454, p < .001, ηp² = .443; and the adult ESL, F1 (1, 27) =
21.153, p < .001, ηp² = .439; F2 (1, 27) = 17.507, p = .020, ηp² = .393. The early and adult ESL
groups also exhibited a significant interaction of island and plausibility, early ESL, F1 (1, 20) =
4.828, p = .040, ηp² = .194; F2 (1, 27) = 5.565, p = .026, ηp² = .171; and the adult ESL, F1 (1, 27)
= 16.453, p < .001, ηp² = .379; F2 (1, 27) = 34.280, p < .001, ηp² = .559, whereas the NS
English group did not (p1 = .548, p2 = .647). The planned paired t-tests confirmed that the significant
interactions found in the ESL groups were due to significant RT differences only in the non-island
condition (i.e., RT plausible > RT implausible); early ESL, [non-island: t1 (20) = 2.727, p = .013,
d = .684; t2 (27) = 4.453, p < .001, d = .903; island: t1 (20) = .590, p = .562, d = .104; t2 (27)
= .253, p = .802, d = .064]; and the adult ESL, [non-island: t1 (27) = 5.507, p < .001, d = 1.190; t2
(27) = 7.407, p < .001, d = 1.626; island: t1 (27) = .536, p = .597, d = .094; t2 (27) = 1.307, p
= .309, d = .231].

4.3.2.3. Interim summary of the results: Ultimate gap
Table 15. Summary of the findings at the ultimate gap
Significant island x plausibility interaction?
(Critical) Region 3
YES
-
-
-
-
NS English
-

Early ESL

YES
-
-
-
-

FFD
first-pass RT
REGR
RPD
Total RT

FFD
first-pass RT
RPD
Total RT

(Spillover) Region 4

Major implications

• The NS English and early ESL groups showed evidence for filler-gap reanalysis from earlier
  stages of processing when reading in the non-island condition, demonstrating sensitivity to
  the structural cues that signal the need for a reanalysis.

• The adult ESL group showed evidence for filler-gap reanalysis only during late stages of
  processing, as measured by RPD and Total RT. Their reading patterns during early stages of
  processing did not present any plausibility effects, indicating delayed gap identification
  compared to the early ESL and NS English groups.

• No group displayed plausibility effects in the island condition, indicating no effect of the
  plausibility manipulations. This corroborates the results found at the previous regions that
  the participants did not postulate a gap in the island environment.

YES
- REGR
- RPD

NO
- FFD
- First-pass RT
- Total RT

YES
-
-
-
-

FFD
REGR
RPD
Total RT

NO
- REGR

NO
- First-pass RT

YES
- first-pass RT
- RPD
- Total RT

YES
-
-
-
-

NO
- FFD
- REGR

NO
- First-pass RT

FFD
REGR
RPD
Total RT

Adult ESL

4.4. The effect of individual differences in working memory capacity
In order to examine how individual differences in WMC influence the ways the early and
adult ESL learners deal with a dislocated filler during online reading of filler-gap dependencies
in their L2 English, a series of repeated measures ANCOVAs for the RT measures and a
logistic random effects regression analysis for first-pass regressions were carried out separately
for each group (see footnote 20).

4.4.1. The effect of WMC at the earliest gap Region1 and spillover Region2
The parameter estimates (β̂ coefficients) for the WM span scores
showed a general trend for all three groups: participants with higher WMC, compared to those with
lower WMC in the same group, generally tended to read slightly faster and make
fewer regressions (during first pass) at both Region1 and Region2, especially in the non-island
condition, but to a lesser degree, or occasionally in the opposite direction, in the island condition.
This appeared to be relatively more so for the adult ESL learners. For example, the regression
path duration data of the adult ESL group at Region1 showed that when the WM span score
increased by one unit, the reading times (regression path durations) were reduced by
approximately 18 percent (1 − 10^β, β = −.085; see footnote 21) in the [non-island, plausible]
condition, and by about 15 percent (1 − 10^−.072) in the [non-island, implausible] condition. On
the other hand, the native English speakers' regression path durations were reduced by about 6.7
percent and 3.3 percent, respectively, with a one-unit increase in the WM span in the same
conditions. Considering that the RTs of the ESL learners were much slower than those of the
native English speakers (see Table 6), the changes in RTs as a function of the WM score increase
would seem to be relatively larger for the learners. With this in mind, the results of the repeated
measures ANCOVAs and logistic regression analyses at the critical (Region1) and spillover
(Region2) regions for each group are provided in Table 16. In the following, the results of each
group are reported.

20 Recall that the WM data from one early ESL learner and one adult ESL learner were
removed from the analyses due to their inconsistent performance across the two WM span tests.
Thus, the sample sizes of the ESL groups were adjusted to N = 20 for the early ESL group and
N = 27 for the adult ESL group in the analyses.
21 As noted earlier, the changes in the reading time outcome variables are presented in percent
rather than as changes in actual reading times because the raw data were log transformed.
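
The conversion from a coefficient on log-transformed reading times to a percent change can be sketched as follows. The sketch assumes a base-10 log transformation, as implied by the expression 1 − 10^β used in the text; the function name percent_reduction is hypothetical. The two coefficients are those reported for the adult ESL group's regression path durations at Region1.

```python
def percent_reduction(beta_log10):
    """Percent reduction in reading time per one-unit increase in WM span.

    Assumes the outcome was log10-transformed, so a coefficient beta
    multiplies the raw reading time by 10 ** beta.
    """
    return (1 - 10 ** beta_log10) * 100


# Adult ESL group, regression path duration at Region1 (coefficients from the text).
plausible = percent_reduction(-0.085)    # [non-island, plausible]
implausible = percent_reduction(-0.072)  # [non-island, implausible]

print(round(plausible, 1), round(implausible, 1))
```

The two values round to 17.8 and 15.3 percent, in line with the approximately 18 and 15 percent reported in the text.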
NS English At Region1 and Region2, the NS English group showed neither a reliable
main effect of WM nor a significant interaction associated with WM across
all measures, indicating that differences in WMC among the native English speakers did not have
much effect on their reading behaviors at these regions across the different
experimental conditions. Compared to the results obtained from the mixed ANOVA analyses
reported in the previous sections, the results for the non-WM-related factors from the
ANCOVA analyses did not show much change. Crucially, the significant interaction of island
constraints and plausibility found at Region1 in the ANOVA analyses appeared to remain intact
for all dependent measures.

Table 16. Summary of the WM effect analyses at Region1 and Region2

                              NS English                        Early ESL                         Adult ESL
                              Region1          Region2          Region1          Region2          Region1          Region2
Measure          Effect       F        p       F        p       F        p       F        p       F        p       F        p
first fixation   I            6.931    .015    4.065    .056    .600     .449    .305     .588    .868     .360    .269     .608
duration         P            7.085    .014    5.678    .026    8.491    .009    .121     .732    8.505    .007    2.116    .158
                 IxP          10.504   .004    2.106    .161    11.912   .003    14.027   .001    .004     .951    3.942    .044
                 WM           .151     .702    .106     .748    3.643    .072    .700     .414    4.333    .048    .455     .506
                 I x WM       2.732    .113    1.098    .306    2.751    .114    .762     .394    .083     .775    .170     .684
                 P x WM       .126     .726    .021     .887    .007     .935    .017     .899    .190     .667    .000     .992
                 I x P x WM   .692     .414    1.686    .208    2.836    .109    1.022    .325    1.933    .177    2.503    .126
first-pass RT    I            5.916    .024    13.308   .001    4.236    .054    .068     .797    10.832   .003    .026     .874
                 P            10.854   .003    2.468    .130    6.073    .024    .159     .694    3.499    .073    4.725    .039
                 IxP          12.986   .002    .355     .557    5.573    .030    2.520    .130    .499     .486    .893     .354
                 WM           1.249    .276    .516     .480    1.381    .255    1.170    .292    2.053    .164    .221     .642
                 I x WM       1.902    .182    .746     .397    .575     .458    .663     .426    1.098    .305    .540     .469
                 P x WM       .215     .647    1.091    .310    4.484    .048    .770     .392    .299     .590    3.216    .085
                 I x P x WM   .041     .841    1.637    .214    2.118    .163    .001     .980    1.858    .185    .315     .580
regression path  I            .656     .427    .415     .526    4.128    .057    .025     .877    5.773    .024    .549     .021
duration         P            7.085    .014    .374     .547    5.492    .310    12.144   .003    12.922   .001    38.794   .001*
                 IxP          24.793   .001*   4.400    .048    16.327   .001    9.183    .007    20.841   .001*   18.502   .001*
                 WM           .670     .422    .076     .786    1.920    .183    1.444    .245    1.915    .179    .045     .834
                 I x WM       .017     .896    .293     .593    .062     .806    .798     .384    .184     .672    3.607    .069
                 P x WM       .125     .727    1.095    .307    .735     .403    .353     .560    1.108    .303    2.090    .161
                 I x P x WM   .136     .716    .141     .711    1.113    .305    .507     .485    .443     .512    .328     .572


Table 16 (cont'd)

                      NS English                    Early ESL                     Adult ESL
                Region1        Region2        Region1        Region2        Region1        Region2
Effect             F  p           F  p           F  p           F  p           F  p           F  p

Total RT
I             15.984  .001   25.966  .001*   1.084  .312     .014  .909     .480  .495     .005  .947
P              2.706  .114    1.701  .206    5.906  .026     .021  .885   24.767  .001*    .177  .678
I x P          4.758  .044     .594  .449   11.474  .003    1.022  .325   20.388  .001*    .481  .494
WM             1.854  .187     .600  .447     .587  .454    1.300  .269     .426  .520     .046  .831
I x WM         3.225  .085    1.078  .310     .529  .476    1.279  .273    5.159  .032    3.536  .072
P x WM          .694  .414     .370  .549     .906  .354     .052  .822    1.661  .209     .113  .740
I x P x WM     1.430  .244     .005  .947     .340  .567     .063  .805     .303  .587     .201  .658

First-pass regression
I              5.579  .018    1.632  .202    9.152  .003     .715  .398   21.346  .001*    .974  .324
P              6.601  .010     .241  .623   12.060  .001    8.182  .004    8.941  .003   48.943  .001*
I x P         16.785  .001*   2.606  .107   14.735  .001*   2.559  .110    4.972  .026   25.962  .001*
WM              .004  .951     .345  .557     .479  .489     .462  .497     .381  .537     .236  .628
I x WM          .554  .457     .222  .638     .459  .498     .168  .682     .728  .394    5.814  .016
P x WM          .207  .650     .058  .809    2.295  .130     .058  .810     .178  .178     .061  .805
I x P x WM      .220  .639    1.532  .216    9.749  .002    1.216  .271     .154  .154     .613  .434

Note. I = island constraints factor, P = plausibility factor, WM = WM covariate; .001* = p < .001.


Early ESL

The early ESL learners exhibited significant WM-related interactions on two measures at Region1:
a significant 2-way interaction between plausibility and WM for first-pass RT (p = .048), and a
significant 3-way (WM x island x plausibility) interaction for first-pass regression, F (1, 18) =
9.749, p = .002, ηp² = .130. As noted in the previous chapter (see section 3.4.2.), the early ESL
learners were divided into two subgroups based on their WM span scores, higher WM (n = 10) and
lower WM (n = 10), to obtain the descriptive statistics of the two subgroups for those two
measures. See Table 17.

Table 17. First-pass RT and first-pass regressions by higher- and lower-WM early ESL

                                     First-pass RT                         REGR
                               β̂     H-WM        L-WM            OR     H-WM        L-WM
Condition                             M (SD)      M (SD)                 M (SD)      M (SD)
[non-island, plausible]        .88    432 (158)   468 (245)       .93    .13 (.16)   .17 (.13)
[non-island, implausible]      .86    540 (266)   584 (248)       .75    .24 (.15)   .36 (.15)
[island, plausible]           1.08    441 (188)   399 (130)       .72    .29 (.17)   .40 (.18)
[island, implausible]          .84    375 (145)   458 (142)      1.29    .37 (.14)   .27 (.17)


In regard to first-pass RT, the repeated measures ANCOVA showed a main effect of plausibility
(p = .024), a marginal effect of island (p = .054), and a significant interaction of island and
plausibility (p = .030), reflecting a clear plausibility effect only in the non-island condition.
With respect to the significant interaction of plausibility and WM, F (1, 18) = 4.484, p = .048,
ηp² = .199, the beta coefficients for the WM span scores hinted that the cause of the interaction
might be the different reading patterns of the two subgroups in the island condition. In the
[island, plausible] condition, the trend was an increase of about 8% in first-pass RT with a
one-unit increase in WM span (i.e., a positive direction), but it was the opposite in the
implausible counterpart; the change was largest in the [island, implausible] condition, with a
negative relationship of 16%. This tendency could be observed in the descriptive statistics as
well (see Table 17). Whereas the two groups displayed similar reading patterns in the non-island
condition (i.e., RT implausible > RT plausible), the RT pattern of the higher WM group appeared
to be reversed in the island condition (i.e., RT plausible > RT implausible), thus offsetting the
plausibility effect found in the non-island condition (i.e., little or no plausibility effect
overall). In contrast, the lower WM group showed a similar pattern across the island conditions
(i.e., RT plausible < RT implausible), thereby likely producing a greater plausibility effect
when the two island conditions are collapsed. Consequently, the different degrees of the
plausibility effect between the two WM subgroups appeared to lead to the significant interaction
of plausibility and WM.
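
The reasoning above can be made concrete with a small worked sketch. It only re-uses the mean first-pass RTs reported in Table 17; the function names are illustrative and not part of the original analysis, and the simple difference scores are a descriptive aid rather than the ANCOVA itself.

```python
# Mean first-pass RTs (ms) at Region1 for the early ESL subgroups, copied from Table 17.
MEAN_RT = {
    "higher WM": {
        ("non-island", "plausible"): 432, ("non-island", "implausible"): 540,
        ("island", "plausible"): 441, ("island", "implausible"): 375,
    },
    "lower WM": {
        ("non-island", "plausible"): 468, ("non-island", "implausible"): 584,
        ("island", "plausible"): 399, ("island", "implausible"): 458,
    },
}


def plausibility_effect(group: str, island: str) -> int:
    """Implausible minus plausible mean RT (ms) within one island condition."""
    cells = MEAN_RT[group]
    return cells[(island, "implausible")] - cells[(island, "plausible")]


def collapsed_effect(group: str) -> float:
    """Plausibility effect averaged over the two island conditions."""
    return (plausibility_effect(group, "non-island")
            + plausibility_effect(group, "island")) / 2


for group in MEAN_RT:
    print(group,
          plausibility_effect(group, "non-island"),
          plausibility_effect(group, "island"),
          collapsed_effect(group))
```

On these means, the higher WM subgroup's reversed island pattern largely cancels its non-island effect, whereas the lower WM subgroup shows a positive difference in both island conditions, which is the asymmetry the plausibility by WM interaction is taken to reflect.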
The analysis of the first-pass regression data at Region1 also displayed a clear plausibility
effect that occurred only in the non-island condition, as revealed by a significant island by
plausibility interaction (p < .001) and a main effect of plausibility (p = .001) resulting from
the increased regressions in the [non-island, implausible] condition. The 3-way interaction
(p = .002) found in the analysis is interesting here, as it showed patterns entirely opposite to
those the two subgroups displayed in their first-pass RT discussed above. The source of the
interaction was arguably the peak of the higher WM group in the [island, implausible] condition,
in that the odds of making a first-pass regression increased by about 29 percent
(OR = e^0.2580 ≈ 1.29) with a one-unit increase in the WM span scores (i.e., a positive
relationship). The descriptive statistics showed this trend, in that the mean regression ratio of
the higher WM group was about 10 percentage points higher (M = .37, SD = .14) than that of the
lower WM group (M = .27, SD = .17) in the [island, implausible] condition. This pattern was
reversed in the plausible counterpart, in that it was the lower WM group that showed a much
higher mean regression ratio (M = .40, SD = .18) than the higher WM group (M = .29, SD = .17). In
the non-island condition, both groups had more regressions in the implausible condition (see
Table 17 above). As a result, the similar regression patterns across the island conditions by the
higher WM learners (i.e., REGR plausible < REGR implausible), together with the different
regression patterns between the island conditions (i.e., REGR plausible < REGR implausible in the
non-island condition, but REGR plausible > REGR implausible in the island condition) by the lower
WM learners, appeared to be the likely source of the 3-way interaction.
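
As a brief illustration of how the odds ratios are read, the following sketch converts a logistic regression coefficient into an OR and an OR into a percent change in the odds per one-unit increase in WM span. The coefficient 0.2580 and the ORs come from the text and Table 17; the helper names are hypothetical, and the percentages describe odds, not probabilities.

```python
import math


def odds_ratio(log_odds_coefficient: float) -> float:
    """Exponentiate a logistic regression coefficient to obtain an odds ratio."""
    return math.exp(log_odds_coefficient)


def percent_change_in_odds(or_value: float) -> float:
    """Percent change in the odds associated with a one-unit increase in the predictor."""
    return (or_value - 1.0) * 100.0


# Coefficient reported for the [island, implausible] condition.
print(round(odds_ratio(0.2580), 2))

# ORs for the four conditions as listed in Table 17.
for label, or_value in [("non-island, plausible", 0.93),
                        ("non-island, implausible", 0.75),
                        ("island, plausible", 0.72),
                        ("island, implausible", 1.29)]:
    print(label, round(percent_change_in_odds(or_value)))
```

An OR below 1 therefore corresponds to lower odds of a first-pass regression as WM span increases, and an OR above 1 to higher odds.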
Adult ESL

The analysis of the adult ESL learner data showed significant WM-related effects for two measures
at Region1, first fixation duration and Total RT, and for one measure, first-pass regression, at
Region2. First, for first fixation duration at Region1, there was a main effect of WM,
F (1, 25) = 4.333, p = .048, ηp² = .148. However, given that there was no WM-related interaction
in the analysis, and considering the narrow range of the odds ratios across the experimental
conditions (from .930 to .975), the main effect of WM seemed to reflect the relatively shorter
fixation durations of the higher WM adult learners across the experimental conditions. The
results for the other factors remained almost unchanged from the ANOVA analysis performed
earlier, with no interaction of island and plausibility (p = .951).
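
The partial eta squared values reported alongside the F tests can be recovered from the F statistic and its degrees of freedom. The sketch below uses the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error); the function name is illustrative, and the inputs are the F values quoted in this section.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Adult ESL, first fixation duration at Region1: F (1, 25) = 4.333
print(round(partial_eta_squared(4.333, 1, 25), 3))

# Early ESL, first-pass RT at Region1 (plausibility x WM): F (1, 18) = 4.484
print(round(partial_eta_squared(4.484, 1, 18), 3))
```

Both values agree with the effect sizes reported in the text (.148 and .199), so the identity can serve as a quick consistency check on the ANCOVA results.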
At the same region (Region1), the analysis of Total RT showed a significant island by WM
interaction, F (1, 25) = 5.159, p = .032, ηp² = .171. The parameter estimates for the WM span
scores were examined first. They showed that the relationships between the WM span scores and
Total RT were negative in all experimental conditions except the [island, implausible] condition,
which had a positive relationship: non-island: β̂ plausible = .872; β̂ implausible = .903; island:
β̂ plausible = .976; β̂ implausible = 1.061. This suggests that the RTs of the lower WM adult
learners, at least in the non-island conditions, would likely be longer than those of the higher
WM adult learners. As before, the adult ESL group was divided into two WM subgroups, higher WM
(n = 14) and lower WM (n = 13), to supplement the interpretation of the interaction. The
descriptive statistics showed that whereas both groups exhibited a clear plausibility effect in
the non-island condition [higher WM: M plausible = 836ms; M implausible = 1435ms; lower WM:
M plausible = 1243ms; M implausible = 1844ms], the RTs of the lower WM group in the [non-island,
plausible] condition were particularly high, thus likely resulting in their overall reading time
in the non-island condition being longer than their overall reading time in the island condition
when the two plausibility conditions were lumped together [M non-island = 2987ms; M island =
2743ms]. On the other hand, the higher WM group appeared to have spent relatively more time in
the island condition [M non-island = 2271ms; M island = 2767ms], presumably to deal with the
structurally more complex part of the sentences at this point, while they were generally faster
and more efficient in performing filler-gap processing in the non-island condition, compared to
the lower WM group. In sum, the significant interaction of island and WM in the adult ESL group
could be attributed to the lower WM adult learners' heavier processing difficulties in dealing
with the implausible interpretation in the [non-island, implausible] condition, which appeared to
extend into the later stages of processing as measured by Total RT.
Lastly, the adult ESL group also showed a significant interaction of island and WM for first-pass
regression at Region2, F (1, 748) = 5.814, p = .016. The ORs for the WM span scores across the
experimental conditions showed that the relationship between the WM span scores and the outcome
variable (i.e., REGR) ran in opposite directions in the non-island and island conditions.
Specifically, there was a negative relationship in the non-island condition (OR plausible = .87,
and OR implausible = .79), signaling decreases in the odds of making a first-pass regression
(about 13% and 21%, respectively) as a function of a one-unit increase in the WM span scores. In
contrast, the odds of making a first-pass regression increased by about 29 percent
(OR plausible = 1.29) and 37 percent (OR implausible = 1.37) with a one-unit increase in the WM
span scores in the island condition. This trend was also reflected in the descriptive statistics.
First, in the island condition, whereas neither group showed much difference between the two
plausibility conditions, the overall first-pass regression ratios of the higher WM group
(M plausible = .26; M implausible = .27) were higher than those of the lower WM group
(M plausible = .18; M implausible = .17), conforming to the positive ORs above. At least partly
because of such higher regressions by the higher WM group in the island condition, their overall
mean regression ratios in the non-island condition (M plausible = .12, and M implausible = .33)
were lower than their own regression ratios in the island condition, despite a peak in
regressions in the [non-island, implausible] condition (i.e., REGR non-island < REGR island). In
contrast, the lower WM group appeared to have higher regression ratios in the non-island
condition (M plausible = .14, and M implausible = .47) than in the island condition. As was the
case for their Total RT above, this could be interpreted as suggesting that the lower WM adult
ESL learners had more processing difficulties in dealing with semantic anomalies in the
non-island condition.
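
The p values reported for the first-pass regression analyses can be approximated from the test statistics. The sketch below rests on an assumption that is not stated in the text: that the statistic reported as F (1, 748) behaves like a Wald chi-square with one degree of freedom, which is a close approximation when the denominator degrees of freedom are this large. The function name is illustrative.

```python
import math


def chi_square_p_value_df1(statistic: float) -> float:
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom.

    For 1 df, P(X > x) equals the two-tailed normal probability of |Z| > sqrt(x),
    which is erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


# Adult ESL, island x WM interaction for first-pass regression at Region2.
print(round(chi_square_p_value_df1(5.814), 3))
```

Under that assumption the computed value rounds to the reported p = .016, so the statistic and its p value are mutually consistent.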
For the higher WM adult ESL learners, it is interesting that they made relatively more
regressions than the lower WM adult learners in the island conditions at this spillover region.
In fact, however, the regression patterns that the higher WM adult learners displayed between the
two island conditions (i.e., REGR non-island < REGR island) are more similar to the reading
behaviors that the NS English group showed in the island condition at this region, as measured by
their first-pass regression, regression path duration, and Total RT (see Table 8). One possible
explanation is that more regressions in the island condition by the higher WM adult learners may
reflect more active filler-gap processing at an earlier point of reading than in the lower WM
adult learners, in an attempt to construct a dependency between the second filler (i.e.,
journalist) and the verb wrote inside the island as early as possible. That is, when the parser
encounters the relative pronoun who in the island condition, which signals the opening of another
relative clause, the parser must identify another filler to carry (i.e., the journalist_j
co-indexed with who_j) in addition to the filler that WM already holds (i.e., the book_i/the
city_i). In the subsequent processing at wrote, the parser would need to attempt to link the
journalist_j (not the book_i or the city_i) with its subcategorizing verb wrote, forming a
filler-gap dependency that is licit (i.e., the journalist as the subject of the embedded relative
clause). This is a highly complex syntactic computation, especially given that there is still
another filler that has not been resolved at this point (i.e., the book_i/the city_i). This is
presumably why the native English speakers spent more time and made more regressions reading
sentences in the island condition even at this spillover region.
At Region2, the higher WM adult learners indeed appeared to spend more time in the island
condition than the lower WM adult learners. Although the interaction of WM and island constraints
was only marginally significant, the reading patterns in regression path duration (p = .069:
β̂ plausible = 1.07; β̂ implausible = 1.04) and Total RT (p = .072: β̂ plausible = 1.08;
β̂ implausible = 1.04) were the same as those in the first-pass regression reported above. What is
interesting here is that the direction of the relationship between WM span scores and reading
times in the non-island condition was negative for those measures, meaning that the higher WM
adult learners' reading times in the non-island condition were faster than those of the lower WM
adult learners, similarly to the first-pass regression result discussed above: regression path
duration (β̂ plausible = .92; β̂ implausible = .92); Total RT (β̂ plausible = 1.07; β̂ implausible =
1.04). Considering these reading patterns in the two island conditions from multiple measures
together, it might be reasonable to assume that the higher WM adult learners initiated complex
structure building at a relatively earlier point during reading than the lower WM adult learners.
It should also be noted that the additional regressions by the higher WM adult learners do not
appear to be the consequence of illicit filler-gap formations (i.e., the first filler the book_i
or the city_i as the object of wrote), given that they did not display any mismatched
plausibility effect. This was the case for the lower WM adult learners as well.
4.4.2. The effect of WMC at the ultimate gap at Region3 and spillover Region4
As discussed earlier, Region3 contains the canonical position of the filler, and is the place
where a plausibility effect is expected only in the non-island condition, but in the reverse
direction to the plausibility effect found at Region1. The reason behind this expectation is that
an initial filler-gap analysis that results in a plausible interpretation is more difficult for
the parser to withdraw for reanalysis, potentially resulting in longer RTs and more regressions
(i.e., RT plausible > RT implausible). With this in mind, the results of the repeated measures
ANCOVAs and logistic regression analyses at Region3 and the following Region4 for each group are
provided in Table 18.


Table 18. Summary of the WM effect analyses at Region3 and Region4

                      NS English                    Early ESL                     Adult ESL
                Region3        Region4        Region3        Region4        Region3        Region4
Effect             F  p           F  p           F  p           F  p           F  p           F  p

First fixation duration
I              3.972  .059     .003  .959    1.616  .220     .028  .869    9.481  .005    7.048  .014
P              9.521  .005    5.513  .028    5.612  .029    1.456  .243    6.143  .020     .959  .337
I x P         28.663  .001*   3.609  .071    9.164  .007     .317  .580     .093  .763   14.042  .001
WM              .827  .373     .033  .859    1.155  .297     .274  .607     .042  .839    1.142  .295
I x WM          .056  .815     .265  .612    1.138  .300    1.990  .175    1.865  .184    1.183  .287
P x WM         1.180  .289     .010  .921     .339  .568     .260  .616     .875  .359     .014  .907
I x P x WM     2.937  .101     .385  .542     .862  .366     .287  .599    6.392  .018    3.422  .076

First-pass RT
I             21.070  .001*   4.046  .057   44.026  .001*    .001  .992   38.330  .000     .090  .767
P              1.085  .309     .012  .914     .362  .555     .013  .910     .323  .575     .693  .413
I x P         10.670  .004    6.416  .019    3.483  .078     .330  .573     .252  .620    4.336  .048
WM              .899  .353     .052  .822    1.382  .255     .251  .623    1.374  .252     .673  .420
I x WM          .010  .921     .768  .390    3.254  .088     .333  .571     .181  .674     .032  .860
P x WM         1.200  .285     .686  .417     .733  .403    1.288  .271    3.532  .072    1.247  .275
I x P x WM     1.592  .220     .394  .536     .393  .539     .148  .705    1.324  .261     .074  .788

Regression path duration
I             50.905  .001*   6.384  .019   12.848  .002     .099  .757   57.671  .000     .704  .410
P              1.171  .204    2.107  .084    7.415  .014   12.847  .002    1.181  .288   13.894  .001
I x P         21.814  .001*   3.925  .060   16.985  .001    7.909  .012    4.321  .048    7.114  .013
WM              .410  .529     .210  .681    2.011  .173     .287  .599    1.430  .243    1.384  .250
I x WM          .971  .335     .884  .357    2.242  .152     .829  .374     .106  .747    1.946  .175
P x WM         1.085  .309    2.034  .097     .050  .826     .944  .344     .267  .610     .561  .461
I x P x WM      .662  .425    1.814  .144     .618  .442     .998  .331     .872  .197     .775  .387


Table 18 (cont'd)

                      NS English                    Early ESL                     Adult ESL
                Region3        Region4        Region3        Region4        Region3        Region4
Effect             F  p           F  p           F  p           F  p           F  p           F  p

Total RT
I            139.274  .001*  11.477  .003   11.561  .003     .045  .834    1.865  .184    1.051  .315
P              8.988  .007    1.171  .291   11.435  .003    4.767  .042   14.626  .001   24.418  .001*
I x P          5.669  .026     .515  .480   14.652  .001    4.618  .046    4.492  .044   16.698  .001*
WM              .659  .426     .817  .376    3.382  .082     .575  .458     .103  .751    6.308  .019
I x WM         1.285  .269    2.150  .157     .090  .767     .683  .420     .207  .653    1.693  .205
P x WM         5.810  .025     .064  .803     .585  .454     .010  .921     .012  .913    1.489  .234
I x P x WM     1.755  .199     .626  .437    1.431  .247     .117  .736     .069  .795    1.939  .176

First-pass regression
I             23.089  .001*   1.047  .306    4.883  .027     .064  .800     .238  .626   12.652  .001*
P              4.326  .042    5.298  .022   12.014  .001    8.767  .003     .612  .434   18.118  .001*
I x P         13.019  .001*   7.445  .007    3.502  .032    5.737  .017     .035  .852    3.752  .053
WM              .125  .724    1.078  .300    2.105  .147     .781  .377     .633  .426    4.130  .042
I x WM         1.071  .301     .011  .915     .001  .995    1.623  .203     .127  .722     .544  .461
P x WM          .641  .510    1.968  .161    2.339  .127     .744  .389     .007  .935    3.371  .078
I x P x WM      .649  .421    1.698  .193     .368  .544     .421  .516    8.171  .004     .244  .621

Note. I = island constraints factor, P = plausibility factor, WM = WM covariate; .001* = p < .001.


NS English

The results of the NS English group showed a WM-related interaction on only one measure at the
critical region (Region3), and there was no WM-related effect in the spillover region (Region4).
At Region3, there was a significant interaction between plausibility and WM for Total RT,
F (1, 22) = 5.820, p = .025. To better interpret this interaction, the parameter estimates
(β̂ coefficients) for the WM span scores were examined first. Higher WM native speakers tended to
read faster than their lower WM counterparts in both the [non-island, plausible: β̂ = .83] and
[non-island, implausible: β̂ = .92] conditions, but the estimated decrease was larger in the
plausible than in the implausible condition (about 17% vs. 8%). On the other hand, there was a
positive relationship in both the [island, plausible: β̂ = 1.01] and [island, implausible:
β̂ = 1.01] conditions. The descriptive statistics confirmed this trend in the non-island
condition, in that the higher WM participants' RTs (M plausible = 830ms, M implausible = 763ms)
were about 300ms faster than those of the lower WM participants (M plausible = 1129ms,
M implausible = 793ms) in the [non-island, plausible] condition, with only about a 30ms
difference in the implausible counterpart. On the other hand, although the parameter estimates
for WM span in the island condition were positive, albeit very marginally (1%) (i.e., increases
in reading time with increases in WM span), the mean RTs of the two subgroups showed that the
higher WM group was slightly faster in the plausible condition [higher WM: M plausible = 1661ms,
lower WM: M plausible = 1812ms], displaying some discrepancy with the parameter estimates. In the
[island, implausible] condition, the mean Total RTs of the two groups showed a marginal
difference [higher WM: M implausible = 1748ms, lower WM: M implausible = 1713ms]. Thus, it
appeared that the primary source of the interaction was the longer plausible reading times of the
lower WM group in both island conditions, particularly in the [non-island, plausible] condition.
This might be taken to suggest that while both the higher and lower WM native speakers
experienced more processing difficulties in revising their initially computed plausible
dependencies, the recovery from the misanalysis took longer, extending into later stages of
processing, for those with lower WMC.
Early ESL

The results of the early ESL group at Region3 and Region4 did not show any significant main
effect of WM or WM-related interactions, implying that differences in WMC among the early ESL
learners did not have much influence on the way they processed the target sentences at these
regions. Compared with the results from the ANOVA analyses reported in the previous sections, the
results for the other, non-WM-related factors did not change much for either region, showing a
significant interaction of island and plausibility at Region3 for all dependent measures except
first-pass RT, for which the interaction was marginal (p = .078).
Adult ESL

Whereas the NS English and early ESL groups did not present much effect of WM or associated
interactions at those two regions, the analyses of the adult ESL learners' data elicited some
more WM-related effects. First, the analysis of the first fixation data showed a significant
3-way interaction (island x plausibility x WM) at Region3, F (1, 25) = 6.392, p = .018,
ηp² = .204. The parameter estimates for the WM span scores revealed that the direction of the
relationship between first fixation duration and the WM span scores was mixed. Specifically, in
the [non-island, plausible] condition, the parameter estimate for the WM span scores yielded a
positive relationship (β̂ = 1.09), whereas it showed a negative relationship in the [non-island,
implausible] condition (β̂ = .94). The directions of these two were reversed in the island
condition, although the sizes of the relationships were rather small (β̂ plausible = .97,
β̂ implausible = 1.01). The descriptive statistics of the two subgroups were in line with this
trend. In the [non-island, plausible] condition, the mean first fixation duration of the higher
WM group (M = 347ms) was longer than that of the lower WM group (M = 280ms), but the higher WM
group was slightly faster in the implausible counterpart (higher WM: M = 279ms, lower WM:
M = 297ms). In contrast, in the [island, plausible] condition, the mean fixation duration of the
higher WM group (M = 323ms) was shorter than that of the lower WM group (M = 369ms). The mean
fixation durations of the two subgroups in the [island, implausible] condition were very close to
one another, M = 326ms and M = 324ms for the higher and lower WM groups, respectively. To
summarize, although the adult ESL group as a whole did not show any significant interaction of
plausibility and island (F < 1, p = .763), the higher WM adult learners' reading patterns in the
non-island condition (347ms for plausible against 279ms for implausible reading) conformed to the
patterns found in the NS English and early ESL groups, suggesting that the adult ESL learners
with higher WMC detected the need for a reanalysis more immediately, as soon as they encountered
Region3, than their counterparts with lower WMC.
Another 3-way interaction found at the same region (Region3) was on first-pass regression,
F (1, 748) = 8.171, p = .004. The ORs for the WM span scores across the experimental conditions
showed that the relationships between the WM span scores and the outcome variable were all
negative, indicating that the odds of making a first-pass regression decrease by the proportion
1 − OR with a one-unit increase in the WM span scores: non-island: [OR plausible = .939,
OR implausible = .695], and island: [OR plausible = .670, OR implausible = .880]. Thus, the
decrease in the odds of making a regression appeared to be larger in the [non-island,
implausible] condition (about 30.5% less) and in the [island, plausible] condition (about 33%
less). The two subgroups indeed displayed the largest differences in their mean first-pass
regression ratios in those two conditions: [non-island, implausible]: M = .29 and M = .46 for the
higher and lower WM groups, and [island, plausible]: M = .36 and M = .54 for the higher and lower
WM groups, in that order. Such relatively larger differences in those two experimental conditions
appeared to be the source of the differences in the reading patterns between the two subgroups.
For the higher WM group, the mean regression ratio in the [non-island, plausible] condition was
slightly higher than in its implausible counterpart (M plausible = .32, M implausible = .29),
whereas it was the opposite for the lower WM group (M plausible = .39, M implausible = .46).
These patterns were reversed in the island condition; the higher WM group regressed slightly more
in the implausible condition (M plausible = .36, M implausible = .40), while the lower WM group
made about 7 percentage points more regressions in the plausible condition (M plausible = .54,
M implausible = .47). As a result, it seemed that at least the adult ESL learners with higher WMC
showed reading behaviors that were similar to those of the native English speakers as well as the
early ESL learners. Lastly, the adult group showed a main effect of WM on two measures, Total RT,
F (1, 25) = 6.308, p = .019, and first-pass regression, F (1, 748) = 4.130, p = .042, at the
spillover region (Region4). However, given that there were no WM-related interactions associated
with these main effects, and that the results for the other, non-WM-related factors (island and
plausibility) remained intact, the effects appeared not to provide any particularly useful
information.


4.4.3. Summary of the results: the effect of WM
Table 19. Summary of the findings: The WM effect

               Any WM-related Effect?
Group          Initial gap                           Ultimate gap
NS English     No                                    YES (Region4): Total RT
Early ESL      YES (Region1): First-pass RT, REGR    No
Adult ESL      YES (Region1): FFD, Total RT          YES (Region3): FFD, REGR
               YES (Region2): REGR                   YES (Region4): Total RT

Major Implications

• The Total RT of the NS English group showed a significant plausibility by WM interaction at
  Region4, which suggested that NSs with lower WM had more spillover effects in the non-island
  condition than those with higher WM.

• The results of the early ESL learners displayed somewhat contradictory patterns. In the
  analysis of first-pass RT at Region1, it was the lower WM learners that showed evidence for
  illicit gap postulations in the island environment. However, the results on REGR revealed that
  this non-native-like pattern was shown by the higher WM learners, thus making it difficult to
  interpret the direction of the WM effect for this group.

• The adult ESL learners with higher WMC presented filler-gap reanalysis effects from early
  stages of processing at Region3, demonstrating more native-like reading patterns.


CHAPTER 5: DISCUSSION
One of the main concerns in the current L2 processing literature has been whether the types of
parsing heuristics and linguistic resources adult L2 learners put to use during online processing
are qualitatively similar to or different from those used by native speakers of the target
language. Whereas the current L2 processing literature provides evidence for both qualitative
similarities and differences between L1 and adult L2 processing, Clahsen and Felser (2006a,
2006b, 2006c) suggest through their shallow structure hypothesis (SSH) that the nature of L2
processing by adult L2 learners is fundamentally and qualitatively different from L1 processing.
The SSH claims that the types of syntactic representations the parser computes in adult L2
processing are shallower and hierarchically less detailed, for two possible reasons. First, the
SSH characterizes the L2 grammatical representations of adult L2 learners as being incomplete and
divergent from the target language norms. In consequence, the parser fed by such deficient L2
grammar representations is restricted in constructing a sufficiently detailed representation for
the input. Second, the SSH holds that even if the relevant L2 grammar is somehow available (e.g.,
offline), adult L2 learners are less likely to be able to utilize it in real time, presumably
because their limited and inefficient L2 processing capacities prevent them from rapidly
integrating their knowledge into the parse, even at a high proficiency level. For these reasons,
the L2 parser depends largely on non-syntactic representations instead, such as lexical-semantic
verb-argument information and pragmatic information, which is what makes adult L2 processing
fundamentally different from native processing.
The present study attempted to address these issues to provide more insight into the nature of
adult L2 syntactic processing, by investigating how advanced early and adult ESL learners deal
with a dislocated filler to create an association with its ultimate gap for comprehension,
whether they are able to make use of island constraints in a timely manner to build a
structurally detailed grammatical representation for the input, and whether learners' different
working memory capacities have effects on their online processing. This chapter discusses the
research findings and their implications in the light of the research questions.

5.1. The effect of age of acquisition
In exploring the role of age of acquisition in L2 processing, the first research question
sought to examine how advanced early and adult ESL learners (A) process island constraints at
the earliest possible gap position (Region1 and Region2), and (B) perform a filler-gap
(re)analysis (Region3 and Region4), as compared to native English speaker controls.
This section discusses the findings obtained from the analysis at the first critical region
(Region1) and the following spillover region (Region2). The examples of the four experimental
conditions illustrated in (27) ~ (30) are repeated in (31) and (32) below for readers' convenience.

(27) [non-island, plausible 'the book' & implausible 'the city']
The book_i / The city_i that the journalist [Region1] wrote t_i [Region2] fairly regularly [Region3] about t_i was [Region4] named for an explorer.

(28) [island, plausible 'the book' & implausible 'the city']
The book_i / The city_i that the journalist who [Region1] wrote [Region2] fairly regularly [Region3] mentioned t_i was [Region4] named for an explorer.

Use of active filler strategy & online application of island constraints

Recall that Region1 is the linearly closest gap site for the filler in the non-island condition, but not in the island
condition. In the non-island condition, the parser may postulate a gap at this region in accordance
with the active filler strategy, which, by experimental design, should then yield either a plausible or an
implausible interpretation of the sentence. An implausible interpretation would
likely impose a greater processing burden on the parser during syntactic and semantic integration,
as reflected in increased reading times and more regressions in the implausible than
in the plausible condition, thus resulting in a plausibility effect. However, such a plausibility effect
should not occur in the island condition because there is no structurally posited gap in the
grammatical representation, provided that the parser makes use of full-fledged
knowledge of island constraints.
Bearing this in mind, the results at Region1 suggest that all three groups exhibited, by
and large, fairly similar reading behaviors across the experimental conditions at this region.
The results on first-pass RT, first-pass regression, regression path duration, and Total RT
revealed that all three groups showed a reliable interaction of plausibility and island constraints,
largely due to the plausibility effects that were captured only in the non-island condition.
Crucially, this interaction was not modulated by the group factor for those measures, which
could be taken as suggesting that both the NS English and the two ESL groups successfully
blocked illicit filler-gap formations inside the island constructions, arguably by virtue of
integrating the appropriate grammatical constraints into the parse from early stages of
processing.
In comparing the two learner groups, the early ESL learners appeared to have a certain
degree of speed advantage compared to the adult ESL learners. However, both groups
showed a plausibility effect only in the non-island condition across those four measures. The only
exception came from the result on first fixation duration, which showed a significant interaction
of plausibility and island that was modulated by the group factor in the by-subject analysis, and
marginally in the by-item analysis (p1 = .029, p2 = .097). This was because the adult
ESL group read implausible sentences more slowly than plausible sentences in both
island conditions, although the slowdown for implausible sentences was smaller in the island
condition. The follow-up analysis
showed that it was the adult ESL group that did not have a significant plausibility by island
interaction, whereas the other two groups showed the expected reliable interaction. One plausible
explanation for this difference between the adult ESL group and the other two groups
is that the adult ESL learners' application of island constraints might have been slightly
delayed during the very early stage of processing, so that they momentarily experienced a mild
plausibility effect in the island condition. However, the results from the
subsequent early measures (first-pass RT and first-pass regression) at this region
show that the adult ESL learners nevertheless made use of the relevant grammatical
constraints fairly early, although perhaps not as immediately and efficiently as the early
ESL learners and the native English speakers.
The results at the following spillover region (Region2) showed broadly similar, but
not identical, reading behaviors among the groups, especially with respect to the degree of the
spillover effect found in the non-island condition. The native English speakers showed
a plausibility effect only on their first fixation durations in the non-island condition, which was
modulated by the island constraints factor, as suggested by the results of the preliminary
analysis. This result could be interpreted as a spillover effect carried over from Region1. The
results showed, however, neither a reliable plausibility effect nor an interaction of the two
factors in the subsequent measures, except in the by-subject analysis on regression path
duration (p = .046). These results suggest that the native English speakers rapidly overcame the
processing difficulties derived from implausible interpretations in the non-island condition at
Region1, thus exhibiting no plausibility effect in either island condition in the subsequent measures.
On the other hand, the results from the early and adult ESL groups suggest that both
groups had clearer spillover effects from early to later stages of processing. For the early
ESL group, a significant interaction of plausibility and island constraints was found in their first
fixation duration and regression path duration, and marginally in first-pass regression (p = .077).
Again, the apparent source of those interactions was the significant plausibility
effects that took place only in the non-island condition, with longer reading times and more
regressions in reading implausible sentences. In contrast, their reading patterns in the two
plausibility conditions were comparable to each other in the island condition, as revealed by
the subsequent paired t-tests.
The adult ESL learners did not differ much from the early ESL learners in this regard.
The adult learners also exhibited clear spillover effects in the non-island condition, reflecting
extended processing difficulties in dealing with semantic anomalies, as measured by first fixation
duration, regression path duration, and first-pass regression. However, their reading patterns in
the island condition showed no such plausibility effect, contributing to a reliable interaction of
plausibility and island constraints for those measures.
Taken together, the results of the analysis of the participants' eye-movement data at
Region1 and Region2 showed that all three groups actively sought to fill the gap in the non-island
condition by postulating a grammatically licit gap at the earliest possible position, which
is, by and large, in line with the findings from a number of previous studies (e.g., Cunnings et al.,
2010; Juffs, 2005; Kim et al., 2015; Omaki & Schulz, 2011; Traxler & Pickering, 1996; among
others). Although the plausibility effect in the non-island condition at the earliest gap was not
the main focus of the present study, this finding provides a crucial
basis for evaluating the reading patterns in the island condition, specifically in regard to whether
the plausibility effects observed in the non-island condition would disappear by virtue of the
parser's application of the island constraints in real time. That is, if the parser makes use of the
active filler strategy (i.e., processing) without computing syntactic details
such as island constraints, relying instead on lexical-semantic and verb-argument information as the
SSH would predict for L2 processing, then the same plausibility effect
must take place in the island condition as well. However, no group in the present study showed
such a plausibility effect in the island condition at these earliest possible gap sites. Thus, this could
be taken as suggesting that both the early and adult ESL learners successfully deployed their
knowledge of island constraints during initial processing and blocked the parser's illicit
gap formations inside the island constructions. This finding is generally in line with the results for
the Chinese [-wh] and German [+wh] ESL learners in Cunnings et al.'s (2010) eye-tracking
reading study, the German ESL learners in Felser et al.'s (2012) eye-tracking reading study, and the
Spanish [+wh] ESL learners in Omaki and Schulz's (2011) and Kim et al.'s (2015) self-paced reading studies,
although it should be noted that some of those studies found a significant interaction of plausibility
and island constraints only on later measures, or at the spillover regions.
As discussed, the only exception came from the adult ESL learner group on their first
fixation duration at Region1, which showed similar reading patterns in both island
conditions, with longer reading times in the implausible than in the plausible condition. This
result may be consistent to some extent with the result for the Korean ESL learners in Kim et al.'s
self-paced reading experiment, which found evidence for L2 learners' use of island constraints in
their stop-making-sense judgment task, but not during their self-paced reading. However, the more
fine-grained eye-movement data analyzed in the present study showed that such tendencies
did not carry over to the subsequent measures at the same region (e.g., first-pass RT, first-pass
regression), or to the following region; that is, there was no spillover effect in the island condition,
whereas there were substantial spillover effects in the non-island condition. There are a
couple of possible accounts of the different results between the current study and
Kim et al. in terms of adult ESL learners' gap creation in the island environment. First, as also
noted by the authors, Kim et al.'s self-paced reading was accompanied by the stop-making-sense
judgment task, which might have added task burden for the adult ESL
learners, as they had to perform two different tasks simultaneously (i.e., segment-level reading and
stop-making-sense judgment). This might in turn have affected learners' reading
behaviors. In the current study, by contrast, the participants read sentences in a more
natural way, as no other task was required during reading. Another possibility to consider is
that the adult learners' different amounts of L2 immersion experience in the two studies might
have led to the different results. Although this is an estimate based on
the information in their article, a comparison of the participants' length of residence suggests that
overall the adult ESL learners in this study had more exposure to an English-speaking
environment (M = 4.76, .33-20 years) than the Korean ESL learners in Kim et al.'s
study (M = 3.6, 1-8 years). Given empirical evidence of a
positive relationship between the amount of exposure and L2 processing, the generally greater
immersion experience of the adult ESL learners in the current study might have allowed them to
deploy the relevant grammatical information more efficiently (e.g., Dussias & Sagarra, 2007;
Frenck-Mestre, 2002, 2005; Pliatsikas & Marinis, 2013).


The effect of filler-gap (re)analysis

Region3 includes the canonical position of the filler for all experimental conditions, but its
function differs slightly between the two island
conditions. For the non-island condition, this is the second and ultimate gap site, where the parser
must cancel its earlier misanalysis at Region1 and perform an immediate reanalysis as soon as it
identifies that the object of the preposition is missing. This reanalysis can be more challenging for
the parser especially when it needs to give up its initially constructed plausible interpretation
(due to readers' deeper commitment). A reanalysis in the implausible condition may be relatively
easier, given that this region may be the very place for the parser to
resolve the implausible interpretation it experienced at Region1. Such
a mismatch in processing difficulty (i.e., a reanalysis effect) between the two plausibility
conditions would yield a plausibility effect, but in the reverse direction to the plausibility effects
observed at Region1, with longer reading times and more regressions in the plausible than in the
implausible condition. On the other hand, Region3 serves as the initial gap site in the island
condition, given that there should be no gap postulation at the previous regions. Therefore, the
parser should be free from a reanalysis effect in the island condition, meaning that plausible
and implausible sentences should yield comparable reading patterns.
With this in mind, the results at Region3 showed the expected reading patterns for all
groups, at least when it comes to the late measures, namely regression path duration and Total
RT. As the results from the nonparametric tests and their mean descriptive statistics revealed, all
three groups showed an expected reanalysis effect with longer reading times in reading plausible
than implausible sentences in the non-island condition. Crucially, no group showed a significant
plausibility effect in the island condition.


Some group differences were found in the analysis of some of the early measures,
first fixation duration and first-pass regression in particular. For first fixation duration, it was the
adult ESL group that showed a difference. Whereas both the NS English and the early ESL
group showed a reliable reanalysis effect that was modulated by the island constraints factor,
there was no significant interaction of the two factors for the adult ESL group. This was partly because
their reading patterns were similar in both island conditions, with slightly longer reading
times in the plausible condition, and partly because there was no significant plausibility effect in
either island condition, as tested in separate paired t-tests. This
may imply that the adult ESL learners did not detect the need for a reanalysis as
rapidly and efficiently as the early ESL learners and the native English speakers at this point
(i.e., they were less sensitive to gap identification). The same may have been true of their
first-pass RT. Although the preliminary analysis yielded a significant interaction of
plausibility and island constraints that was not modulated by the group factor, the difference
between plausible and implausible sentences in the non-island condition that the adult ESL
group presented in their descriptive statistics was relatively smaller than in the other two
groups. Assuming that adult ESL learners' online processing is generally less efficient than that of the
early ESL learners, or at least the native English controls, the task of reanalysis would have
been more burdensome for the adult learners, meaning that they could have displayed greater
processing difficulties in revising a plausible initial misanalysis, just as was the case in their
regression path duration (1761 ms and 1030 ms for plausible and implausible reading) and Total
RT (1393 ms and 944 ms for plausible and implausible reading). Considering that, it seems
reasonable to conclude that the adult learners' reanalysis process might not yet have started
during their first-pass reading.


Another piece of evidence in line with this interpretation comes from the adult ESL
learners' first-pass regression patterns. In their first-pass regressions, the adult ESL group did not
exhibit much difference across the experimental conditions. Furthermore, in contrast to the other
two groups, which exhibited a clear reanalysis effect on their first-pass regressions (approx. 10%
more regressions in the non-island plausible condition than in the implausible counterpart),
the adult ESL learners even made slightly more regressions in reading implausible sentences
(approx. 2%), suggesting no clear reanalysis effect.
Finally, the results at the following spillover region (Region4) showed spillover effects
for all groups. The NS English group showed a significant reanalysis effect on first-pass
regression and regression path duration, with more regressions and longer reading times in the
plausible than in the implausible condition. This was so only in the non-island condition, as
revealed by a significant plausibility effect interacting with island constraints and the subsequent
pairwise comparisons for both measures. The two ESL groups also showed a clear spillover
effect on those measures, with patterns comparable to those of the NS English controls. In addition, it
seems that their processing difficulties in reanalysis lasted longer, as their Total RTs also
yielded a significant main effect of plausibility that interacted with island constraints, whereas
the NS English group showed no such effect in this late measure.
Taken together, the results of the analysis of the participants' eye-movement data at the
ultimate gap Region3 and the following Region4 suggest that the native English speaker controls
and the early ESL learners patterned similarly to one another, in that both groups presented
longer reading times and more regressions in reading plausible than in reading implausible
sentences in the non-island condition across the measures at Region3, suggesting that they
identified a gap and initiated a reanalysis from a fairly early stage of processing at this region.
This was so only in the non-island condition; no plausibility effect appeared to be present
in the island condition. The reading profiles of the adult ESL learners also suggest that the filler-gap
reanalysis took place from the critical region (Region3), but this was the case for the two late
measures (regression path duration and Total RT) only. This could be taken to suggest that
although the adult ESL learners were slower in identifying a gap and initiating a reanalysis, they
were eventually able to do so, at least during later stages of processing. Finally, the adult ESL
group, like the other two groups, showed no plausibility (or reanalysis) effect in the island
condition, displaying comparable reading patterns between the two plausibility conditions. This
last result in the island condition corroborates the finding that the adult ESL learners avoided
illicit filler-gap formations at earlier regions in the island condition.
Based on the results discussed above, the findings of the present study provide evidence
for qualitative similarities between L1 and L2 processing, in that both the early and adult ESL
learners demonstrated sensitivity to the relative clause island constraint, as was the case for the
native English controls. As discussed above, island configurations entail abstract and
hierarchically detailed locality constraints that restrict extraction of the filler from certain
structures, such as the relative clauses (islands) under investigation, which according to the SSH may
not be available to adult ESL learners for use during online processing. Evidence for non-application
of island constraints (i.e., shallower processing) would have been a plausibility effect
in both the non-island and island conditions, with longer reading times for implausible
sentences at Region1 and longer reading times for plausible sentences at Region3.
Contrary to what the SSH would predict, however, the adult ESL learners displayed no such
plausibility effect in the island environment across the board, except in their first fixation
duration at Region1. This demonstrates that the adult ESL learners made use of the relevant
knowledge of syntactic constraints to build suitable structural representations of the island
constructions from the early stages of processing, although they might not be as fast and efficient
as the early ESL learners and the native English speakers when deploying such knowledge.
In comparing the performance of the adult ESL learners to that of the early ESL learners,
the early learners appeared to have some processing advantages over the adult learners: their
reading was relatively faster, and their application of the constraints appeared to occur
slightly earlier than that of the adult ESL learners, who showed no plausibility effect modulated
by island constraints in their first fixation duration at the initial gap site. In addition, the early
ESL learners patterned more similarly to the native English speakers when it came to the
reanalysis processes at the ultimate gap. As noted earlier, both the NS English and early ESL
group showed a clear reanalysis effect at this region from the very early stages of processing,
suggesting that they rapidly identified the structural gap that needs to be filled, and initiated the
reanalysis early. On the other hand, the adult ESL group showed this effect in the late measures
only (regression path duration & Total RT), meaning that their identification of a structurally
posited gap was delayed until later stages of processing. The result for the adult ESL learners on
filler-gap reanalysis is in line with the results of previous studies (e.g., Felser et al.,
2012; Williams et al., 2011; Williams, 2006; but cf. Kim et al., 2015). Of those studies, Felser et
al. (2012) claimed that learners' weaker sensitivity to structural information during early stages
of processing, and the availability of relevant grammatical representations (e.g., structural cues such
as a missing object of the preposition at Region3) only during later stages of processing, suggest
that these learners may employ shallow processing because it would work faster for them.
However, delayed processing relative to native speakers should not undermine the fact that the adult
ESL learners had the relevant knowledge of filler-gap representations and used that
knowledge for filler-gap processing (Juffs & Rodriguez, 2015; see also Dekydtspotter et al.,
2006). In other words, the delay should be construed as more of a quantitative than a qualitative
difference between L1 and L2 processing (i.e., efficiency rather than representational deficit).
It should also be noted that the early and adult ESL groups in the present study were not
matched in terms of their L2 proficiency and length of residence: the proficiency test
score of the early ESL group was significantly higher than that of the adult ESL group, although
their self-rated proficiency in L2 reading and L2 grammar was not statistically different.
The length of residence of the early ESL group (M = 16.44 years) was also
significantly longer than that of the adult ESL group (M = 4.76 years), thus potentially giving
some advantage to the early ESL group. As a result, the relatively less efficient and less immediate
processing performance by the adult ESL group might not necessarily be due to their late ages of
immersion, as the aforementioned factors (proficiency and length of residence) might have
functioned as confounds.
To summarize, the results of the adult ESL learners in the present study do not support
the claims of the SSH that adult L2 processing is fundamentally and qualitatively different from
L1 processing, and the grammatical representations used by adult L2 learners during online
processing are shallower and structurally less detailed. Despite the cross-linguistic difference
between their L1s and the target language, the adult ESL learners (wh-in-situ: L1 Korean &
L1 Chinese) in this study showed that they had acquired the relevant syntactic representations of
wh-constructions (wh-movement) in the L2 (e.g., White & Juffs, 1998). Although their
processing was not as immediate and efficient as that of the early learners and the native speakers
(because of late ages of immersion, relatively shorter length of residence, lower
proficiency, or any combination of these three factors), the results demonstrated clearly that the
adult ESL learners made use of the relevant and detailed structural representations early during
their online reading. The reading patterns that the adult ESL learners showed in the present study
have several methodological and theoretical implications for the discussion of the age-related
effects in L2 processing and learning. First of all, the results of this study are not compatible with
the claims of Johnson and Newport (1989, 1991) and Dekeyser (2000), who suggested gradual
declines in L2 performance and L2 learning ability with age of arrival up until puberty (around
the age of 17), with poorer L2 performance and larger performance variability for adult (or late)
learners whose L2 immersion occurred in adulthood. Although the present study did not
perform a separate analysis to examine whether there were linear decreases in L2 performance as
a function of age (of arrival) for each ESL group, the adult ESL learners demonstrated reading
behaviors comparable to those of the early ESL learners, which were not different from those of
the native English speaker controls in many respects. One possibility that could account for such
different findings between this study and Johnson and Newport's and Dekeyser's studies might be
the different research methods. Using eye-movement monitoring, this study tested
learners' knowledge and use of L2 grammar in a more naturalistic reading setting, and
analyzed various eye-tracking measures to track their reading processes in more
detail from early to later stages of processing. The two aforementioned studies, on the other hand,
used auditory grammaticality judgment tasks. As discussed earlier, however, with the type of
data the GJT provides, it is difficult to determine on what basis learners arrive at their
grammaticality judgments. Thus, there is a risk that learners' judgments might have been based
on a number of different factors within and across participants. In addition, the auditory task
might have been more challenging than a reading task for the adult L2 learners in their studies (cf.
Johnson, 1992). Furthermore, given that the mean ages of the early and late learner
groups in Dekeyser's study were quite high (43.2 and 60.00 years old, respectively), and that the
oldest participant was 81 years old at the time of testing, the auditory task might have
been especially difficult for some of the elderly participants.
Another implication of the findings of the present study may be related to the adult ESL
learners' high levels of education and/or extended period of schooling in the L2 environment. As
discussed earlier, Birdsong (2014) found from his analysis of Dekeyser's (2000) data that these
two factors (i.e., years of schooling and levels of education) were strongly correlated with
learners' grammatical proficiency. Similarly, Hakuta et al. (2003) also suggested a role for
formal education in L2 learning. The authors analyzed U.S. Census data on nearly 2.3 million
immigrants with L1 Spanish or L1 Chinese backgrounds, and found that the amount of formal
education was one of the crucial factors in predicting how well those immigrants learned the L2.
In this respect, all adult ESL participants in the present study were at least college graduates: 24 were
graduate students (18 doctoral and 6 MA students), one held an MA, and 3 held
a Ph.D. degree at the time of their participation, and all had received higher and/or graduate
education in the United States. It is not entirely clear how exactly years/levels of education affect
the development of L2 grammar and L2 processing. Note, however, that the structurally complex
relative clause constructions tested in the current study are more frequently provided in written
texts. Taking this into account, one plausible account is that these adult
ESL participants may have had extensive reading experience through their studies and
careers, which gave them sufficient processing experience with written input,
including long-distance filler-gap dependencies and relative clause constructions. As a result, the
adult ESL learners in this study likely had more opportunities to develop their knowledge of the
target language grammar in question, as well as the parsing abilities to make use of the acquired
knowledge in real time effectively.
A theoretical implication, but related to the discussion above, is that the amount, type
(written/spoken), and quality of L2 (classroom/naturalistic) input may play a crucial role in the
development of L2 grammar and processing, especially for adult L2 learners. Thus, although a
general consensus is that adult L2 learners are less efficient and less consistent than early L2
learners, it does not necessarily mean that they (adult L2 learners) are restricted in their
acquisition of L2 grammar and processing heuristics. With more exposure to naturalistic and
relevant L2 input and more processing practice, it may be possible for adult L2 learners to
develop their knowledge of L2 grammars and parsing mechanisms.
The last implication, which is both theoretical and methodological, is that L2 learners'
exposure to target language input prior to their L2 immersion may need to be treated with
more caution when investigating age-related effects in L2 acquisition and processing. As
reported earlier, all adult ESL learners in the present study had received formal English
instruction in their home countries at early ages prior to their arrival (mean age = 11.61, SD =
1.59). Although there has been a tendency in recent L2 critical period and processing research to
consider age of arrival the more important factor in determining the status of L2 learners (i.e., early
versus late/adult learners), adult learners' L2 experience at early ages might also influence
the way they learn and process target language structures in the long term, at least to a certain
degree. This is especially so when the target language is a commonly taught and used
second/foreign language such as English, input in which is not only accessible in
classroom settings but nowadays also available outside the classroom
(e.g., internet and TV shows). Given this possibility, it remains an empirical question
whether adult L2 learners whose exposure to the target language before puberty is close to zero
can develop their L2 grammar and processing skills, through rich and high-quality L2 input, to a
degree comparable to early L2 learners or native speakers of the target language.

5.2. The role of working memory in L2 processing of island constraints
The second research question sought to investigate the potential role of individual
differences in working memory capacity (WMC) on L2 processing of island constraints.
Specifically, the main interests were in how the WMC of individual learners affects (2.1.)
the way they respond to the plausibility manipulation at Region1, (2.2.) the way they apply
the relevant grammatical representation of island constraints at Region1 and Region3, and (2.3.)
the way they perform a filler-gap reanalysis at Region3.
The analysis of the WM data together with the eye-tracking measures of the three groups
brought some interesting results, especially for the adult ESL learners. For the native English
speakers, the results yielded neither a significant WM effect nor a WM-related interaction for
any measure across the regions, except one. Even after the WM effect had been
controlled for in the model, the other non-WM-related effects largely remained
across the measures, compared to the results from the ANOVAs reported
earlier, suggesting a non-significant role of WM in online processing of island constraints for
this group. The only exception that yielded a significant WM-related effect was their Total
RT at Region3, where a reanalysis effect was expected (e.g., about was). The result yielded a
significant interaction of plausibility and WM (p = .025). The follow-up analysis found that
native English speakers with lower WMC had more processing difficulties reading
plausible sentences in the non-island condition, reflecting a greater reanalysis effect, compared to
those with higher WMC. Given that the measure was Total RT, this could be taken to
suggest that the lower WMC native speakers needed an extended period of time to
recover from the initial misanalysis, into the later stages of processing at this region. However, given
that this was the only result that showed a WM-related effect among the measures, this
result must be interpreted with caution.
The results of the early ESL group also showed non-significant WM effects for most
measures across the regions. The results for the non-WM-related factors likewise did not change
notably relative to the ANOVA analyses. Only two cases showed significant WM-related effects
for this group: first-pass RT at Region1, which yielded a significant 2-way interaction between
plausibility and WM (p = .048), and a significant 3-way interaction between WM and the other
two factors (p = .002) at the same region. For first-pass RT, the interaction arose because the
lower WM early learners read implausible sentences more slowly in both the island and
non-island conditions, whereas the higher WM learners showed this trend only in the non-island
condition. Given that Region1 is the critical point for testing whether the parser makes use of the
island constraints and thus avoids postulating a gap inside the relative clause island, the pattern
that the lower WM early learners showed in the island condition might be construed as the result
of ungrammatical gap creation at the verb (i.e., who wrote*___) during initial processing, as was
the case for the adult ESL learners in their first fixation duration at the same region. However,
this otherwise plausible interpretation is complicated by the 3-way interaction on first-pass
regression at the same region. The analysis of this interaction revealed that the regression
patterns of the two WM subgroups in the island condition ran in the opposite direction from
those they showed on first-pass RT, discussed above. Specifically, it was the higher WM learners
that made more regressions when reading implausible sentences in both island conditions. Given
these two conflicting findings, and in the absence of additional information from other measures
(e.g., WM-related effects), it is difficult to determine at this point exactly how early ESL
learners' WMC played a role in their processing of island constraints.
The results of the adult ESL group yielded some interesting findings. At Region1 and
Region2, the lower WM adult learners tended to read more slowly and make more regressions in
the non-island than in the island condition, even though the sentences in the island condition
were structurally more complex to parse. This was largely attributable to a plausibility effect of
greater magnitude than that of the higher WM adult learners, reflecting their struggle with
implausible interpretations in the non-island condition. In contrast, the higher WM adult learners
tended to spend more time and make more regressions than the lower WM adult learners in the
island condition at Region2, as measured by first-pass regression, regression path duration, and
Total RT, a pattern that appeared to conform to the reading patterns of the NS English group.
Interestingly, this trend was reversed in the following region (Region3): there, it was the lower
WM adult learners that spent more time and made more regressions than the higher WM adult
learners in the island condition. Taken together, these results could be interpreted to suggest that
the higher WM adult learners initiated processing of the structurally complex embedded relative
clauses earlier than the lower WM adult learners, in order to resolve the filler-gap dependency
between the second filler (e.g., the journalist) and the embedded relative clause verb (e.g.,
wrote). The lower WM adult ESL learners' increased reading times and more frequent regressions
in the island condition at Region3, on the other hand, suggest that their construction of the
embedded relative clause was initiated with a greater delay.
Another intriguing finding with respect to the role of WM came from Region3, which
showed a significant 3-way interaction (WM x plausibility x island constraints) in two measures,
first fixation duration and first-pass regression. The analysis of those interactions found that,
while the adult ESL learners as a whole group showed evidence of filler-gap reanalysis only at
later stages of processing (regression path duration and Total RT), the higher WM adult learners
displayed reading patterns on those two early measures that were similar to those of the early
ESL learners and the native English speakers. First, for first fixation duration, the higher WM
adult learners spent more time reading plausible than implausible sentences in the non-island
condition (i.e., a reanalysis effect), whereas the lower WM adult learners showed the opposite
pattern, spending more time reading implausible sentences. In the island condition, both WM
subgroups read plausible and implausible sentences comparably, with no sign of a plausibility
effect. The results for first-pass regression showed a similar pattern: the higher WM adult
learners made more regressions while reading plausible sentences in the non-island condition
(i.e., a reanalysis effect), whereas the lower WM adult learners showed the opposite pattern, with
more regressions while reading implausible sentences. These results may be taken to suggest that
the higher WM adult learners were more sensitive than the lower WM adult learners to a
structurally posited gap at the ultimate gap position, and thus initiated filler-gap reanalysis from
earlier stages of processing (see Dussias and Piñar, 2010 for a similar result).
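The direction of the effects described above can be illustrated with a small arithmetic sketch. It is not the statistical model used in this study; it only shows how a reanalysis effect (plausible minus implausible) and its modulation by island condition can be read off condition means. The function names and all numeric values below are hypothetical placeholders chosen to mirror the reported direction of the Region3 patterns, not observed data.

```python
from typing import Dict, Tuple

# A design cell is (island condition, plausibility).
Cell = Tuple[str, str]


def plausibility_effect(means: Dict[Cell, float], island: str) -> float:
    """Plausible minus implausible mean for one island condition.

    At the ultimate gap position, a positive value in the non-island
    condition corresponds to a reanalysis effect.
    """
    return means[(island, "plausible")] - means[(island, "implausible")]


def interaction_contrast(means: Dict[Cell, float]) -> float:
    """Plausibility effect in the non-island minus the island condition."""
    return plausibility_effect(means, "non-island") - plausibility_effect(means, "island")


# HYPOTHETICAL first fixation durations (ms); illustration only, not study data.
higher_wm = {
    ("non-island", "plausible"): 260.0,
    ("non-island", "implausible"): 240.0,
    ("island", "plausible"): 250.0,
    ("island", "implausible"): 250.0,
}
lower_wm = {
    ("non-island", "plausible"): 240.0,
    ("non-island", "implausible"): 260.0,
    ("island", "plausible"): 250.0,
    ("island", "implausible"): 250.0,
}

# The 3-way pattern is the difference between the two subgroups' contrasts.
three_way = interaction_contrast(higher_wm) - interaction_contrast(lower_wm)
print(interaction_contrast(higher_wm), interaction_contrast(lower_wm), three_way)
```

With these placeholder values, the higher WM subgroup shows a positive contrast (a reanalysis effect confined to the non-island condition), and the lower WM subgroup shows a negative one.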
To summarize, the results showed some evidence that individual differences in working
memory capacity do influence the way adult ESL learners process filler-gap dependency
constructions. First, the lower WM adult learners were not at a disadvantage, compared to the
higher WM adult learners, in making use of syntactic information during online processing: the
adult ESL learners did not attempt to postulate a gap in an island environment, as shown by a
significant interaction of plausibility and island that was not modulated by WM. Second, effects
of WMC were found in the non-island condition, in that the lower WM adult learners tended to
experience more processing difficulty in dealing with implausible interpretations. This might
have drained their relatively limited cognitive resources, consequently making them less sensitive
to gap identification at the ultimate gap position during early stages of processing. The higher
WM adult learners, on the other hand, recovered relatively faster from the implausible
interpretations and showed reading behaviors that conformed to those of the early ESL learners
and the native English speaker controls at the ultimate gap, presenting evidence that they were
sensitive to the structural cues for filler-gap reanalysis.
The SSH posits that the fundamental differences between adult L2 and L1 processing
cannot be explained by differences in cognitive resource capacities such as working memory,
although Clahsen and Felser somewhat confusingly state in a footnote that "… unlike the
grammar, parsing is subject to time constraints and capacity limitations… Computationally
complex sentences…. tend to be difficult to process even though they are licensed by the
grammar" (Clahsen & Felser, 2006b, p.123). However, based on what has been discussed above,
the results of the present study do not fully support the claims of the SSH. First, the adult ESL
learners' reading behaviors gave no evidence that they failed to utilize syntactic information in
the first place. Second, the fact that this study yielded some significant WM effects on adult
learners' online processing behaviors suggests that the claims of the SSH may not apply to all
adult L2 learner populations. These issues, however, require further research.

CHAPTER 6: CONCLUSION
This dissertation aimed primarily to explore the nature of second language (L2) online
processing in order to provide better insight into "how" and "what" types of parsing mechanisms
and information resources second language learners bring to real-time processing to manage the
target language input and ultimately achieve comprehension. Given that no language can be
acquired or learned without using it, understanding this HOW and WHAT may provide valuable
information that will help advance our understanding of how learners develop their
interlanguage grammar system through the input they encounter.
One of the primary issues in the current L2 processing literature is whether adult L2
learners can make use of relevant grammatical information to construct fully detailed and
appropriate structural representations to accommodate incoming target language input, as
native speakers do. In this regard, one position holds that adult L2 processing is qualitatively
different from L1 processing, mainly because the adult learners' L2 grammar that feeds the
parser is "incomplete, divergent, or of a form that makes [it] unsuitable for parsing" or because
of their limited ability to compute detailed grammatical representations during online
processing even if they have such a grammar (Clahsen & Felser, p.118). The current study
attempted to test the validity of these claims by investigating the way advanced early and adult
ESL learners deal with structurally complex relative clause island constructions during filler-gap
processing. Additionally, learners' working memory capacity was measured to examine whether
individual differences in resource capacity play a role in their online application of grammatical
representations.
Overall, the results demonstrated that, although there was a slight delay compared to the
early ESL learners and the native English speakers, the adult ESL learners were sensitive to the
structural cues (e.g., relative pronoun who) and were able to deploy the relevant and
hierarchically detailed knowledge of island constraints in a timely manner from the early stages
of processing, thus avoiding postulating an illicit gap inside the island environment. The adult
learners' filler-gap reanalysis was delayed, in that they showed evidence of filler-gap reanalysis
only during later stages of processing. However, the working memory analysis revealed that the
higher working memory adult learners had reading patterns similar to those of the early ESL
learners and the native speaker controls, showing evidence of filler-gap reanalysis at earlier
stages of processing than the lower working memory adult learners.
Based on the results discussed above, the online processing of filler-gap dependencies and
the application of the structurally complex island constraints by the early and adult ESL learners
in this study were not qualitatively different from those of the native English speakers. The
adult learners were relatively less efficient and less sensitive in identifying the gap at some
points and during certain stages of processing, but they showed evidence that they had
knowledge of the target structures under investigation, that they made use of this knowledge
during online reading, and that they managed to comprehend the target sentences accurately.
One of the contributions of the current study is that the use of eye-tracking added
ecological validity to the reading task and provided the participants with a more naturalistic
reading environment. Given the complexity of the target sentences used in this study,
implementing other methods such as noncumulative self-paced reading would have added to
the task burden on the learners, which may have functioned as a confound. In addition,
eye-tracking made it possible to observe multiple stages of readers' processing of the text, and
the analysis of various time-course measures and movement patterns provided invaluable
information for interpreting the results.
The present study also sheds more light on the role of individual differences in working
memory capacity. In order to obtain more accurate working memory spans for the participants,
this study implemented two different types of automated working memory tests that have been
widely used in a number of fields, and endeavored to administer the tests as accurately as
possible. The working memory results for the adult L2 learners yielded some interesting
findings, as discussed above. However, further research is needed to verify these findings.

6.1. Limitations and future research
One limitation of this study is that the L2 groups included participants from two L1s. This
was done mainly for a practical reason, as it was challenging to recruit a sufficient number of
early proficient adult L2 learners from one L1 background. Although the two languages share the
linguistic feature that was the main focus of the study, they differ from one another in many
other respects (e.g., word order), some of which could have resulted in different processing
patterns between the two L1 groups, thereby functioning as a confound in the group results.
Another limitation of the present study is that English proficiency and length of residence
(or L2 exposure) were not matched between the early and adult ESL groups. This made it
difficult to isolate the role of age of immersion, as both L2 proficiency and degree of L2 exposure
have been found to play a role in the development of L2 grammar and L2 processing. It would
be beneficial for future research to control for these factors in order to better examine
age-related effects in L2 processing.
As addressed earlier in the introduction, language processing involves a series of multiple
complex linguistic analyses. The present study focused on only one aspect of those analyses,
syntactic processing. However, given that all these analyses occur more or less concurrently, it is
conceivable that one type of linguistic analysis influences another. In this regard, some recent L2
processing studies report that L2 learners' lexical processing could affect their subsequent
syntactic processing in one way or another, depending on the lexical frequency of the items used
in the test (e.g., Hopp, 2016) or their cognate status (e.g., Miller, 2014). Thus, it would be
beneficial for future research to take these findings into consideration and further explore how
these two areas of processing interact with each other.

APPENDICES

Appendix A. Language background questionnaire

1. Background Questionnaire for L2 learners
BACKGROUND QUESTIONNAIRE
(Participant ID:            WM order:            Task Type:            )

※ Please answer the following questions. If you have any questions that you would prefer not to
answer, you can just leave them blank. All information you provide will be used only for the
research purpose and all data will be kept confidential for your privacy and no information you
provide will be directly related with any of your personal information.
A. Are you right-handed or left-handed?   □ LEFT   □ RIGHT
B. Are you wearing contact lenses or glasses for the experiment?
   □ YES (Circle one: Contact lenses   Glasses)
   □ NO
1. Gender: □ Male   □ Female
2. Age: ____________ years old.
3. Education and/or Current Academic Status
   □ Freshman □ Sophomore □ Junior
   □ Senior □ BA (graduated)
   □ MA student □ MA
   □ Ph.D. student
   □ Ph.D. □ Others ________
4. What is your field of study? _______________________________________
5. What is your first/native language? _________________________________
6. In what country and/or language environment did you have your primary (elementary) and
secondary education?
I had my primary (elementary) education in _____, and classes were taught in _____language.
I had my secondary education in ______, and classes were taught in _____________language.

7. HOW OLD WERE YOU…
   when you first began acquiring/learning English? I was _________ years old.
   7.1.) What was the educational setting for your English learning? Please mark all that apply.
      □ English classes at school (Your grade ______________ )
      □ Private tutoring
      □ English institute
      □ Others (Please explain what the setting was ______________________ __________)
   when you first came/moved to the United States?
      □ I was ________ years old.
      □ I was born in the U.S. (if applicable)
8. How long have you been living/studying in the U.S. so far? For __ years ___ months
9. Do you have any other experiences of living in other English-speaking environments/countries
PRIOR TO your current residence in the U.S.? Please exclude your travel experiences unless
they are more than a year.
□ No
□ Yes (Please provide more details about those living abroad experiences.
For example, I was 2 years old when I moved to Australia and lived there for 3.5 years.
I was ___ years old when I moved to _________(WHERE) and lived there for _______years.
I was ___ years old when I moved to _________(WHERE) and lived there for _______years.
I was ___ years old when I moved to _________(WHERE) and lived there for _______years.
10. Please list all the languages you know, including your mother language and English, from the
most to the least proficient order.
(1) ____________________________________________ [Most proficient]
(2) ____________________________________________
(3) ____________________________________________
(4) ____________________________________________ [Least proficient]
11. Among the languages you listed above…
which language do you feel most comfortable with for verbal communication? ___________
which language do you feel most comfortable with for reading? ______________________
which language do you feel most comfortable with for writing? _______________________

12. On a scale from ZERO (Not proficient at all) and TEN (near native-like), how would you
rate your English proficiency in each of the following language skills? Please select one and
V-check in the appropriate box in the table below.

               0        1    2    3    4    5    6    7    8    9    10
           (Not at all)                                          (Native-like)
Listening
Speaking
Reading
Writing
Grammar
Overall

13. Please skip this question, if you have come to the U.S. after the age of 16.
On a scale from ZERO (Not proficient at all) and TEN (native-like), how would you rate
your Chinese (For Chinese speakers) or Korean (For Korean speakers) proficiency in each of
the following language skills? Please select one and V-check in the appropriate box in the
table below.

               0        1    2    3    4    5    6    7    8    9    10
           (Not at all)                                          (Native-like)
Listening
Speaking
Reading
Writing
Grammar
Overall

14. Have you taken any form of the standardized English proficiency tests, such as TOEFL,
TOEIC, MTELP, or IELTS?
□ No
□ Yes
If your answer is YES, please provide what the test was and what the score was for each
test. You do not need to answer, if you would like. The scores you provide will be used for
the research purpose only.
TOEFL (※ Please circle the test format taken: iBT , CBT , PBT ) Your score _____
TOEIC   MTELP   IELTS   OTHERS (Test name_______)   Your score ___________

15. Is there any other information you can provide about your language background, or any
comments? If so, please include it here:

2. Background Questionnaire for native speakers
BACKGROUND QUESTIONNAIRE
(Participant ID:            WM order:            Task Type:            )

※ Please answer the following questions. If you have any questions that you would prefer not to
answer, you can just leave them blank. All information you provide will be used only for the
research purpose and all data will be kept confidential for your privacy and no information you
provide will be directly related with any of your personal information.
A. Are you right-handed or left-handed?   LEFT   RIGHT
B. Are you wearing glasses for the Eye-tracking experiment?   YES   NO
C. Are you wearing contact lenses for the Eye-tracking experiment?   YES   NO
1. Gender: □ Male   □ Female

2. Age: ____________ years old.

3. Education and Current Academic Status
   □ Freshman □ Sophomore □ Junior
   □ Senior □ BA (graduated)
   □ MA student □ MA
   □ Ph.D. student
   □ Ph.D.

4. What is your field of study? _______________________________________

5. Is there any other information you can provide about your language background, or any
comments? If so, please include it here:

Appendix B. List of test items in the LexTALE English proficiency measure

[Words in English, n = 40]
ablaze          allied          bewitch         breeding
carbohydrate    celestial       censorship      cleanliness
cylinder        dispatch        eloquence       festivity
flaw            fluid           fray            hasty
hurricane       ingenious       lengthy         listless
lofty           majestic        moonlit         muddy
nourishment     plaintively     rascal          recipient
savoury         scholar         scornful        screech
shin            slain           stoutly         turmoil
turtle          unkempt         upkeep          wrought

[Nonwords in English, n = 20]
abergy          alberation      crumper         destription
exprate         fellick         interfate       kermshaw
kilp            magrity         mensible        plaudate
proom           pudour          pulsh           purrage
rebondicate     skave           spaunch         quirty
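
As an illustration of how responses to the 40 words and 20 nonwords listed above can be combined into a single score, the following minimal sketch assumes the averaged percent-correct scoring commonly used with LexTALE, in which word accuracy and nonword accuracy are weighted equally. The function name and its parameters are hypothetical, and the sketch is not taken from the scoring procedure reported in this study.

```python
def lextale_score(words_correct: int, nonwords_correct: int,
                  n_words: int = 40, n_nonwords: int = 20) -> float:
    """Averaged percent correct: mean of word and nonword accuracy (0-100).

    Assumption: words and nonwords contribute equally to the score even
    though the list contains twice as many words as nonwords.
    """
    if not 0 <= words_correct <= n_words:
        raise ValueError("words_correct must be between 0 and n_words")
    if not 0 <= nonwords_correct <= n_nonwords:
        raise ValueError("nonwords_correct must be between 0 and n_nonwords")
    word_pct = words_correct / n_words * 100
    nonword_pct = nonwords_correct / n_nonwords * 100
    return (word_pct + nonword_pct) / 2


# Example: 30 of 40 words accepted, 10 of 20 nonwords correctly rejected.
print(lextale_score(30, 10))
```

Averaging the two percentages, rather than pooling all 60 items, keeps a participant who accepts every item from scoring above 50.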

Appendix C. Materials for the eye-tracking experiment

[non-island, plausible]
1a. The song that the guitarist wrote so passionately for was loved by the audience.
[non-island, implausible]
1b. The band that the guitarist wrote so passionately for was loved by the audience.
[island, plausible]
1c. The song that the guitarist who wrote so passionately recommended was loved by the audience.
[island, implausible]
1d. The band that the guitarist who wrote so passionately recommended was loved by the audience.
2a. The diary that the historian read very thoroughly about was found near the castle.
2b. The sword that the historian read very thoroughly about was found near the castle.
2c. The diary that the historian who read very thoroughly studied was found near the castle.
2d. The sword that the historian who read very thoroughly studied was found near the castle.
3a. The fish that the chef cooked very uniquely with was introduced in the magazine.
3b. The lady that the chef cooked very uniquely with was introduced in the magazine.
3c. The fish that the chef who cooked very uniquely liked was introduced in the magazine.
3d. The lady that the chef who cooked very uniquely liked was introduced in the magazine.
4a. The captain that the spy killed so fiercely for was exposed to the enemy.
4b. The mission that the spy killed so fiercely for was exposed to the enemy.
4c. The captain that the spy who killed so fiercely assisted was exposed to the enemy.
4d. The mission that the spy who killed so fiercely assisted was exposed to the enemy.
5a. The actor that the designer dressed very elegantly for was praised by the critics.
5b. The opera that the designer dressed very elegantly for was praised by the critics.
5c. The actor that the designer who dressed very elegantly saw was praised by the critics.
5d. The opera that the designer who dressed very elegantly saw was praised by the critics.
6a. The nurse that the doctor texted very urgently about was isolated for further tests.
6b. The virus that the doctor texted very urgently about was isolated for further tests.
6c. The nurse that the doctor who texted very urgently examined was isolated for further tests.
6d. The virus that the doctor who texted very urgently examined was isolated for further tests.
7a. The article that the intern wrote very critically about was reviewed by the editor.
7b. The picture that the intern wrote very critically about was reviewed by the editor.
7c. The article that the intern who wrote very critically sent was reviewed by the editor.
7d. The picture that the intern who wrote very critically sent was reviewed by the editor.
8a. The suspect that the detective questioned very intensively about was sent to the CIA.
8b. The cocaine that the detective questioned very intensively about was sent to the CIA.
8c. The suspect that the detective who questioned very intensively found was sent to the CIA.
8d. The cocaine that the detective who questioned very intensively found was sent to the CIA.
9a. The mailman that the lady asked very angrily about was stuck in the warehouse.
9b. The package that the lady asked very angrily about was stuck in the warehouse.
9c. The mailman that the lady who asked very angrily expected was stuck in the warehouse.
9d. The package that the lady who asked very angrily expected was stuck in the warehouse.
10a. The woman that the suspect texted very frequently about was searched by the police.
10b. The house that the suspect texted very frequently about was searched by the police.
10c. The woman that the suspect who texted very frequently mentioned was searched by the police.
10d. The house that the suspect who texted very frequently mentioned was searched by the police.
11a. The dinner that the cook prepared very adeptly for was served with red wine.
11b. The client that the cook prepared very adeptly for was served with red wine.
11c. The dinner that the cook who prepared very adeptly hosted was served with red wine.
11d. The client that the cook who prepared very adeptly hosted was served with red wine.
12a. The concert that the singer performed very actively for was sponsored by the city.
12b. The pianist that the singer performed very actively for was sponsored by the city.
12c. The concert that the singer who performed very actively helped was sponsored by the city.
12d. The pianist that the singer who performed very actively helped was sponsored by the city.
13a. The book that the journalist wrote fairly regularly about was named for the explorer.
13b. The city that the journalist wrote fairly regularly about was named for the explorer.
13c. The book that the journalist who wrote fairly regularly mentioned was named for the explorer.
13d. The city that the journalist who wrote fairly regularly mentioned was named for the explorer.
14a. The lectures that the professor prepared very hard for were evaluated by the program.
14b. The students that the professor prepared very hard for were evaluated by the program.
14c. The lectures that the professor who prepared very hard taught were evaluated by the program.
14d. The students that the professor who prepared very hard taught were evaluated by the program.
15a. The resort that the housekeeper cleaned very diligently for was charged with tax evasion.
15b. The lawyer that the housekeeper cleaned very diligently for was charged with tax evasion.
15c. The resort that the housekeeper who cleaned very diligently sued was charged with tax evasion.
15d. The lawyer that the housekeeper who cleaned very diligently sued was charged with tax evasion.
16a. The scenario that the novelist wrote very frequently about was selected for the filming.
16b. The mountain that the novelist wrote very frequently about was selected for the filming.
16c. The scenario that the novelist who wrote very frequently liked was selected for the filming.
16d. The mountain that the novelist who wrote very frequently liked was selected for the filming.
17a. The school that the architect built very dedicatedly for was headlined in the news.
17b. The artist that the architect built very dedicatedly for was headlined in the news.
17c. The school that the architect who built very dedicatedly supported was headlined in the news.
17d. The artist that the architect who built very dedicatedly supported was headlined in the news.
18a. The golfer that the trainer advised very thoroughly about was taken to the clinic.
18b. The monkey that the trainer advised very thoroughly about was taken to the clinic.
18c. The golfer that the trainer who advised very thoroughly trained was taken to the clinic.
18d. The monkey that the trainer who advised very thoroughly trained was taken to the clinic.
19a. The crocodile that the rangers hunted very eagerly for was filmed for the documentary.
19b. The zoologist that the rangers hunted very eagerly for was filmed for the documentary.
19c. The crocodile that the rangers who hunted very eagerly liked was filmed for the documentary.
19d. The zoologist that the rangers who hunted very eagerly liked was filmed for the documentary.
20a. The witness that the lawyer called so hurriedly about was reviewed by the judges.
20b. The verdict that the lawyer called so hurriedly about was reviewed by the judges.
20c. The witness that the lawyer who called so hurriedly questioned was reviewed by the judges.
20d. The verdict that the lawyer who called so hurriedly questioned was reviewed by the judges.
21a. The reporter that the senator phoned very recently about was investigated by the police.
21b. The accident that the senator phoned very recently about was investigated by the police.
21c. The reporter that the senator who phoned very recently blamed was investigated by the police.
21d. The accident that the senator who phoned very recently blamed was investigated by the police.
22a. The bomb that the soldier threw quite forcefully toward was covered with thick moss.
22b. The wall that the soldier threw quite forcefully toward was covered with thick moss.
22c. The bomb that the soldier who threw quite forcefully destroyed was covered with thick moss.
22d. The wall that the soldier who threw quite forcefully destroyed was covered with thick moss.
23a. The proposal that the senator prepared so ambitiously for was tackled by the panel.
23b. The governor that the senator prepared so ambitiously for was tackled by the panel.
23c. The proposal that the senator who prepared so ambitiously supported was tackled by the panel.
23d. The governor that the senator who prepared so ambitiously supported was tackled by the panel.
24a. The hotel that the architect designed so intensely for was targeted by the terrorist.
24b. The queen that the architect designed so intensely for was targeted by the terrorist.
24c. The hotel that the architect who designed so intensely visited was targeted by the terrorist.
24d. The queen that the architect who designed so intensely visited was targeted by the terrorist.
25a. The team that the athlete trained so intensively for was supported by the fans.
25b. The game that the athlete trained so intensively for was supported by the fans.
25c. The team that the athlete who trained so intensively led was supported by the fans.
25d. The game that the athlete who trained so intensively led was supported by the fans.
26a. The document that the lawyer read very thoroughly about was investigated by the FBI.
26b. The accident that the lawyer read very thoroughly about was investigated by the FBI.
26c. The document that the lawyer who read very thoroughly reported was investigated by the FBI.
26d. The accident that the lawyer who read very thoroughly reported was investigated by the FBI.
27a. The engineer that the CEO paid quite generously for was disliked by the investors.
27b. The proposal that the CEO paid quite generously for was disliked by the investors.
27c. The engineer that the CEO who paid quite generously selected was disliked by the investors.
27d. The proposal that the CEO who paid quite generously selected was disliked by the investors.
28a. The musical that the musician composed so devotedly for was awarded the grand prize.
28b. The pianist that the musician composed so devotedly for was awarded the grand prize.
28c. The musical that the musician who composed so devotedly loved was awarded the grand prize.
28d. The pianist that the musician who composed so devotedly loved was awarded the grand prize.

REFERENCES

Adger, David (2003). Core syntax: a minimalist approach. Oxford: Oxford University Press.
Aldwayan, S., Fiorentino, R., & Gabriele, A. (2010). Evidence of syntactic constraints in the
processing of wh-movement: A study of Najdi Arabic learners of English. In VanPatten,
B., & Jegerski, J. (Eds.), Research in second language processing and parsing (pp. 65-86). Philadelphia, PA: John Benjamins Publishing Company.
Ariji, Kenji, Akira Omaki & Nano Tatsuta. (2003). Working memory restricts the use of
semantic information in ambiguity resolution. In Peter Slezak (ed.) Proceedings of the
4th International Conference on Cognitive Science (pp. 19-25). Sydney, Australia:
University of New South Wales.
Baddeley, A. D. (2003). Working memory and language: An overview. Journal of
Communication Disorders, 36, 189-208.
Baddeley, A., Gathercole, S., & Papagno, C. (1998). The phonological loop as a language
learning device. The Psychological Review, 105, 158-173.
Barrouillet, P., & Lepine, R. (2005). Working memory and children's use of retrieval to solve
addition problems. Journal of Experimental Child Psychology, 91, 183-204.
Belikova, A. & White, L. (2009). Evidence for the Fundamental Difference Hypothesis or not?
Island constraints revisited. Studies in Second Language Acquisition, 31, 199-223.
Bialystok, E. (1997). The structure of age: in search of barriers to second language acquisition.
Second Language Research, 13, 116-137.
Birdsong, D. (1992). Ultimate attainment in second language acquisition. Language, 68, 706-755.
Birdsong, D. (1999). Introduction: Whys and Why nots of the critical period hypothesis for
second language acquisition. In Gass, S., & Schachter, J. (Eds), Second language
acquisition and the critical period hypothesis (pp. 1-22). Mahwah, NJ: Lawrence
Erlbaum Associates.
Birdsong, D. (2005). Nativelikeness and non-nativelikeness in L2A research. International
Review of Applied Linguistics (IRAL), 43, 319-328.
Birdsong, D. (2014). The critical period hypothesis for second language acquisition: Tailoring
the coat of many colors. In Pawlak, M., & Aronin, L. (Eds.), Essential topics in applied
linguistics and multilingualism. Studies in honor of David Singleton (pp. 43-50).
Heidelberg: Springer.
Bley-Vroman, R. (1990). The logical problem of second language learning. Linguistic Analysis,
20, 3-49.

Bley-Vroman, R. (2009). The evolving context of the fundamental difference hypothesis.
Studies in Second Language Acquisition, 31, 175-198.
Bley-Vroman, R., Felix, S., & Ioup, G. (1988). The accessibility of Universal Grammar in adult
language learning. Second Language Research, 4, 1-32.
Carreiras, M., & Clifton, C. (1999). Another word on parsing relative clauses: Eyetracking
evidence from Spanish and English. Memory and Cognition, 27(5), 826-833.
Chomsky, N. (1973). Conditions on transformations. In S. Anderson & P. Kiparsky (Eds.), A
festschrift for Morris Halle (pp. 232-286). New York, NY: Holt, Rinehart & Winston.
Chomsky, N. (1981). Lectures on government and binding. Dordrecht: Foris.
Chomsky, N. (1995). The Minimalist Program. Cambridge, MA: MIT Press.
Clahsen, H. (2007). Psycholinguistic perspectives on grammatical representations. In
Featherston, S., & Sternefeld, W. (Eds.), Roots: Linguistics in search of its evidential
base (pp. 97-132). Berlin: Mouton de Gruyter Publishing Company.
Clahsen, H., & Felser, C. (2006a). Grammatical processing in language learners. Applied
Psycholinguistics, 27, 3-42.
Clahsen, H., & Felser, C. (2006b). Continuity and shallow structures in language processing.
Applied Psycholinguistics, 27, 107-126.
Clahsen, H., & Felser, C. (2006c). How native-like is non-native language processing? Trends in
Cognitive Sciences, 10, 564-570.
Clahsen, H., & Muysken, P. (1996). How adult second language learning differs from child first
language development. Behavioral and Brain Sciences, 19, 721-723.
Clifton, C., Staub, A., & Rayner, K. (2007). Eye movements in reading words and sentences. In van
Gompel, R. P. G., Fischer, M. H., Murray, W. S., & Hill, R. L. (Eds.), Eye movements: A
window on mind and brain (pp. 341-371). New York: Elsevier.
Conway, A. R. A., Kane, M. J., Bunting, M., Hambrick, D., Wilhelm, O., & Engle, R. (2005). Working
memory span tasks: A methodological review and user's guide. Psychonomic Bulletin &
Review, 12, 769-786.
Cunnings, I., Batterham, C., Felser, C., & Clahsen, H. (2010). Constraints on L2 learners'
processing of wh-dependencies: Evidence from eye-movements. In VanPatten, B., &
Jegerski, J. (Eds.), Research in second language processing and parsing (pp. 87-112).
Philadelphia, PA: John Benjamins Publishing Company.
Cuetos, F., & Mitchell, D. C. (1988). Cross-linguistic difference in parsing: restrictions on the
late-closure strategy in Spanish. Cognition, 30, 73-105.

Cuetos, F., Mitchell, D. C., & Corley, M. (1996). Parsing in different languages. In M.
Carreiras, J. García-Albea, & N. Sebastián-Gallés (Eds.), Language processing in
Spanish (pp. 145-187). Hillsdale, NJ: Lawrence Erlbaum Associates.
Daneman, M., & Carpenter, P. (1980). Individual differences in working memory and reading.
Journal of Verbal Learning and Verbal Behavior, 19, 450-466.
Declerck, M., Lemhöfer, K., & Grainger, J. (in press). Bilingual language interference initiates
error detection: Evidence from language intrusions. Bilingualism: Language and
Cognition. http://dx.doi.org/10.1017/S1366728916000845.
DeKeyser, R. (2000). The robustness of critical period effects in second language acquisition.
Studies in Second Language Acquisition, 22, 499-533.
DeKeyser, R. (2010). Cross-linguistic evidence for the nature of age effects in second language
acquisition. Applied Psycholinguistics, 31, 413-438.
Dekydtspotter, L., & Miller, A. K. (2009). Probing for intermediate traces in the processing of long-distance wh-dependencies in English as a second language. In Bowles, M., Ionin, T.,
Montrul, S., & Tremblay, A. (Eds.) Proceedings of the 10th Generative Approaches to
Second Language Acquisition (GASLA, 2009) (pp. 113-124). Somerville, MA: Cascadilla
Proceedings Project.
Dekydtspotter, L., & Miller, A. K. (2013). Inhibitive and facilitative priming induced by traces in the
processing of wh-dependencies in a second language. Second Language Research, 29,
345-372.
Dekydtspotter, L., & Renaud, C. (2014). On second language processing and grammatical
development: The parser in second language acquisition. Linguistic Approaches to
Bilingualism, 4, 131-165.
Dekydtspotter, L., Schwartz, B., & Sprouse, R. (2006). The comparative fallacy in L2 processing
research. Proceedings of the 8th Generative Approaches to Second Language Acquisition
Conference, 33-40.
Dussias, P. E., & Sagarra, N. (2007). The effect of exposure on syntactic parsing in Spanish-English bilinguals. Bilingualism: Language and Cognition, 10, 101-116.
Dussias, P. E., & Piñar, P. (2010). Effects of reading span and plausibility in the reanalysis of wh-gaps by Chinese-English second language speakers. Second Language Research, 26, 443-472.
Ellis, R. (1991). Grammaticality judgments and second language acquisition. Studies in Second
Language Acquisition, 13, 161-186.
Epstein, S., Flynn, S., & Martohardjono, G. (1996). Second language acquisition: Theoretical
and experimental issues in contemporary research. Behavioral and Brain Sciences, 19,
677-714.

Featherston, S. (2001). Empty categories in sentence processing. Amsterdam, the Netherlands:
John Benjamins Publishing Company.
Felix, S., & Weigl, W. (1991). Universal grammar in the classroom: The effect of formal
instruction on second language acquisition. Second Language Research, 7, 162-180.
Felser, C., Roberts, L., & Marinis, T. (2003). The processing of ambiguous sentences by first and
second language learners of English. Applied Psycholinguistics, 24, 453-489.
Felser, C., Sato, M., & Bertenshaw, M. (2009). The on-line application of binding Principle A in
English as a second language. Bilingualism: Language and Cognition, 12, 485-502.
Felser, C., Cunnings, I., Batterham, C., & Clahsen, H. (2012). The timing of island effects in
nonnative sentence processing. Studies in Second Language Acquisition, 34, 67-98.
Fernandez, E. (1999). Processing strategies in second language acquisition: Some preliminary
results. In E. Klein, & G. Martohardjono (Eds.), The development of second language
grammars: A generative approach (pp. 217-239). Amsterdam and Philadelphia: John
Benjamins Publishing Company.
Fender, M. (2003). English word recognition and word integration skills of native Arabic- and
Japanese-speaking learners of English as a second language. Applied Psycholinguistics,
24, 289-315.
Fender, M. (2008). L1 effects on the emergence of ESL sentence processing skills of Chinese
and Korean ESL learners: A preliminary study. Languages in Contrast, 8, 47-73.
Ferreira, F., Engelhardt, P., & Jones, M. (2009). Good enough language processing: A satisficing
approach. In N. Taatgen, H. Rijn, J. Nerbonne, & L. Schomaker (Eds.), Proceedings of
the 31st Annual conference of the Cognitive Science Society. Austin, TX: Cognitive
Science Society.
Field, A. (2009). Discovering statistics using SPSS (3rd ed.). London, England: Sage.
Fodor, J. D. (1998). Parsing to learn. Journal of Psycholinguistic Research, 22, 339-374.
Frazier, L. (1998). Getting there (slowly). Journal of Psycholinguistic Research, 16, 123-146.
Frazier, L., & Rayner, K. (1982). Making and correcting errors during sentence comprehension:
Eye movements in the analysis of structurally ambiguous sentences. Cognitive
Psychology, 14, 178-210.
Frazier, L. (1987). Processing syntactic structures: Evidence from Dutch. Natural Language and
Linguistic Theory, 5, 519-559.
Frazier, L., & Fodor, J. D. (1978). The sausage machine: A new two-stage parsing model.
Cognition, 6, 291-325.

Frazier, L., & Clifton, C. (1989). Successive cyclicity in the grammar and the parser. Language
and Cognitive Processes, 4, 93-126.
Frenck-Mestre, C. (1997). Examining second language reading: An on-line look. In A. Sorace,
C. Heycock, & R. Shillcock (Eds.), Proceedings of the GALA 1997 Conference on
Language Acquisition (pp. 474-478). Edinburgh, UK: Human Communications Research
Center.
Frenck-Mestre, C. (2002). An on-line look at sentence processing in a second language. In R.
Heredia and J. Altarriba (Eds.), Bilingual Sentence Processing (pp. 217-236). North
Holland.
Frenck-Mestre, C. (2005). Eye-movement recording as a tool for studying syntactic processing in
a second language: a review of methodologies and experimental findings. Second
Language Research, 21(2), 175-198.
Frenck-Mestre, C., & Pynte, J. (1997). Syntactic ambiguity resolution while reading in second
and native languages. Quarterly Journal of Experimental Psychology, 50A, 119-148.
Gass, S. M. (1994). The reliability of second-language grammaticality judgments. In Tarone, E.,
Gass, S. M., & Cohen, A. D. (Eds.), Research methodology in second-language
acquisition. Hillsdale, NJ: L. Erlbaum Associates.
Gass, S. M., & Lee, J. (2011). Working memory capacity, inhibitory control, and proficiency in a
second language. In Schmid, M., & Lowie, W. (Eds.), Modeling Bilingualism: From
Structure to Chaos (pp. 59-84). Amsterdam, The Netherlands: John Benjamins
Publishing Company.
Gass, S. M., & Selinker, L. (2008). Second Language Acquisition: An introductory course (3rd
edition). New York, NY: Routledge.
Gibson, E. (1998). Linguistic complexity: Locality of syntactic dependencies. Cognition, 68, 1-76.
Gibson, E., & Hickok, G. (1993). Sentence processing with empty categories. Language and
Cognitive Processes, 8, 147-171.
Gibson, E., Pearlmutter, N., Canseco-Gonzalez, E., & Hickok, G. (1996). Recency preference in
the human sentence processing mechanism. Cognition, 56, 23-59.
Gibson, E., & Warren, T. (2004). Reading-time evidence for intermediate linguistic structure in
long-distance dependencies. Syntax, 7, 55-78.

Ha, J. (2005). Age-related effects on syntactic ambiguity resolution in first and second
languages: Evidence from Korean-English bilinguals. In Laurent Dekydtspotter et al.
(Eds.), Proceedings of the 7th Generative Approaches to Second Language Acquisition
(pp. 111-123). Somerville, MA: Cascadilla Proceedings Project.
Hakuta, K., Bialystok, E., & Wiley, E. (2003). Critical evidence: A test of the critical-period
hypothesis for second-language acquisition. Psychological Science, 14, 31-38.
Harrington, M., & Sawyer, M. (1992). Working memory capacity and L2 reading skill. Studies
in Second Language Acquisition, 14, 25-38.
Havik, E., Roberts, L., van Hout, R., Schreuder, R., & Haverkort, M. (2009). Processing subject-object ambiguities in the L2: A self-paced reading study with German L2 learners of
Dutch. Language Learning, 59, 73-112.
Hawkins, R. (2001). Second language syntax: A generative introduction. Oxford: Blackwell
Publishers.
Hawkins, R., & Chan, C. (1997). The partial availability of Universal Grammar in second
language acquisition: The "failed functional features hypothesis." Second Language
Research, 13, 187-226.
Hawkins, R., & Hattori, H. (2006). Interpretation of English multiple wh-questions by Japanese
speakers: a missing uninterpretable feature account. Second Language Research, 22, 269-301.
Herschensohn, J. (2000). The second time around: Minimalism and L2 acquisition. Philadelphia,
PA: John Benjamins Publishing Company.
Hopp, H. (2006). Syntactic features and reanalysis in near-native processing. Second Language
Research, 22, 369-397.
Hopp, H. (2010). Ultimate attainment in L2 inflection: Performance similarities between non-native and native speakers. Lingua, 120, 901-931.
Hopp, H. (2014). Working memory effects in the L2 processing of ambiguous relative clauses.
Language Acquisition, 21, 250-278.
Hopp, H. (2016). The timing of lexical and syntactic processes in second language sentence
comprehension. Applied Psycholinguistics, 37, 1253-1280.
Hummel, K. (2009). Aptitude, phonological memory, and second language proficiency in
nonnative adult learners. Applied Psycholinguistics, 30, 225-249.
Jackson, C. N. (2008). Proficiency level and the interaction of lexical and morphosyntactic
information during L2 sentence processing. Language Learning, 58, 875-909.
Jackson, C. N., & Dussias, P. E. (2007). Cross-linguistic differences and their impact on L2
sentence processing. Bilingualism: Language and Cognition, 12, 65-82.

Jegerski, J., VanPatten, B., & Keating, G. (2011). Cross-linguistic variation and the acquisition
of pronominal reference in L2 Spanish. Second Language Research, 27, 481-507.
Jiang, N. (2007). Selective integration of linguistic knowledge in adult second language learning.
Language Learning, 57, 1-33.
Jiang, N., Novokshanova, E., Masuda, K., & Wang, X. (2011). Morphological congruency and
the acquisition of L2 morphemes. Language Learning, 61, 940-967.
Johnson, J., & Newport, E. (1989). Critical period effects in second language learning: The
influence of maturational state on the acquisition of English as a second language.
Cognitive Psychology, 21, 69-99.
Johnson, J., & Newport, E. (1991). Critical period effects on universal properties of language:
The status of subjacency in the acquisition of second language. Cognition, 39, 215-258.
Johnson, J. (1992). Critical period effects in second language acquisition: The effect of written
versus auditory materials on the assessment of grammatical competence. Language
Learning, 42, 217-248.
Juffs, A. (1998). Some effects of first language argument structure and morphosyntax on second
language sentence processing. Second Language Research, 14, 406-424.
Juffs, A. (2004). Representation, processing and working memory in a second language.
Transactions of the Philological Society, 102, 199-225.
Juffs, A. (2005). The influence of first language on the processing of wh-movement in English as
a second language. Second Language Research, 21, 121-151.
Juffs, A., & Harrington, M. (1995). Parsing effects in second language sentence processing:
subject and object asymmetries in wh-extraction. Studies in Second Language
Acquisition, 17, 483-516.
Juffs, A., & Harrington, M. (1996). Garden path sentences and error data in second language
sentence processing. Language Learning, 46, 283-326.
Juffs, A., & Harrington, M. (2011). Aspects of working memory in L2 learning. Language
Teaching, 44, 137-166.
Juffs, A., & Rodriguez, G. (2015). Second language sentence processing. New York, NY:
Routledge.
Just, M. A., Carpenter, P., & Woolley, J. D. (1982). Paradigms and processes in reading
comprehension. Journal of Experimental Psychology: General, 3, 228-238.
Kaan, E., Ballantyne, J., & Wijnen, F. (2015). Effects of reading speed on second language
sentence processing. Applied Psycholinguistics, 36, 799-830.

Keating, G. (2009). Sensitivity to violations of gender agreement in native and nonnative
Spanish: An eye-movement investigation. Language Learning, 59, 503-535.
Kim, E., Baek, S., & Tremblay, A. (2015). The role of island constraints in second language
sentence processing. Language Acquisition, 22, 384-416.
Kim, J., & Christianson, K. (2013). Sentence complexity and working memory effects in ambiguity
resolution. Journal of Psycholinguistic Research, 42, 393-411.
Lardiere, D. (2008). Feature assembly in second language acquisition. In Liceras, J., Zobl, H.,
& Goodluck, H. (Eds.), The role of formal features in second language acquisition (pp.
106-140). New York, NY: Lawrence Erlbaum Associates.
Larson-Hall, J. (2010). A guide to doing statistics in second language research using SPSS. New
York: Routledge.
Lee, M-W. (2004). Another look at the role of empty categories in sentence processing (and
grammar). Journal of Psycholinguistic Research, 33, 51-73.
Leeser, M. (2007). Learner-based factors in L2 reading comprehension and processing
grammatical form: Topic familiarity and working memory. Language Learning, 57, 229-270.
Lemhöfer, K., & Broersma, M. (2012). Introducing LexTALE: A quick and valid lexical test for
advanced learners of English. Behavior Research Methods, 44, 325-343.
Lenneberg, E. (1967). Biological foundations of language. New York, NY: Wiley.
Lewis, R. (1998). Reanalysis and limited repair parsing: Leaping off the garden path. In J. D.
Fodor & F. Ferreira (Eds.), Reanalysis in sentence processing (pp. 247-285). Dordrecht:
Kluwer Academic Publishers.
MacDonald, M. C., Pearlmutter, N. J., & Seidenberg, M. S. (1994). The lexical nature of
syntactic ambiguity resolution. Psychological Review, 101, 676-703.
Marinis, T. (2003). Psycholinguistic techniques in second language acquisition research. Second
Language Research, 19, 144-161.
Marinis, T., Roberts, L., Felser, C., & Clahsen, H. (2005). Gaps in second language processing.
Studies in Second Language Acquisition, 27, 53-78.
Marquardt, D. (1980). A critique of some ridge regression methods: Comment. Journal of the
American Statistical Association, 75, 67-91.
Nakano, Y., Felser, C., & Clahsen, H. (2002). Antecedent priming at trace positions in Japanese
long-distance scrambling. Journal of Psycholinguistic Research, 31, 531-571.

Meara, P. M. (1996). English Vocabulary Tests: 10k. Swansea, UK: Center for Applied
Language Studies.
Miller, A. K. (2014). Accessing and maintaining referents in L2 processing of wh-dependencies.
Linguistic Approaches to Bilingualism, 4, 167-191.
Miller, A. K. (2015). Intermediate traces and intermediate learners: Evidence for the use of
intermediate structure during sentence processing in second language French. Studies in
Second Language Acquisition, 37, 487-516.
Mirdamadi, F., & De Jong, N. (2015). The effect of syntactic complexity on fluency: Comparing
actives and passives in L1 and L2 speech. Second Language Research, 31, 105-116.
Ojima, S. et al. (2005). An ERP study of second language learning after childhood: Effects of
proficiency. Journal of Cognitive Neuroscience, 17, 1212-1228.
Omaki, A., & Schulz, B. (2011). Filler-gap dependencies and island constraints in second-language sentence processing. Studies in Second Language Acquisition, 33, 563-588.
O'Rourke, P. (2013). The interaction of different working memory mechanisms and sentence
processing: A study of the P600. In Knauff, M., Sebanz, N., Pauen, M., & Wachsmuth, I.
(Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (pp.
1097-1102). Austin, TX: Cognitive Science Society.
Osaka, M., & Osaka, N. (1992). Language independent working memory as measured by
Japanese and English reading span tests. Bulletin of the Psychonomic Society, 30, 287-289.
Oswald, F., McAbee, S., Redick, T., & Hambrick, D. (2015). The development of a short
domain-general measure of working memory capacity. Behavior Research Methods, 1,
1343-1355.
Papadopoulou, D. (2006). Cross-linguistic variation in sentence processing. Dordrecht: Springer.
Penfield, W., & Roberts, L. (1959). Speech and brain mechanisms. New York, NY: Atheneum.
Pickering, M., & Barry, G. (1991). Sentence processing without empty categories. Language and
Cognitive Processes, 6, 229-259.
Pickering, M., & Traxler, M. (1998). Plausibility and recovery from garden path: An eye-tracking study. Journal of Experimental Psychology, 24, 940-961.
Pliatsikas, C., & Marinis, T. (2013). Processing empty categories in a second language: When
naturalistic exposure fills the (intermediate) gap. Bilingualism: Language and Cognition,
16, 167-182.
Pollard, C., & Sag, I. (1994). Head-driven phrase structure grammar. Chicago: University of
Chicago Press and Stanford: CSLI Publications.

Pritchett, B. (1988). Garden path phenomena and the grammatical basis of language processing.
Language, 64, 539-576.
Pritchett, B. (1992). Grammatical competence and parsing performance. Chicago: University of
Chicago Press.
Rahman, S. S. (2010). Acquisition of wh-movement in L2 learning: A cross-linguistic analysis.
The Dhaka University Journal of Linguistics, 2, 185-199.
Rayner, K., & Pollatsek, A. (1989). The psychology of reading. Englewood Cliffs, NJ: Prentice
Hall.
Redick, T., Broadway, J., Meier, M., Kuriakose, P., Unsworth, N., Kane, M., & Engle, R. (2012).
Measuring working memory capacity with automated complex span tasks. European
Journal of Psychological Assessment, 28, 164-171.
Roberts, L., Marinis, T., Felser, C., & Clahsen, H. (2007). Antecedent priming at gap positions
in children's sentence processing. Journal of Psycholinguistic Research, 36, 175-188.
Roberts, L., & Siyanova-Chanturia, A. (2013). Using eye-tracking to investigate topics in L2
acquisition and L2 processing. Studies in Second Language Acquisition, 35, 213-235.
Robinson, P. (2002). Effects of individual differences in intelligence, aptitude and working
memory on incidental SLA. In Robinson, P. (Ed.), Individual differences and instructed
language learning. Philadelphia: John Benjamins.
Rothman, J. (2008). Why all counter-evidence to the critical period hypothesis in second
language acquisition is not equal or problematic. Language and Linguistics Compass, 2/6,
1063-1088.
Ross, J. R. (1967). Constraints on variables in syntax. Unpublished Ph.D. thesis. MIT.
Sabourin, L., & Haverkort, M. (2003). Neural substrates of representation and processing of a
second language. In van Hout, R., Hulk, A., Kuiken, F., & Towell, R. (Eds.), The
Lexicon-Syntax Interface in Second Language Acquisition (pp. 175-195). Amsterdam:
John Benjamins.
Sagarra, N., & Ellis, N. C. (2013). From seeing adverbs to seeing morphology: Language
experience and adult acquisition of L2 tense. Studies in Second Language Acquisition,
35, 261-290.
Sagarra, N., & Herschensohn, J. (2010). The role of proficiency and working memory in gender
and number agreement marking in processing in L1 and L2 Spanish. Lingua, 120, 2022-2039.
Schachter, J. (1989). Testing a proposed universal. In S. Gass & J. Schachter (Eds.), Linguistic
perspectives on second language acquisition (pp. 73-88). Cambridge, UK: Cambridge
University Press.

Schachter, J. (1990). On the issue of completeness in second language acquisition. Second
Language Research, 6, 93-124.
Schachter, J., & Yip, V. (1990). Grammaticality Judgments: Why does anyone object to subject
extraction? Studies in Second Language Acquisition, 12, 379-392.
Schwartz, B., & Sprouse, R. (1996). L2 cognitive states and the Full Transfer/Full Access model.
Second Language Research, 12, 40-72.
Segalowitz, N., & Segalowitz, S. J. (1993). Skilled performance practice and differentiation of
speedup from automatization effects: Evidence from second language word recognition.
Applied Psycholinguistics, 13, 369-385.
Singleton, D. (2005). The critical period hypothesis: A coat of many colours. International
Review of Applied Linguistics, 43, 269-285.
Slabakova, R. (2006). Is there a critical period for semantics? Second Language Research, 22,
302-338.
Staub, A. & Clifton, C. (2006). Syntactic prediction in language comprehension: Evidence from
either…or. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32,
425-436.
Stowe, L. (1986). Evidence for on-line gap location. Language and Cognitive Processes, 1,
227-245.
Sturt, P., Pickering, M. J., & Crocker, M. W. (1999). Structural change and reanalysis difficulty
in language comprehension. Journal of Memory and Language, 40, 136-150.
Swinney, D. (1979). Lexical access during sentence comprehension: (Re) consideration of
context effect. Journal of Verbal Learning and Verbal Behavior, 18, 645-659.
Swinney, D., Ford, M., Frauenfelder, U., & Bresnan, J. (1988). On the temporal course of gap-filling and antecedent assignment during sentence comprehension. In Grosz, B., Kaplan,
R., Macken, M., & Sag, I. (Eds.), Language structure and processing. Stanford, CA:
CSLI.
Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. C. (1995). Integration
of visual and linguistic information in spoken language comprehension. Science, 268,
1632-1634.
Tremblay, A. (2005). Theoretical and Methodological perspectives of the use of grammaticality
judgment task in linguistic theory. Second Language Studies, 24, 129-167.
Traxler, M., & Pickering, M. (1996). Plausibility and the processing of unbounded dependencies:
An eye-tracking study. Journal of Memory and Language, 35, 454-475.

Trenkic, D., Mirkovic, J., & Altmann, G. (2014). Real-time grammar processing by native and
non-native speakers: Constructions unique to the second language. Bilingualism:
Language and Cognition, 17, 237-257.
Ullman, M. (2004). Contributions of memory circuits to language: the declarative/procedural
model. Cognition, 92, 231-270.
Van Gompel, R. P. G., & Pickering, M. (2007). Syntactic parsing. In Gaskell, M., & Altmann, G. (Eds.), The
Oxford handbook of psycholinguistics. (pp. 289-307). Oxford: Oxford University Press.
VanPatten, B., & Jegerski, J. (2010). Second language processing and Parsing: The issues. In
VanPatten, B., & Jegerski, J. (Eds.), Research in second language processing and
parsing (pp. 3-26). Philadelphia, PA: John Benjamins Publishing Company.
Vasishth, S., & Drenhaus, H. (2011). Locality in German. Dialogue and Discourse, 2, 59-82.
Wagers, M., & Phillips, C. (2014). Going the distance: Memory and control processes in active
dependency construction. The Quarterly Journal of Experimental Psychology, 67, 1274-1304.
Weber-Fox, C. M., & Neville, H. J. (1996). Maturational constraints on functional
specializations for language processing: ERP and behavioral evidence in bilingual
speakers. Journal of Cognitive Neuroscience, 8, 231-256.
White, L. (1992). Subjacency violations and empty categories in L2 acquisition. In H. Goodluck
& M. Rochemont (Eds.), Island constraints (pp. 445-464). Dordrecht: Kluwer.
White, L. (2003). Second language acquisition and Universal Grammar. Cambridge, England:
Cambridge University Press.
White, L., & Juffs, A. (1998). Constraints on wh-movement in two different contexts of
nonnative language acquisition: Competence and processing. In Flynn, S.,
Martohardjono, G., & O'Neill, W. (Eds.), The generative study of second language
acquisition (pp. 111-130). Hillsdale, NJ: Erlbaum.
Williams, J. N. (2006). Incremental interpretation in second language sentence processing.
Bilingualism: Language and Cognition, 9, 71-88.
Williams, J., Moebius, P., & Kim, C. (2001). Native and non-native processing of English wh-questions: Parsing strategies and plausibility constraints. Applied Psycholinguistics, 22,
509-540.
Zawiszewski, A., Gutierrez, E., Fernandez, B., & Laka, I. (2011). Language distance and non-native syntactic processing: Evidence from event-related potentials. Bilingualism:
Language and Cognition, 14, 401-411.

Zufferey, S., Mak, W., & Degand, L. (2015). Advanced learners' comprehension of discourse
connectives: The role of L1 transfer across on-line and off-line tasks. Second Language
Research, 31, 389-411.
