TESTING THE REMINDING ACCOUNT OF THE LAG EFFECT IN L2 VOCABULARY 
ACQUISITION FROM L2-L1 RETRIEVAL PRACTICE WITHIN A PAIRED-ASSOCIATE 

LEARNING FORMAT 

By 

Natalya G Koval 

 
 
 
 
 
 
 
 
 
 
 
 
 
 

A DISSERTATION 

Submitted to 

Michigan State University 

in partial fulfillment of the requirements 

for the degree of 

Second Language Studies – Doctor of Philosophy 

2020 

 
 
 
 

ABSTRACT 

TESTING THE REMINDING ACCOUNT OF THE LAG EFFECT IN L2 VOCABULARY 

ACQUISITION FROM L2-L1 RETRIEVAL PRACTICE WITHIN A PAIRED-

ASSOCIATE LEARNING FORMAT 

By 

Natalya G Koval 

The spacing/lag effect refers to the finding in memory research that spacing repeated 

study more widely produces important learning benefits (Crowder, 1976; Dempster, 1988, 

1989). In order to know when and how this effect can be most useful for second language 

learning, it is important to understand the cognitive mechanism(s) that drive any effects of 

spacing in second language learning. It is also important to understand how the operation of 

the mechanism(s) may be affected by variables inherent in second language learning contexts. 

In the present study, I investigate the contribution of the dual mechanism of effortful 

successful retrieval to the effects of lag in second language vocabulary learning. This dual 

mechanism is proposed to underlie both beneficial and detrimental effects of lag on learning 

within the reminding account (Benjamin & Tullis, 2010). I additionally investigate the 

potential effects of externally imposed study time on learning as well as on the operation of 

the two mechanisms under investigation.     

 

Fifty-two native speakers of American English studied 72 novel L2 Finnish words 

during overt oral L2-L1 translation retrieval practice in a paired-associate learning format 

from 6 repetitions under three constant levels of within-session lag with immediate study of 

feedback for 3 or 9 seconds after each retrieval attempt. Study-phase response latencies and 

accuracy were recorded and used as measures of study-phase retrieval effort and success, 

respectively (as in Maddox & Balota, 2015). Immediate and delayed form recognition, L2-L1 

translation and translation matching posttests were used to measure learning outcomes.  

 

Results showed a large spacing effect on all measures and at both times of test 

administration as well as a lag effect on delayed meaning tests. Study time had an overall 

small positive effect on learning; however, it did not cancel out negative effects of massing 

retrieval practice: the effects of spacing were considerably larger. Increasing lag between 

retrieval attempts produced increasingly longer study-phase response latencies and 

increasingly lower levels of study-phase retrieval success. Study time had a small 

nonsignificant negative effect on study-phase response latencies and a small significant 

positive effect on study-phase retrieval success. Moderated mediation analyses showed that 

study time, as operationalized in the present study, did not affect the operation of the two 

underlying mechanisms under investigation. They further showed that, despite the fact that a 

nonmonotonic function was not observed in the present learning outcomes, increasing inter-

study interval still had a negative effect on learning and this effect operated through a lower 

rate of study-phase retrieval success. Further, the moderated mediation analyses showed  that 

the positive effects of retrieval effort (Roediger & Karpicke, 2006) were conditional on 

retrieval success, in line with predictions of the reminding account. 

 

The findings of the dissertation suggest that: (a) massed L2-L1 translation retrieval 

practice may not be effective for L2 vocabulary learning; (b) externally imposing a longer 

study time does not have the large benefits that learner-regulated longer study time does; (c) 

effortful successful retrieval underlies benefits of lag in L2 vocabulary learning from L2-L1 

retrieval practice – the benefits of effortful retrieval are conditional on retrieval success, even 

in the presence of immediate feedback; (d) successful retrieval is more beneficial than 

unsuccessful retrieval, even when retrieval attempts are followed by immediate feedback – 

study of feedback does not offset the negative effects of retrieval failure.    

 

 

 

 

 

 

 

 

 

 

 

 

 
 
 

Copyright by 
NATALYA G KOVAL 
2020 

 

 

 

ACKNOWLEDGEMENTS 

 

I owe a tremendous debt of gratitude to my advisors Charlene Polio, Susan Gass, Patti 

Spinner, Aline Godfroid, and Sandra Deshors for their help, support, and guidance throughout 

the four years of the SLS program at Michigan State University. These amazing people 

inspired me and fostered in me a passion for second language research. They provided 

important guidance and were instrumental in helping me to become the researcher that I am 

today.  

 

I am further grateful to my professors and classmates for a stimulating experience. I 

am honored to have met and worked with so many accomplished researchers and simply 

awesome people! I am grateful to all my participants for taking the time to participate in this 

study and for making data collection easy and fun.  

 

I am grateful to all my friends outside the SLS program for their friendship and the 

wonderful time spent together. Finally, I am grateful that I was able to spend these four years 

living in the breathtakingly beautiful place known as Spartan Village. 

 

Michigan State University has been a great place to study, teach, and do research. My 

very first experience at MSU had been during the 2000-2001 academic year, when I was an 

undergraduate exchange student here on an FSA scholarship. I did not know at the time that I 

would be getting my doctoral degree from this very institution 20 years later! I am honored to 

call myself a Spartan! Thank you, MSU! 

 

 

 

 

v 
 

 

TABLE OF CONTENTS 

 

LIST OF TABLES .................................................................................................................. viii 

LIST OF FIGURES .................................................................................................................... x 

CHAPTER 1 SECOND LANGUAGE ACQUISITION, OVERT RETRIEVAL PRACTICE, 
AND THE SPACING/LAG EFFECTS  ..................................................................................... 1 
The present dissertation  ......................................................................................................... 3 
Overview of the dissertation  .......................................................................................... 5 
Definition of key terms  .................................................................................................. 5 

 
 
 
CHAPTER 2 THE SPACING AND LAG EFFECTS AND EFFORTS TO UNDERSTAND 
THEIR UNDERLYING MECHANISMS  ................................................................................ 7 
Theories of the spacing effect ................................................................................................. 9 
The two-process reminding account ..................................................................................... 14 
Investigating the role of attention and effort in learning from repetition  ........................ 15 
Investigating the role of reminding in learning from repeated study  ............................... 22 
Overt retrieval practice and its effects on memory ...................................................... 27 
Research into the spacing effect and retrieval practice in second language acquisition ...... 33 
Research questions ............................................................................................................ 42 

 

CHAPTER 3 METHOD ........................................................................................................... 43 
Participants ............................................................................................................................ 43 
Materials and design ............................................................................................................. 44 
Study phase ................................................................................................................... 44 
Distractor math task ...................................................................................................... 48 
Posttests ........................................................................................................................ 49 
Linguistic background questionnaire ........................................................................... 52 
Instruments  ........................................................................................................................... 53 
Procedure .............................................................................................................................. 53 
Analyses ................................................................................................................................ 57 

 
 
 
 

CHAPTER 4 RESULTS ........................................................................................................... 58 
Background questionnaire .................................................................................................... 58 
Posttests results ..................................................................................................................... 58 
Posttest results: Descriptive statistics ........................................................................... 59 
Posttest results: Inferential statistics  ............................................................................ 69 
     The form-recognition test ........................................................................................ 70 
     The L2-L1 translation test  ...................................................................................... 74 
     The form-meaning matching test  ............................................................................ 78 
Study-phase results ............................................................................................................... 82 
Study-phase response latencies: Descriptive statistics ................................................. 83 
Study-phase response latencies: Inferential statistics ................................................... 89 

 
 
 
 
 

 
 

vi 
 

 

 
 

 
 
 
 
 
 
 
 
 

Study-phase retrieval success: Descriptive statistics  ................................................... 96 
Study-phase retrieval success: Inferential statistics ..................................................... 99 
Moderated mediation analyses  ....................................................................................... 104 
Moderated parallel mediation analyses  ..................................................................... 104 
     The form-recognition test  ..................................................................................... 107 
     The immediate meaning test  ................................................................................. 109 
     The delayed meaning test  ..................................................................................... 111 
Mediation by retrieval effort moderated by retrieval success (a moderated mediation    
analysis)  ..................................................................................................................... 112 
     The form-recognition scores  ................................................................................. 115 
     The immediate meaning scores  ............................................................................ 116 
     The delayed meaning scores  ................................................................................. 117 

CHAPTER 5 DISCUSSION .................................................................................................. 120 
Pedagogical implications  ............................................................................................... 128 
Limitations and suggestions for future research  ............................................................ 132 

APPENDICES  ....................................................................................................................... 137 
Appendix A: Target Finnish words with their English translations  ............................... 138 
Appendix B: Information on the English translations  ................................................... 139 
Appendix C: Instructions for vocabulary posttests  ........................................................ 140 
Appendix D: The form recognition test  ......................................................................... 141 
Appendix E: The L2-L1 translation test ......................................................................... 145 
Appendix F: The form-meaning matching test ............................................................... 147 
Appendix G: Linguistic Background Questionnaire ....................................................... 149 
Appendix H: Study-phase instructions ........................................................................... 150 

REFERENCES  ...................................................................................................................... 151 

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

vii 

 

 

LIST OF TABLES 

 

Table 1: Definitions for key terminology  .................................................................................. 6 

Table 2: Variables used in the study-phase analyses  ............................................................... 49 

Table 3: Variables used in the posttest analyses  ..................................................................... 51 

Table 4: Variables used in the moderated mediation analyses  ................................................ 52 

Table 5: Raw posttest scores in the practice and no-practice conditions  ................................ 60 

Table 6: Raw posttest scores across the three experimental conditions  .................................. 61 

Table 7: Percent correct in the massed practice and no-practice conditions  ........................... 63 

Table 8: Percent correct in the short-spaced practice and no-practice conditions  .................. 64 

Table 9: Percent correct in the long-spaced practice and no-practice conditions  ................... 65 

Table 10: Correlations and loadings for each test on the extracted component  ...................... 66 

Table 11: Form-recognition omnibus test  ............................................................................... 71 

Table 12: Form-recognition results against the no-practice condition  .................................... 73 

Table 13: Form-recognition results against the short-spaced practice condition  .................... 74 

Table 14: L2-L1 translation omnibus test  ............................................................................... 76 

Table 15: L2-L1 translation results against the no-practice condition  .................................... 77 

Table 16: L2-L1 translation results against the short-spaced practice condition  .................... 78 

Table 17: Form-meaning matching omnibus test  .................................................................... 80 

Table 18: Form-meaning matching results against the no-practice condition  ........................ 81 

Table 19: Form-meaning matching results against the short-spaced practice condition  ......... 82 

Table 20: Response latencies across the practice conditions  .................................................. 84 

Table 21: Response latencies in the two study time conditions  .............................................. 85 

Table 22: Response latencies in successful and unsuccessful retrieval attempts  .................... 86 

viii 

 

 

Table 23: Correct retrieval events per experimental condition  ............................................... 97 

Table 24: Study-phase retrieval success in the short and long study time conditions  ............ 97 

Table 25: The effect of ISI on retrieval success at the five repetitions  ................................. 102 

Table 26: Parameter estimates for the effect of ISI on study-phase retrieval success  .......... 103 

Table 27: Correlation coefficients and loadings for form-recognition tests  .......................... 105 

Table 28: Correlation coefficients and loadings for immediate meaning tests  ..................... 105 

Table 29: Correlation coefficients and loadings for delayed meaning tests  .......................... 105 

Table 30: Correlation coefficients and loadings for form-recognition tests  .......................... 114 

Table 31: Correlation coefficients and loadings for immediate meaning tests  ..................... 114 

Table 32: Correlation coefficients and loadings for delayed meaning tests  .......................... 114 

Table 33: Effect of effort at three levels of success for form-recognition  ............................ 116 

Table 34: Effect of effort at three levels of success for immediate meaning tests  ................ 117 

Table 35: Effect of effort at three levels of success for delayed meaning tests  .................... 118 

Table 36: Frequency and concreteness indices for the English translations for the target words 
 ................................................................................................................................................ 139 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

ix 
 

 

LIST OF FIGURES 

 

Figure 1: A conceptual illustration of the repetition pattern for one item  ............................... 48 

Figure 2: A summary of the experimental procedure  .............................................................. 53 

Figure 3: An example of one experimental trial sequence  ...................................................... 54 

Figure 4: Form-recognition scores in the three ISI conditions  ................................................ 66 

Figure 5: L2- L1 translation scores in the three ISI conditions  ............................................... 66 

Figure 6: Form-meaning mapping scores in the three ISI conditions  ..................................... 67 

Figure 7: Posttest results in the three ISIs for the two groups of participants  ......................... 68 

Figure 8: Effect of study time on scores in the two groups  ..................................................... 69 

Figure 9: Median study-phase response latencies across the six repetitions  ........................... 86 

Figure 10: Response latencies in the short and long study duration conditions  ...................... 87 

Figure 11: Study-phase latencies in successful and unsuccessful retrieval attempts  .............. 88 

Figure 12: Growth in the latencies in the three conditions across repetitions  ......................... 89 

Figure 13: Response latencies across five successful retrieval attempts  ................................. 93 

Figure 14: Response latencies as a function of condition and success of retrieval  ................. 95 

Figure 15: Successful retrievals at each repetition in the three conditions  ............................. 98 

Figure 16: Growth in retrieval successes in the two study time conditions  ............................ 99 

Figure 17: Conceptual structure for the moderated parallel mediation analysis  ................... 107 

Figure 18: Conceptual structure for the moderated mediation analysis  ................................ 115

x 
 

 

CHAPTER 1 

SECOND LANGUAGE ACQUISITION, OVERT RETRIEVAL PRACTICE, AND THE 

SPACING/LAG EFFECTS 

Learning large numbers of words is an important part of becoming proficient in a 

second language. Therefore, an important question for second language pedagogy is how to 

go about the task of learning/teaching vocabulary in a way that is both successful and 

efficient. Second language research has addressed this question by testing different methods 

of learning vocabulary. One method that has been widely found to increase retention of 

studied material in the field of psychology is to space repeated study of target material rather 

than use massed repeated study (Crowder, 1976; Dellarosa & Bourne, 1985; Dempster, 1988, 

1989; Hintzman, 1974; Pavlik & Anderson, 2005; Rohrer & Pashler, 2007). This finding, 

widely known as the spacing effect, has also been observed with learning of second language 

vocabulary (Bloom & Shuell, 1981; Nakata, 2015; Nakata & Webb, 2016). A closely related 

finding, termed the lag effect, is the finding that the wider practice is spaced the better the 

learning outcomes. The spacing effect is one of the most robust and ubiquitous findings in 

memory research. The positive effects of spacing are usually very large: it is often found that, 

holding total exposure time constant, two exposures to a target item that are massed 

(consecutive) are hardly more effective than a single exposure while two spaced exposures are 

often about twice as effective as one. Spacing study offers important benefits also because it 

can help save time: no additional study time is required to observe the considerable learning 

benefits – in fact, less time may be required to attain more learning (Maddox & Balota, 2015). 

Because of its considerable benefits and practicality, the spacing effect potentially holds great 

promise for any learning situation. However, as noted by many, the full extent of its potential 

1 
 

 

benefits is not being exploited in educational settings (Cepeda et al., 2009; Dempster, 1988; 

Gerbier & Toppino, 2015; Kang, 2016; Maddox, 2016). Further, despite the generality and 

consistency of the observed benefits of spaced practice obtained across vastly diverse 

populations and target tasks in the field of psychology, investigations of spaced practice in the 

context of second language learning have produced mixed results, with some studies finding 

that spacing repeated study more widely has no effect or even has a detrimental effect on 

learning (Collins, Halter, Lightbown, & Spada, 1999; Elgort & Warren, 2014; Rogers & 

Cheung, 2018; Serrano, 2011; Serrano & Munoz, 2007; Suzuki & DeKeyser, 2017; White & 

Turner, 2005). In order to understand when and how spacing repeated study of L2 material 

more widely may be beneficial for second language learning contexts and in order to be able 

to give useful practical recommendations regarding how to make the best use of this 

potentially very powerful learning tool in second language pedagogy, it is important to 

understand the underlying mechanisms that may drive any effects of spacing in specific 

learning situations. It is further important to understand how the operation of these 

mechanisms may be affected by variables that are relevant for any specific learning contexts. 

Prior SLA research has tested the effects of spacing repeated study on acquisition of various 

aspects of a second language and provides important insights into the usefulness of this 

learning method for SLA contexts. However, prior SLA research has not produced much 

direct investigation into the process as well as the product of learning from repeated 

exposures under different levels of spacing. The present study contributes to filling this gap. 

In the present study, I investigate the contribution of a proposed underlying mechanism of the 

spacing effect to novel L2 vocabulary learning from overt retrieval practice in a paired-

associate learning (PAL) format. 

2 
 

 

Overt retrieval practice is another popular method that has been widely shown to 

produce powerful beneficial effects on learning. Information that is retrieved from memory 

becomes more recallable in the future. This finding is known as the retrieval effect (Carrier & 

Pashler, 1992; Cull, Shaughnessy, & Zechmeister, 1996). Just as is the case with the spacing 

effect, retrieval practice produces very large learning benefits and is a very robust and 

ubiquitous finding. Just as is the case with the spacing effect, it is not being taken full 

advantage of in education (McDaniel & Fisher, 1991; Roediger & Karpicke, 2006). 

Optimizing retrieval practice with L2 vocabulary is an important goal in L2 pedagogy. One 

way to make retrieval practice more effective is to space retrieval attempts more widely 

(Maddox & Balota, 2015; Maddox, Balota, Coane, & Duchek, 2011). The underlying 

mechanism here is proposed by some accounts to be a combination of retrieval effort and 

success (Bjork, 1994; Maddox & Balota, 2015), which is a dual mechanism that is also 

believed to more generally underly the effects of spacing any type of practice more widely 

(Benjamin & Tullis, 2010).  

The present dissertation 

In the present dissertation, I investigate the contribution of the two-process mechanism 

of effortful successful retrieval during study to the spacing/lag effect in L2 vocabulary 

learning. Such a dual mechanism is proposed to underlie the spacing/lag effect within the 

reminding framework (Benjamin & Tullis, 2010). I further investigate how the operation of 

the two mechanisms of study-phase retrieval effort and success, as well as ultimate learning 

outcomes, may be affected by a variable that is relevant for second language learning 

contexts, which is the amount of time a learner is allowed, per encounter (and in total, while 

3 
 

 

holding the number of encounters constant), for studying a foreign word with its translation. 

This latter variable is referred to, throughout this text, as study time or presentation duration.  

Using a fully counterbalanced within-participant within-item design, I investigate 

learning of novel foreign vocabulary in a PAL format (Barcroft, 2007; Nakata, 2011) within 

one session under three levels of inter-study interval (ISI): (a) 0-1 intervening trials, (b) 17-38 

intervening trials or 12-22 trials and a six-minute break (c) 71-119 intervening trials and the 

six-minute break. I further investigate any mediating effects of successful effortful overt 

retrieval of the paired L1 translation associate (Maddox & Balota, 2015; Maddox et al., 2011; 

Nakata, 2015) by using response accuracy and latency as proxies for retrieval success and 

effort, respectively (Maddox & Balota, 2015, Maddox, Pyc, Kauffman, Gatewood, & 

Schonhoff, 2018) as well as the role of feedback study time in moderating these effects 

(Verkoeijen & Bouwmeester, 2008). I use two levels of feedback presentation duration: (a) 3 

seconds and (b) 9 seconds. This refers to the length of time a foreign word and its L1 

translation stay on the screen for the learners to study following each of its retrieval attempts. 

The total study time for the words is 18 versus 54 seconds over six exposures. The amount of 

time a learner is allowed to study a word with its translation is an important variable for 

second language vocabulary learning success that has not received much attention in SLA 

research. While it has been shown that the time learners choose to spend on studying or 

attentionally processing a target item has an important positive effect on learning of the item 

(Godfroid et al., 2018; Godfroid et al., 2013; Koval, 2019; Rundus, 1971), it is not obvious 

that the same effect should be observed when study time is externally imposed on the learner 

by a word-learning software or an instructor. In the present study, I investigate whether the 

benefits of longer study time will hold when the length of study is externally determined. 

4 
 

 

Further, the length of time a learner is given to study a target L2-L1 translation pair may have 

important effects on the study-phase processes of retrieval success and effort. Longer study 

time at each repetition is likely to result in stronger encodings (Verkoeijen & Bouwmeester, 

2008), which, in turn, might increase the likelihood of retrieval success on subsequent 

repetitions but also decrease the amount of effort needed for such retrieval. In this way, in 

addition to having potential learning benefits due to increased exposure to the target 

translation pair, study time could affect the operation of the proposed underlying mechanisms. 

The present dissertation aims to answer the following general research questions: (a) Does the 

dual mechanism of successful effortful retrieval underlie the benefits/detrimental effects of 

spacing on L2 vocabulary learning in a PAL format?, (b) Does exposure duration moderate 

these effects?  

 

Overview of the dissertation. The present dissertation consists of three chapters. In 

the first chapter, I introduce the motivation for the present dissertation and its main goals. In 

the second chapter, I discuss extant literature and present the methodology and results of the 

present experiment. In the third chapter, I present a discussion of the present results and well 

as their pedagogical implications, followed by a discussion of limitations of the present 

experiment and suggestions for future research.  

Definition of key terms. Table 1 presents a list of key terms with their definitions.  

 

 
 
 
 
 
 
 
 
 
 

5 
 

 

Table 1: Definitions for key terminology 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

6 
 

 

CHAPTER 2 

THE SPACING AND LAG EFFECTS AND EFFORTS TO UNDERSTAND THEIR 

UNDERLYING MECHANISMS 

Research interest in the spacing effect is known to have been sparked by Ebbinghaus’ 

(1885/1964) influential book on memory. Research of this memory phenomenon has been 

quite prolific since that time. The benefits of spacing practice have been consistently obtained 

under a wide range of learning conditions and target tasks (Crowder 1976; Dempster 1996; 

Donovan & Radosevich, 1999; Hintzman, 1976); with younger and older individuals (Balota, 

Duchek, & Paullin,1989) and in healthy humans as well as in people with memory 

impairments (Green, Weston, Wiseheart, & Rosenbaum, 2014; Hillary et al., 2003). Memorial 

benefits of spacing have also been found in other species, such as monkeys, rodents, and even 

honeybees and drosophilae (Commins, Cunningham, Harvey, &Walsh, 2003; Deisig, Sandoz, 

Giurfa, & Lachnit, 2007; Yin, Del Vecchio, Zhou, & Tully, 1995). Thus, the spacing effect 

appears to be a robust and quite universal finding. Further, its beneficial effects are usually 

found to be large, suggesting that spacing study is potentially a very powerful learning tool 

that may be used in a wide range of learning situations.  

Studies investigating the spacing effect usually compare learning under two 

conditions: a massed condition, where repetitions of the studied material are consecutive, and 

a spaced condition, where repetitions are separated by time or study of other material. 

Psychology studies of the effects of spacing repetitions also usually include once-presented 

words (Braun & Rubin, 1998). These serve as filler material to achieve the desired order and 

spacing of the target items as well as a baseline for investigating the effects of repetition. In 

its strictest sense, massed practice refers to situations where repetitions of the same item are 

7 
 

 

separated by zero intervening items or time that is no longer than one second (Carpenter, 

Cepeda, Rohrer, Kang, & Pashler, 2012; Kahana & Howard, 2005), while spaced repetitions 

are those that are separated by a longer period of time or at least one intervening item. A 

closely related phenomenon, known as the lag effect, is the finding that longer ISIs lead to 

better long-term retention than shorter ISIs (D’Agostino & DeRemer, 1973; Toppino, & 

Gracen, 1985). Studies investigating the lag effect usually include more than one level of lag 

– that is, repetitions are separated by different intervals of time or numbers of intervening 

items in different lag conditions. In studies investigating learning from more than two 

repetitions, the spacing between each two consecutive repetitions may be constant (or equal) 

or it may be progressively longer (what is known as an expanding schedule) or shorter (what 

is referred to as a shrinking schedule). Further, the increase or decrease in the amount of 

spacing across repetitions may be systematic (such as 0-2-4-6 intervening items) or 

unsystematic (such as 0-1-5-6 intervening items). In studies investigating nonuniform lag 

schedules, the average lag is held constant across the different tested lag schedules for more 

valid conclusions regarding the effects of nonconstant spacing schedules that are not 

confounded with different overall amount of time between repetitions. Further, the number of 

repetitions may be constant or not, or it may depend on participants’ performance levels. In 

what is known as a drop-out schedule, target items are tested during the acquisition phase 

(usually through overt response) until a criterion level of knowledge is reached, at which time 

the items in question do not appear for further study. This latter method may be useful in 

investigations of forgetting, where each item needs to be at the same level of mastery at the 

end of the acquisition phase, thus equating intercepts of the forgetting curves for the different 

items in the different learning conditions (e.g., Pyc & Rawson, 2009). Some studies have 

8 
 

 

varied the number of repetitions a priori to investigate the effects of repetition at different 

levels of ISI (e.g., Maddox & Balota, 2015). This allows to test whether fewer or more 

repetitions are needed with a given ISI schedule. 

Theories of the spacing effect 

Despite the fact that research interest in the spacing and lag effects dates back over a 

century and despite the large number of theories that have been proposed in efforts to explain 

it (Benjamin & Tullis, 2010; Bjork & Allen, 1970; Challis, 1993; Dellarosa & Bourne, 1985; 

Estes, 1955; Glenberg, 1979; Greene, 1989; Jacoby, 1978; Küpper-Tetzel, & Erdfelder, 2012; 

Landauer, 1969; Madigan, 1969; Melton,1970; Pavlik & Anderson, 2005; Raaijmakers, 2003; 

Rundus, 1971; Thios & D’Agostino, 1976;  Zimmerman, 1975), its underlying mechanisms 

are still poorly understood (Kılıç, Hoyer, & Howard, 2013; Maddox et al., 2018). Further, it is 

widely recognized today that a different mechanism, or combination of mechanisms, may 

underlie the effects of spacing depending on a specific learning situation or target task 

(Gerbier & Toppino, 2015; Glenberg & Smith, 1981; Greene, 1989; Kornell & Bjork, 2008; 

Russo & Mammarella, 2002). One proposed mechanism that is intuitively relevant for second 

language learning is that proposed by the deficient processing theory of the spacing effect 

(Bjork, 1999; Callan & Schweighofer, 2010; Challis, 1993; Cuddy & Jacoby, 1982; 

Hintzman, 1976; Jacoby, 1978, Pavlik & Anderson, 2005; Rose & Rowe, 1976; Rundus, 

1971; Zechmeister & Shaughnessy, 1980). According to this theory, repetitions of the same 

stimulus that occur in close succession receive less attentional processing than repetitions that 

occur more widely apart. Such an attentional account assumes that more attentional 

processing leads to better learning outcomes, which is in line with proposals in the field of 

SLA (Gass, 1988; Robinson, 2003; Schmidt, 1990, 2001), in general, and findings from L2 

9 
 

 

vocabulary studies (Godfroid et al., 2018; Godfroid, et al., 2013), in particular. In fact, in 

Koval (2019), I found that more attentional processing that is given to novel L2 words that 

occur with longer intervals between repetitions mediates the large beneficial effects of 

spacing obtained in my study, suggesting that the mechanism proposed to underlie the 

beneficial effects of spacing by the deficient processing theory contributes in important ways 

to the effects of spacing on learning L2 vocabulary.  

 According to theory, deficient processing may be due to voluntary or involuntary 

mechanisms. Thus, less than optimal processing of massed repetitions may be the result of a 

conscious choice to give less attention to an immediate repetition of the same stimulus due to 

a heightened sense of familiarity (Greene, 1989; Kornell & Bjork, 2008; Rundus, 1971; 

Shaughnessy, Zimmerman, & Underwood, 1972; Zechmeister & Shaughnessy, 1980; 

Zimmerman, 1975). Such a voluntary, consciously controlled mechanism is particularly 

relevant for intentional learning situations, such as when one is trying to learn a list of L2 

words. Thus, when a word is repeated immediately, one may overestimate one’s knowledge 

of the word and strategically choose to allocate less study time to it. When, on the other hand, 

a word is repeated after a substantial amount of time has gone by and, consequently, the 

memory trace of the previous encounter has faded quite a bit more relative to what occurs 

within the short time between massed repetitions, the word may strike the learner as less 

familiar, in which case more rehearsal will seem warranted. An involuntary deficient 

processing mechanism, on the other hand, operates automatically, such as through the process 

of habituation, priming, or neural repetition suppression (Callan & Schweighofer, 2010; 

Challis, 1993; Mammarella, Avons, & Russo, 2004; Russo & Mammarella, 2002; Russo, 

Parkin, Taylor, & Wilks, 1998; Van Strien, Verkoeijen, Van der Meer, & Franken, 2007; Xue 

10 

 

 

et al., 2011). Thus, for example, recognition of an immediate repetition usually requires a 

much less extensive analysis of the target stimulus than its recognition upon its first 

presentation or when it is repeated after a longer time interval and some forgetting of the 

initial presentation has occurred. Processing is further often said to be deficient in terms of the 

amount of effort involved in retrieval of information (Bjork, 1994,1999). More effortful, or 

difficult, retrieval is believed to be desirable for stronger memory traces (Benjamin, Bjork, & 

Schwartz, 1998; Benjamin & Tullis, 2010; Bjork, 1994, 1999; Gardiner, Craik, & Bleasdale, 

1973; Jacoby, 1978; Logan & Balota, 2008; Pavlik & Anderson, 2005; Roediger & Karpicke, 

2006; Schmidt & Bjork, 1992). Repeated retrieval practice that is massed is often assumed to 

require less effort than repeated retrieval practice that is spaced (Benjamin & Tullis, 2010; 

Bjork, 2013; Pyc & Rawson, 2009) or to involve less complete retrieval processes because the 

to-be-retrieved information still resides in working memory (Glover, 1989). 

An important characteristic of the lag function (the function relating various degrees 

of ISI and learning success) is that it is nonmonotonic, or an inverted-U in shape (Cepeda, 

Pashler, Vul, Wixted, & Rohrer, 2006; Cepeda et al., 2009; Cepeda, Vul, Rohrer, Wixted, & 

Pashler, 2008; Küpper-Tetzel, & Erdfelder, 2012; Rohrer & Pashler, 2007). This means that 

with shorter ISIs, increasing the ISI leads to more learning; however, as lags get increasingly 

longer, there comes a point beyond which increasing lag any further may actually have 

detrimental effects on learning (Benjamin & Tullis, 2010; Cepeda, et al., 2006; Maddox, 

2016; Peterson, Wampler, Kirkpatrick, & Saltzman, 1963; Young, 1971). In other words, 

there is a limit to how widely we can space repeated study before this begins to actually have 

a detrimental effect on learning outcomes. The finding that learning gains do not increase 

monotonically with longer lags but increase only to a point beyond which learning actually 

11 

 

 

begins to decrease with increasing ISIs cannot be explained by the deficient processing 

theory, as, while the increase in the amount of attention given to progressively wider spaced 

repetitions may well level off at some point, it is unlikely to begin to decrease at a longer ISI. 

As a response to findings of such limitations to single process theories, many current theories 

assume the operation of multiple processes that together contribute to the effects of spacing 

(Delaney et al., 2010; Greene, 1989; Maddox, 2016). In fact, it is argued that no single-

process mechanism can accommodate the broad range of findings from research into the 

spacing effect and its boundary conditions (see, e.g., Benjamin & Tullis, 2010; Delaney, 

Verkoeijen, & Spirgel, 2010; Gerbier & Toppino, 2015; Greene, 1989;  Maddox, 2016; 

Verkoeijen, Rikers, & Schmidt, 2004). A leading explanation that can accommodate both the 

finding that attentional engagement mediates the benefits of spacing on the one hand and the 

fact that the lag function is nonmonotonic, on the other, is the reminding account (Benjamin 

& Tullis, 2010). This account supplements the operation of a deficient processing mechanism 

with the central assumption of the study-phase retrieval theory (Braun & Rubin, 1998; 

Delaney et al., 2010; Greene, 1989; Raaijmakers, 2003; Thios & D’Agostino, 1976; Toppino 

& Bloom, 2002). This is the assumption that, for spacing to have its benefits, a repeated 

encounter must involve retrieval of its previous presentation from long term memory 

(Wahlheim, Maddox, & Jacoby, 2014). Other evidence of the importance of such dependency 

among memory traces comes from the finding of super-additive effects in learning from 

repetition. Super-additivity refers to the fact that the probability of recalling an item that was 

studied twice is found to be higher than the probability of recalling any of two items studied 

once (the additive assumption) (Begg & Greene, 1988; Ross & Landauer, 1978; Watkins & 

Kerkar, 1985; Waugh, 1963). Such a finding that memory for an item studied twice usually 

12 

 

 

exceeds what would be expected from two independent learning events indicates that effects 

of repetition on learning are more than just the sum of learning events.  

Theories that can accommodate the curvilinearity in the lag function explain the shape 

of the function in terms of the importance of preserving memory trace dependency between 

repetitions (the study-phase retrieval assumption discussed above). Thus, at relatively shorter 

lags such a dependency is preserved and repetitions are processed as repetitions rather than as 

independent events while at longer lags this dependency may be broken, which has a negative 

effect on learning outcomes. A number of other findings in the field of psychology that are 

potentially relevant for second language learning can be accommodated by a theory that 

assumes the importance of memory trace survival between repetitions, or successful study-

phase retrieval. One such finding is that the optimal ISI (the inflection point in the lag 

function at which learning is best and beyond which learning begins to decrease with 

increasing ISIs) under intentional learning is farther out (at a higher level of ISI) than that 

under incidental learning (Verkoeijen, Rikers, & Schmidt, 2005). This can be explained in 

terms of the stronger memory traces laid down under intentional learning conditions, which 

are traces that are more likely to survive over longer ISIs. Another important finding is that 

when repeated exposures occur within contexts that are intentionally made different through 

experimental manipulation, spacing repeated exposures more widely may have a detrimental 

effect on learning outcomes (Verkoeijen, et al., 2004). This finding can also be 

accommodated by a theory that assumes an important role for successful retrieval of the 

previous study event, because when an item repeats in a context that is different from its 

previous encounter it is less likely to be recognized as repeated, in which case the dependency 

between the memory traces may not be preserved. Further, study time has been found to 

13 

 

 

positively affect learning from spaced repetitions (Verkoeijen & Bouwmeester, 2008) while 

task complexity and the difficulty of the intervening task coupled with lower working-

memory capacity have been shown to negatively affect learning from spaced repetitions (Bui, 

Maddox, & Balota, 2013; Donovan & Radosevich, 1999). Thus, the findings that positive 

effects of spaced study may be tempered or even reversed under certain levels of the relevant 

variables can also be explained through this affecting the probability of study-phase retrieval 

success.  

The two-process reminding account  

The reminding account (Benjamin & Ross, 2010; Benjamin & Tullis, 2010; Hintzman, 

2004; 2010; Tullis, Benjamin, & Ross, 2014) is currently a leading explanation for the lag and 

spacing effects. It is a dual mechanism account that combines beneficial effects of desirable 

difficulty (Bjork, 1994, 1999) with an important role for study-phase retrieval, or reminding 

(Hintzman, 2004, 2010; Thios & D’Agostino, 1976). Both desirable difficulty and reminding 

are believed to benefit memory independently of any effects of spacing (Bjork, 1994; 

McKinley, Ross, & Benjamin, 2019). Bjork (1994) has argued that retrieval is most beneficial 

when the to-be-retrieved item is difficult but still not impossible to remember. According to 

the reminding explanation of the spacing effect, learning from repetition is optimal when the 

second encounter with an item triggers retrieval of (or reminds of) its first occurrence and, at 

the same time, such retrieval requires more effortful processing (or the information is 

retrieved from long-term rather than short-term memory). With increasing ISIs, retrieval of a 

previous encounter requires more effort, which is beneficial for learning. At the same time, 

however, retrieval is only likely to be successful within a limited range of ISIs, beyond which 

such retrieval may fail, resulting in detrimental effects on learning. In this way, the dual 

14 

 

 

process assumed by the reminding account can accommodate the above discussed findings of 

nonmonotonicity in the lag function as well as the other previously discussed findings that are 

potentially of relevance for second language acquisition. Importantly, the reminding account 

may be able to explain the mixed findings obtained in the field of SLA regarding the effects 

of spacing repeated study of SLA material. A failure to retrieve the previous encounter with a 

repeated item, or to process a repeated encounter as repeated, may be the reason, as has been 

speculated though not directly tested in a number of SLA studies (see, e.g., Elgort & Warren, 

2014; Serrano, 2011), for failure to observe benefits of spacing in some SLA research.  

Investigating the role of attention and effort in learning from repetition  

Attention is known to be important for learning a second language (Gass, 1988; 

Robinson, 2003; Schmidt, 1990, 2010). Amount of attention or study given to a target L2 

word has been shown to be positively related to memory for the words (Godfroid et al., 2018; 

Godfroid, et al., 2013; Koval, 2019). In both psychology and SLA, studies investigating 

repeated study of target items show that the more time a learner spends studying a given word 

per repetition, the better the learning outcomes (Godfroid et al., 2018; Godfroid, et al., 2013; 

Koval, 2019; Rundus, 1971). Such studies further showed that when learning targets are 

encountered or studied multiple times, reading or study time decreases across repetitions, 

though the steepness of the slope may depend on the temporal distribution of repetitions 

(Koval, 2019; Rundus, 1971; Shaughnessy et al., 1972). In Koval (2019), I showed that the 

amount of study given to L2 words studied in sentence contexts, which was greater with 

spaced repetitions than with massed repetitions, mediated the learning benefits obtained by 

spacing the repetitions more widely. 

15 

 

 

Studies testing deficient processing of massed repetitions as an explanation for the 

beneficial effects of spacing have employed different methods to measure effort and amount 

of attentional processing that learners choose to allocate to target items. In some studies, 

participants have studied words presented one per slide and pressed a button to indicate that 

they wished to move on to the next slide with a new word. The time between the onset of each 

slide and the button press was recorded and used as an index of study time for the word in 

question (Rundus, 1971; Shaughnessy et al., 1972; Zimmerman, 1975). In some such studies, 

participants were asked to rehearse aloud during study and the time during which such overt 

rehearsal was produced was used as a more precise measure of processing time (Rundus, 

1971, Experiment 3; Zimmerman, 1975). In Koval (2019), I recorded participants’ eye 

movements as they read L1 sentences with embedded L2 words that participants studied for a 

subsequent test. For my main analysis, I used the measure of total reading time, which is an 

index of the amount of time a word was looked at within a given sentence in total, that is, 

during the first time the gaze landed on the word and each time the word was subsequently 

revisited, before the participant chose to move on to the next sentence. Based on this measure, 

I inferred the amount of attention the words received in the massed condition, where the same 

word repeated in consecutive sentences, and in the spaced condition, where the same word 

repeated in sentences separated by other sentences containing other target L2 words plus a 

distractor math task.  

Studies investigating how learners choose to allocate study time have shown that 

learners tend to overestimate their knowledge of items that are repeatedly studied in close 

succession (massed practice) and consequently give less study time to these items (Benjamin 

et al., 1998; Kornell, & Bjork, 2007; Koval, 2019; Rundus, 1971; Shaughnessy et al., 1972; 

16 

 

 

Zechmeister & Shaughnessy, 1980; Zimmerman, 1975). Generally, learners are known to be 

quite ineffective at pacing their own study (Benjamin et al., 1998; Jacoby, Bjork, & Kelley, 

1994; Kornell, & Bjork, 2007). Consequently, an interesting question that has important 

practical implications for the development of pedagogical tools, including computer programs 

that present L2 words for learning using the PAL method, is whether the amount of time a 

learner is given for study of an L2 word per encounter affects learning in the same way as 

does learner-regulated study (De Jonge, Tabbers, Pecher, & Zeelenberg, 2012). Intuitively, 

one would expect that the amount of time available for study of a novel word should be 

positively related to learning: the longer a learner spends on the task of learning a given word, 

the better they will remember it on a subsequent test (Ebbinghaus, 1885/1964), in line with 

what has been found with self-paced study. If learners tend to not be effective at pacing their 

study, can we improve learning by controlling the pace at which words are studied? 

Predetermining study time for a given item to be longer may help counteract poor study 

strategies and the ineffective pacing that learners tend to adopt. Studies that have measured 

the time participants choose to allocate to study of massed and spaced items have argued that 

the underlying reason for benefits of spaced practice is that learners choose to spend more 

time studying the items in the a spaced condition relative to a massed condition. If, in a purely 

quantitative way, longer study time underlies the beneficial effects of spacing, by holding 

study time constant at two levels across the ISI conditions, we may fail to observe any effects 

of ISI but instead observe a strong effect of presentation duration, or study time. 

Alternatively, it may be the case that the quality of processing may change beyond the point at 

which a learner would have chosen to move on to a different item if they were free to control 

their own pace. It is quite likely that the processes that are engaged during the initial stages of 

17 

 

 

presentation of an L2-L1 translation pair, where the learner establishes or revises form-

meaning mappings, differ qualitatively from those engaged once this process is complete and 

the learner simply repeats the information to themselves to maintain it in short-term memory. 

However, there may further be a qualitative difference between processing that is beyond 

such an initial recognition and encoding stage though it is still learner-regulated, where 

learners feel like they have not reached a kind of a saturation point at which they would wish 

to stop studying a given word and move on to the next item, and processing that occurs after 

such a saturation point, where rehearsal is externally imposed on the learner.  

Psychology studies have examined the effects of other-imposed total time given for 

study on subsequent recall (Bugelski, 1962; De Jonge et al., 2012; Johnson, 1964; Murdock, 

1960), as well as presentation duration per trial while holding total time allowed for study 

constant (Zeelenberg, de Jonge, Tabbers, & Pecher, 2015). As intuition would suggest, the 

amount of time a participant is given for study of target items was often shown to be  

positively related to later recall of the items (Bugelski, 1962; Johnson, 1964). This is in line 

with proposals that the time an item spends in primary, or short-term, memory during study is 

positively related to later recall (Atkinson, & Shiffrin, 1968; Braun & Rubin, 1998; Rundus, 

1971; Rundus, & Atkinson, 1970, Waugh & Norman, 1965). There are, however, important 

findings to the contrary. Thus, for example, in their well-known Experiment 1, Craik & 

Watkins (1973) had participants listen to a list of L1 words for an immediate memory test, 

where they would have to report the last word that started with a given letter. This forced 

participants to maintain each word that starts with the letter in question (critical word) in 

memory until they encountered the next word that started with the same letter, at which point 

they switched to rehearsing this new word. The number of intervening noncritical words 

18 

 

 

(which did not begin with the critical letter) was varied, resulting in different lengths of time 

during which a critical word had to be maintained in working memory. The results showed no 

benefit of longer intervals over short intervals on a surprise recall test given after a short break 

following the last list of study items. Such a finding goes against evidence that amount of 

rehearsal has benefits for learning (Atkinson, & Shiffrin, 1968; Rundus, 1971; Waugh & 

Norman, 1965). Following Craik and Lockhart’s proposal (1972), Craik and Watkins 

suggested that the mode of rehearsal may be key: simply repeating a word to oneself to 

maintain it in primary memory (known as maintenance rehearsal) may not hold much benefit 

for longer-term retention. Thus, the amount of rehearsal, or the time an item spends in short-

term memory, is argued to only have benefits for long-term retention when the item is being 

processed elaboratively (or associatively). Craik and Watkins’ Experiment 2 further showed 

that an increased number of overt maintenance rehearsals did not improve long-term retention 

of target items. The authors conclude that maintenance of a studied item in short-term storage 

does not necessarily increase its strength in the long-term store. While maintenance rehearsal 

may have limited benefits for the final test of free-recalling which of the many well-known L1 

words had been seen during an experiment, it is not obvious that the amount of time a learner 

is allowed to rehearse a novel L2 word form presented with its L1 translation will produce the 

same pattern of results. Thus, it is an interesting question whether increasing study time, or 

adding rehearsal time, for an L2-L1 translation pair at each repetition will benefit learning of 

the L2 word.  

Longer study time at each repetition may additionally have an effect on the effort and 

success of retrieval during subsequent repetitions within the study phase and thus may affect 

the operation of the investigated underlying mechanisms of spacing practice (Verkoeijen & 

19 

 

 

Bouwmeester, 2008). Verkoeijen and Bouwmeester manipulated presentation rate during 

study (1 second vs. 4 seconds per word). Based on posttest results, they identified a high 

performance group and a low performance group among their participants. They found that 

while the former group benefitted from spaced practice regardless of the presentation rate, the 

latter group benefitted from spacing only when the presentation rate was longer. They suggest 

that longer presentation duration serves to establish stronger memory traces at each repetition 

which may be more likely to survive longer lags between repetitions.  

Effort has been shown to benefit learning in diverse experimental paradigms in 

psychology. For example, Auble and Franks (1978) showed that providing more time for 

effort toward sentence comprehension resulted in better subsequent recall performance. More 

work and effort that is required by a task has widely been shown to be beneficial for learning 

outcomes (e.g., Benjamin et al., 1998; Gardiner et al., 1973; Soderstrom, Kerr, & Bjork, 2016; 

Whitten & Bjork, 1977). A number of studies have operationally defined effort as response 

latencies in the performance of various tasks (Braun & Rubin, 1998; Glover, 1989; Karpicke 

& Roediger, 2007; Logan & Balota, 2008; Maddox & Balota, 2015; Maddox et al., 2018; Pyc 

& Rawson, 2009). In investigating effects of spacing and lag on response latencies and 

success as well as on subsequent learning gains, Braun and Rubin (1998) found that effort in 

covert retrieval of a previous presentation of L1 words that were related in form increased 

with lag however no lag effect was observed in learning gains beyond a spacing effect. The 

opposite pattern was observed in Maddox et al. (2018). Using L1 word recognition latencies 

as a proxy for retrieval effort, the authors found a lag effect in posttest scores but no 

difference in study-phase recognition latencies beyond the effect of spacing versus massing of 

repetitions, contrary to the predictions of the reminding account. In their 2015 study, Maddox 

20 

 

 

and Balota had participants study arbitrary L1 word pairs. They recorded latencies for overt 

retrieval of paired associates (cued recall) across a number of repeated retrieval attempts in 

younger and older adults. The results of their experiments were overall consistent with the 

reminding account. However, the task of learning novel L2 forms with their meanings may 

involve a different dynamic of underlying processes than the task of recognizing known L1 

words or retrieving their arbitrary L1 word associates as well as the process of studying 

arbitrary pairings of known L1 words. The learning outcomes measured in the field of 

psychology are also often different: while in L2 vocabulary learning we are concerned with 

learners’ acquisition of novel word forms and the development of form-meaning mappings, in 

psychology research the target knowledge may be associations of arbitrary well-known L1 

words, ability to free recall as many as possible, or even memory of their relative order during 

acquisition. Thus, results from psychology studies may often have limited relevance for 

learning of an L2 (Nelson & Dunlosky, 1994).  

Often, in the spacing effect research, target words or other items are studied only 

twice, although there are exceptions. Maddox and Balota (2015), for example, investigated 

paired-associate learning of known L1 words over a number of repetitions. In this study, 

however, the retrieval attempts were not followed by feedback. Feedback is often not 

provided in psychology research due to the specific research questions that are often different 

from those in second language learning. Further, because no feedback is provided, here only 

items that are correctly retrieved during the study phase are usually analyzed in terms of effort 

and learning outcomes (Braun & Rubin, 1998; Maddox & Balota, 2015; Maddox et al., 2018; 

Pyc & Rawson, 2009). In an L2 vocabulary learning context, however, because feedback is 

usually provided, it makes sense to analyze both successful and unsuccessful retrieval 

21 

 

 

attempts during the study phase. This is because, while in a study such as Maddox and Balota 

(2015), items that are not successfully retrieved in early repetitions during study phase are 

very unlikely to be successfully retrieved in later repetitions and are mostly simply forgotten, 

this pattern is reversed with the type of practice that is done in L2 vocabulary learning and 

where feedback is provided: here, retrieval success will likely grow across repetitions as 

learners learn from the feedback that is provided following each retrieval attempt. 

Investigating the role of reminding in learning from repeated study 

Reminding, or the retrieval of the previous encounter(s) with the to be learned 

material, has been shown to be important for retention of studied material (Batchelder, & 

Riefer, 1980; Bellezza, Winkler, & Andrasik, 1975; Bruce & Weaver, 1973; Glanzer, 1969; 

Glover, 1989; Jacoby, 1974; McKinley et al., 2019; Robbins & Bray, 1974; Wahlheim et al., 

2014). Such reminding may be triggered by a repeated encounter with the same material or an 

encounter with related material (Benjamin & Tullis, 2010; Braun & Rubin, 1998; McKinley et 

al., 2019). The effects of reminding on memory have been observed with various tasks 

employed in psychology research, such as classification (category) learning (Medin & 

Schaffer, 1978; Ross, Perkins, & Tenpenny, 1990), ambiguity resolution (Ross & Bradshaw, 

1994; Tullis, Braverman, Ross, & Benjamin, 2014), and problem solving (Ross, 1984), and 

with various outcome measures, such as cued recall (Jacoby & Wahlheim, 2013), free recall 

(Tullis et al., 2014), absolute and relative temporal (recency or order) judgments (Hintzman, 

2010; Jacoby & Wahlheim, 2013), frequency judgements (Hintzman, 2004), and list 

discrimination (Jacoby & Wahlheim, 2013).  

Current theories of the spacing effect include a key role for reminding, or retrieval of 

an item’s earlier presentation upon repeated encounters, during the study phase (referred to as 

22 

 

 

study-phase retrieval) for observing the beneficial effects of spacing repeated study. Positive 

effects of successful study-phase retrieval on learning from spaced study have been found in 

various tasks employed by psychology research to investigate the spacing or lag effects 

(Appleton-Knapp, Bjork, & Wickens, 2005; Benjamin & Tullis, 2010; Braun & Rubin, 1998; 

Greene, 1989; Hintzman, 2004, 2010; Hintzman, Summers, & Block, 1975; Pavlik & 

Anderson, 2005; Raaijmakers, 2003; Siegel & Kahana, 2014; Thios and D’Agostino, 1976). 

Thus, in addition to enhancing learning from repetition, reminding may be crucial for 

observing beneficial effects of spacing (Thios and D’Agostino, 1976). A higher chance of 

study-phase retrieval success is believed to be the reason underlying the findings of benefits 

of expanding spacing schedules. Thus, for example, Maddox et al. (2011) found that positive 

effects of an expanding schedule were conditional on initial repetitions being close enough to 

the original encoding to produce successful retrieval. This is explained based on the logic that 

if an item can be retrieved more easily from working memory upon its second presentation, 

scheduling initial repetitions closely together might ensure a stronger encoding that is more 

likely to survive increasingly longer subsequent lags. However, again, Maddox et al.’s design 

did not include feedback. A different pattern may be observed when each retrieval attempt is 

followed by the presentation of the target material, as here initial retrieval success may not be 

as crucial.  

While in much psychology research study-phase retrieval is inferred based on the 

experimental design, some studies have attempted more direct investigation of the reminding 

process. This has been accomplished with the help of a number of techniques, such as the 

continuous recognition or repetition detection paradigm (Bellezza et al., 1975; Braun & 

Rubin, 1998; Kiliç et al., 2013; Maddox et al., 2018; Wahlheim et al., 2014). Here, 

23 

 

 

participants are presented with stimuli, such as advertisements (Appleton-Knapp et al., 2005), 

L1 words (Maddox et al., 2018), or novel letter strings such as CCC and CVC strings 

(Bellezza et al., 1975), that repeat at different lags and whose repeated presentations are 

interleaved with the presentation of other advertisements, L1 words, novel letter strings, etc. 

Participants are to perform a repetition detection task (or old/new judgment), that is, they are 

to judge whether a given item has or has not occurred previously during the study phase. 

Bellezza et al. (1975) were among the first to demonstrate that items that are recognized as 

repeated upon their second presentation have a memorial advantage in the posttest 

performance.  

Success/failure of study-phase retrieval has also been investigated with the help of 

what are known as indirect or implicit memory tests (Richardson-Klavehn & Bjork, 1988). In 

an indirect memory test, participants do not engage in an active search of their memory. 

Instead, retrieval of previously presented information is inferred based on changes in task 

performance, such as faster task performance (Koval, 2019; McKinley et al., 2019). 

Finally, the effects of study-phase retrieval success have also been investigated by 

asking participants to overtly retrieve studied information, such as the second member of a 

pair of words studied in a PAL format (Maddox & Balota, 2015; Maddox et al., 2011). 

Maddox and Balota (2015) used successful overt retrieval of the paired associate as an index 

of successful study-phase retrieval (or reminding). Additionally, as done in previous research 

(Glover, 1989; Karpicke & Roediger, 2007; Logan & Balota, 2008; Maddox et al., 2011; 

Maddox et al., 2018; Pyc & Rawson, 2009), they used study-phase response latencies as a 

proxy for retrieval difficulty, which enabled them to test successful effortful overt retrieval in 

L1 paired-associate learning as a proposed mechanism for the effects of spacing.  

24 

 

 

Success of study-phase retrieval may depend on certain variables that may affect the 

probability such retrieval. One such variable may be how similar the context at repetition is to 

that at a prior encounter (Appleton-Knapp et al., 2005; Verkoeijen et al., 2004). Crucially, the 

probability of study-phase retrieval at a repeated encounter also depends on the strength of the 

memory trace that was laid down at a previous encounter. Verkoeijen et al. (2005) showed 

that when items are studied intentionally they show larger spacing effects and a longer 

optimal ISI. This may be attributed to stronger memory traces laid down during intentional 

study. Verkoeijen and Bouwmeester (2008) manipulated presentation rate during study (1 

second vs. 4 seconds per word) and found that participants who had lower performance on the 

posttest benefitted from spaced practice only when presentation duration was longer. 

Verkoeijen and Bouwmeester discuss these results in terms of differential success of study-

phase retrieval and the role of presentation rate for establishing stronger encodings that make 

such success more likely. However, the authors acknowledge that a limitation of their design 

is that they did not include a direct measure of study-phase retrieval but only inferred it based 

on the logic that participants who recalled more items at test likely had a higher rate of 

successful retrieval during study.  

Working under the assumption that study-phase retrieval plays an important role in 

learning from repetition, Bui et al. (2013) asked the question of whether individual differences 

in the ability to retrieve a previous exposure affected learning from spaced repetition. Holding 

the ISI constant at 30 seconds, the researchers manipulated the difficulty of the intervening 

30-second task, reasoning that this should modulate participants’ ability to retrieve the earlier 

information due to differential degrees of interference. These authors did not interleave 

studied words but, instead, used an unrelated intervening task between repeated study of 

25 

 

 

target words. They found that individuals with lower working memory capacity showed 

greater learning when the intervening task difficulty was low while individuals with higher 

memory capacity benefitted from a difficult intervening task. These results, too, are 

interpreted in terms of difficult reminding, or successful effortful retrieval. Here, again, study-

phase retrieval was inferred rather than directly tested. 

The nature of study-phase retrieval – that is, what exactly must be retrieved – is yet to 

be fully specified. Some efforts have been made in this direction, however. Delaney, Godbole, 

Holden and Chang (2018) investigated the nature of study-phase retrieval. Specifically, they 

asked whether the reminding mechanism relies on recollection, which is a process that 

involves retrieval of an earlier presentation, or on simple recognition, which does not involve 

an active memory search process but relies only on a judgment of familiarity (Oberauer, 2005; 

Yonelinas & Jacoby, 2012). The authors addressed this question by testing potential 

moderating effects of working memory span on the effect of spacing. If successful study-

phase retrieval relied on explicit retrieval of episodic information, which depends on an 

individual’s operational span (McCabe, Roediger, McDaniel, Balota, & Hambrick, 2010), a 

lag by span interaction was expected, where longer lags benefit learning in individuals with 

high working memory capacity but not in individuals with low working memory capacity. 

The authors found that spacing and working memory had an additive, rather than a 

multiplicative, effect on learning, suggesting no involvement of capacity-dependent 

mechanisms. The authors conclude that study-phase retrieval relies on a process of 

recognition rather than recollection. These results contradict the finding by Bui et al. (2013), 

who found that working memory capacity did play a role when repeated study was separated 

by a more difficult task.  

26 

 

 

The nature of study-phase retrieval and the ways in which its operation may be 

affected by variables that are relevant for specific learning situations are still far from being 

fully understood. Further, its operation during study of a second language has not been 

investigated directly. The present study is a first step towards understanding the complex 

nature of  the relationships between retrieval effort and success with regard to novel L2 

vocabulary learning by investigating the role of overt form-meaning mapping retrieval effort 

and success over six repetitions that occur at three different levels of ISI in the presence of 

feedback that follows each retrieval attempt, as well as the ways in which the time a learner is 

given for study of the target L2-L1 pairs may affect these relationships.  

 

Overt retrieval practice and its effects on memory. The present study investigates 

the mechanism of retrieval effort and success as underlying any effects of lag in overt 

retrieval practice. Overt retrieval practice has been widely shown to enhance learning of target 

material. This known as the retrieval effect (Carrier & Pashler, 1992; Cull et al., 1996). The 

act of retrieval is known to be a “memory modifier” (Bjork, 1975), which refers to the fact 

that the memory trace of the information that is retrieved is altered such that it becomes more 

strongly represented and better connected with more robust, more elaborate, and more 

numerous retrieval routes, and is, consequently, more accessible for future recall (Birnbaum 

& Eichner, 1971; Bjork, 1975; Izawa, 1971, 1985; Karpicke, & Roediger, 2008; McDaniel, & 

Masson, 1985; Myers (1914); Storm, Bjork, & Storm, 2010; Wenger, Thompson, & Bartling, 

1980;  Whitten & Bjork, 1977). The act of retrieval is known to slow and otherwise interfere 

with forgetting of learned information (Hogan & Kintsch, 1971; Izawa, 1970; Maddox & 

Balota, 2015; Runquist, 1986; Wheeler & Roediger, 1992). Retrieval practice may further 

often constitute more transfer appropriate processing for many skills (Kolers & Roediger, 

27 

 

 

1984; McDaniel, Friedman, & Bourne, 1978; Morris, Bransford, & Franks, 1977), such as 

when the meaning of an L2 word must be retrieved during comprehension of the second 

language input. Because most use of acquired knowledge involves retrieval of various aspects 

of learned material as well as of their interrelationships, according to transfer appropriate 

processing theory (Morris et al., 1977), retrieval practice may promote such subsequent 

retrieval to a greater extent than practice that does not involve retrieval.   

The retrieval effect is closely related to the testing effect, which is the widely observed 

finding that taking a test on the to-be-learned material is a more potent learning event than 

restudying the material, particularly for long term retention (Allen, Mahler, & Estes, 1969; 

Carpenter, Pashler, & Vul, 2006; Carrier & Pashler, 1992; Hogan & Kintsch, 1971; Kuo & 

Hirshman, 1996; Roediger & Butler, 2011; Roediger & Karpike, 2006; Spitzer, 1939; 

Thompson, Wenger, & Bartling,1978; Wheeler, Ewers, & Buonanno, 2003). The effects of 

testing have been obtained even in situations where there is no feedback following learners’ 

attempts at retrieving information (Balota, Duchek, Sergent-Marshall, & Roediger, 2006; 

Hogan & Kintsch, 1971). Testing effects are still observed when processing time between a 

tested and a study-only condition is equated or is in favor of the restudy condition (Carpenter 

et al, 2006; Glover, 1989), indicating that the act of retrieving information is a cognitive 

process that differs fundamentally from simple study or exposure to the target material. Thus, 

the benefit of retrieval cannot be reduced to additional time on task (Carrier & Pashler, 1992; 

Kuo & Hirshman, 1996; Roediger & Karpike, 2006).  

The terms testing effect and retrieval effect are often used interchangeably in research 

on their effects and underlying causes for their benefits. Further, it is widely believed today 

that the effects of testing are primarily due to retrieval processes that act on memory traces by 

28 

 

 

elaborating and strengthening them (Bjork, 1975; Glover, 1989; Kornell, Hays, & Bjork 2009; 

McDaniel & Masson,1985; Roediger & Karpike, 2006). The effects of testing have been 

shown to increase with repeated testing (Karpicke & Roediger, 2008; Soderstrom et al., 2016; 

Wheeler & Roediger, 1992) and with feedback provided after retrieval attempts (Cull, 2000; 

Pashler, Cepeda,Wixted, & Rohrer, 2005). Further, unsuccessful retrieval attempts are still 

known as powerful learning events (Donaldson, 1971; Izawa, 1970; Kornell et al., 2009) and 

are known to promote deeper processing or encoding of the information contained in the 

feedback that follows than when the presentation of the same information is not preceded by a 

retrieval attempt. This is known as test-potentiated learning (Arnold & McDermott, 2013; 

Hays, Kornell, & Bjork, 2013; Izawa, 1970; Kornell et al., 2009; Roediger & Karpike, 2006).  

Retrieval effort is argued to underlie the benefits of testing as well as findings that 

tests involving recall or constructed response lead to better subsequent retention than tests that 

only require easier tasks such as recognition or identification (Gardiner et al., 1973; Jacoby, 

1978; Rowland, 2014). Retrieval effort is generally known to be beneficial for learning 

(Benjamin et al, 1998; Gardiner et al., 1973), and retrieval practice is known to be more 

beneficial the more effortful or complete the retrieval (Bjork, 1975; Glover, 1989; Whitten & 

Bjork, 1977). In fact, even when effort leads to more retrieval failures or errors during the 

learning phase, this still leads to better retention in the long term (Pashler, Zarow, & Triplett, 

2003; Schmidt & Bjork, 1992; Soderstrom et al., 2016; Storm et al., 2010).  

One way to induce more effortful retrieval is to put more time between the encoding 

event and the retrieval event (Cull, 2000; Glover, 1989; Jacoby, 1978; Modigliani, 1976; 

Roediger & Karpicke, 2006b; Soderstrom et al., 2016; Whitten & Bjork, 1977). Such a delay 

of retrieval has been shown to enhance learning from tests (Jacoby, 1978; Modigliani, 1976) 

29 

 

 

and is attributed to greater effort required to retrieve information after some time has gone by 

since the encoding event. Extending the concept of fuller or more complete encoding that is 

argued to underly the benefits of spaced study relative to massed study, Glover (1989) argued 

that spaced retrieval attempts involve fuller retrieval of information than massed retrieval 

attempts because spaced retrieval is not supported by residual activation of the target stimulus 

as is the case when information is retrieved from short-term memory in massed retrieval. Such 

completeness of the retrieval process, in turn, leads to better memory for the studied material 

in spaced relative to massed retrieval. Thus, unlike retrieving information that was only 

recently presented and that still resides in short-term memory, retrieving information that was 

presented longer ago is more difficult, requires a more complete retrieval operation, and is, 

consequently, a more powerful learning method. Spaced retrieval practice has been widely 

found to be superior to massed retrieval practice (Craik, 1970; Cull, 2000; Cull et al., 1996; 

Logan & Balota, 2008). 

Another way to ensure more difficult retrieval is to increase contextual interference 

(Bjork, 1994; Storm et al., 2010). This means that retrieval is more difficult when there is 

more similarity among the numerous learning targets or learning occurs amidst a multitude of 

other similar forms that a participant is exposed to even if these are not the focus of learning. 

Such high interference is usually characteristic of L2 learning contexts, where, the input 

contains large numbers of forms that often resemble each other and many of which follow the 

same phono- or orthotactic patterns. Particularly for novice learners, input can be 

overwhelming when it contains multiple unknown (and often not targeted in initial stages) 

forms, which may create interference. In a similar vein, interleaving retrieval attempts for 

different target items produces superior learning in the long term, which is attributed to more 

30 

 

 

retrieval difficulty resulting from the interference of intervening retrieval attempts 

(Linderholm, Dobson, & Yarbrough, 2016). 

Just as is the case with the spacing effect, retrieval practice produces benefits of 

considerable size and is a very general and consistent finding. Just as is the case with the 

spacing effect, its full potential has not been used in education (McDaniel & Fisher, 1991; 

Roediger & Karpicke, 2006a). Given that retrieval practice improves learning and that 

repeated retrieval attempts may further increase learning gains (Bahrick, 1979), a good 

question is how these retrieval attempts should be optimally distributed to achieve maximum 

learning. Spaced retrieval practice combines the benefits of spacing and retrieval and thus 

potentially maximizes learning. How best to do it is still a question (Storm et al., 2010), 

however.  

In experiments that have directly measured study-phase retrieval success, study-phase 

performance has been shown to be consistently better in the massed condition than in the 

spaced condition, while the opposite holds for long term retention (e.g., Bahrick, 1979; Balota 

et al., 2006; Carpenter & DeLosh, 2005; Karpicke & Roediger, 2007). Similarly, in studies 

that have compared expanding schedules with uniform-interval schedules, acquisition 

performance is usually better in an expanding schedule (e.g., 1-3-5) than in an uniformly-

spaced schedule (e.g., 3-3-3); however, performance on posttests that are administered with a 

longer delay is usually either equal in the two conditions or in favor of the uniformly-spaced 

condition (Balota et al., 2006; Carpenter & DeLosh, 2005; Logan & Balota, 2008; Storm et 

al., 2010). This seems counter-intuitive as the main rationale behind using expanding spacing 

schedules is that such a schedule supports successful study-phase retrieval at ever-increasing 

intervals, which is argued to underlie the beneficial effects of spacing. Further, in later 

31 

 

 

repetitions that follow an expanding schedule, target items are retrieved after intervals that are 

considerably longer than those in the uniform-interval condition (because the average spacing 

is usually equated between the two conditions), which should further promote more effortful 

successful retrieval in an expanding schedule. Advantages of uniformly-spaced schedules 

over expanding schedules are often obtained in the absence of feedback, which means that 

information that is not retrieved during the study phase in the uniform-interval condition is 

simply forgotten. However, in terms of delayed posttest scores this condition still outperforms 

an expanding schedule condition that is specifically designed to minimize forgetting during 

the study phase. This finding is puzzling. It has been proposed that the initial retrieval attempt 

must be effortful to produce memory benefits (Karpicke & Roediger, 2007; Logan & Balota, 

2008; Modigliani, 1976), which may explain why uniformly-spaced schedules (where the 

initial retrieval attempt is always after a longer interval than is the case in an expanding 

schedule) do no worse and often even better than expanding schedules, where retrieval 

success is higher during study. In fact, the benefits of equal-interval schedules have been 

attributed by some researchers to less retrieval success during acquisition under such 

conditions (Storm et al., 2010). This suggests that study-phase retrieval success may play a 

limited role under certain circumstances, particularly when such retrieval is less effortful 

(Pashler et al., 2003; Storm et al., 2010). 

 Retrieval practice is usually investigated within a paired-associate learning format. 

PAL consists of learning to associate two members of a pair of stimuli (Allen et al., 1969; 

Carrier & Pashler, 1992; Cull et al., 1996; Greeno, 1964; McDaniel & Masson, 1985; Nelson, 

Leonesio, Shimamura, Landwehr, & Narens, 1982). The task of retrieving the second member 

of a pair of associates is also sometimes referred to as a cued-recall task (e.g., Carpenter, et 

32 

 

 

al., 2006; McDaniel & Masson, 1985). This task is relevant for many learning situations, such 

as for learning to associate a meaning with a foreign word, and is a method that is often used 

in L2 vocabulary learning. In the field of psychology, the studied pairs are most often two 

weakly related L1 words (e.g., Jacoby, 1978; Logan & Balota, 2008; Maddox & Balota, 

2015). While useful for the investigation of many memory phenomena, the task of associating 

two L1 words is not in itself a real-life task. More real-world learning targets have also been 

used, such as the learning of low-frequency L1 words with their definitions (Gardiner et al., 

1973; Rohrer, Taylor, Pashler, Wixted, & Cepeda, 2005, Exp. 2), or L1-L2 or L2-L1 

translation pairs (Arnold & McDermott, 2013; Barrick, 1979; Bahrick, Bahrick, Bahrick, & 

Bahrick,1993; Callan & Schweighofer, 2010; Carrier & Pashler, 1992; Kang, Lindsey, Mozer, 

& Pashler, 2014; Karpicke & Roediger, 2008; Pashler, et al., 2005; Pashler et al., 2003; Pavlik 

& Anderson, 2005; Pyc & Rawson, 2009). Psychology studies using foreign word learning 

have generally obtained benefits of spaced retrieval practice over massed retrieval practice as 

well as benefits of retrieval over restudying, particularly on delayed tests, which reflect long-

term knowledge that is more relevant for L2 learning.   

Research into the spacing effect and retrieval practice in second language acquisition   

The spacing effect has generated some interest in the field of second language 

acquisition. A small number of studies have looked at the effects of spacing practice on L2 

grammar acquisition (Bird, 2010; Miles, 2014; Kasprowicz, Marsden, & Sephton, 2019; 

Rogers, 2015; Suzuki, 2017; Suzuki & DeKeyser, 2017; Suzuki, & Sunada, 2019). Other 

studies have explored the effect in the context of vocabulary acquisition (Bahrick et al.,1993; 

Bahrick & Phelps, 1987; Bloom & Shuell, 1981; Miles & Kwon, 2008; Nakata, 2015; Nakata 

& Suzuki, 2018; Nakata & Webb, 2016; Schuetze, 2015). Thus, for instance, Nakata (2015) 

33 

 

 

investigated the effects of spacing study of vocabulary within a PAL format. More 

specifically, he investigated the effects of an expanding spacing schedule. Recall that an 

expanding spacing schedule refers to using increasingly longer time intervals between 

repetitions (Kang et al., 2014; Landauer & Bjork (1978) rather than constant intervals. Nakata 

found a large positive main effect for spacing but only a small positive effect of using an 

expanding schedule.  

Similar results were obtained in Schuetze (2015), where in a between-subjects design, 

students studied English-German translation pairs in a classroom setting. The translation pairs 

were presented four times in total for 8 seconds per presentation with a different number of 

days between the repeated presentations. Participants were tested for production of the L2 

German words cued by their L1 English translations three times, with the last test being eight 

weeks after the study phase. Schuetze found that results from the expanding-interval schedule 

practice were superior to those for the equal-interval schedule in the shorter term while this 

pattern was reversed in the longer term, where the equal-interval group showed much less 

forgetting than the expanding-interval group. This is in line with findings in psychology. An 

important difference between immediate and four-day delayed posttests was also found by 

Bloom and Shuell (1981) in another between-subject classroom study, where L1 English 

learners studied L2 French words in written vocabulary activities. The words were practiced 

either within one session (the massed condition), or distributed over three days (the spaced 

condition). While similar levels of learning gains were obtained in the massed and spaced 

conditions on the immediate posttests, on the delayed test administered seven days later, the 

scores in the spaced study condition were superior to those in the massed study condition.  

34 

 

 

Some SLA research into spacing and vocabulary learning has also included 

investigations of other variables that are relevant for vocabulary learning contexts. Thus, 

Nakata and Suzuki (2019) investigated the effects of spaced practice on the acquisition of 

semantically related and unrelated words, also in a PAL format. Because learning of 

semantically related words (semantic clustering) had been found in previous research to 

produce interference effects that hinder acquisition, the authors reasoned that spacing practice 

of semantically related words would alleviate such interference and would, therefore, be 

beneficial for learning of semantically related words. Thus, the authors asked whether spacing 

practice benefits semantically related and unrelated words differently. The authors found that 

spacing was beneficial for both related and unrelated words and that, contrary to expectation, 

unrelated words benefited from spacing more than did related words. Nakata and Webb 

(2016) manipulated learning set size, or the number of words studied at one time, (Experiment 

1) and spacing (Experiment 2) in learning of low-frequency L2 English words in a PAL 

format with retrieval practice, where participants were to produce the second member of a 

pair (both L2-L1 and L1-L2 translation for Experiment 1 and only L1-L2 translation for 

Experiment 2) before being provided with feedback. The authors found that spacing had 

larger beneficial effects than did the size of the learning set.    

Thus, for the most part, second language vocabulary learning studies have shown that 

spacing repeated study is beneficial. However, some second language studies have reported 

no effect of spacing or even the opposite effects, where spacing was found to be actually 

detrimental to learning outcomes. Thus, Elgort and Warren (2014), who investigated novel 

word learning from incidental exposure during reading of a long authentic text (without the 

use of a dictionary) over a ten-day period, found that novel words that repeated in the same 

35 

 

 

chapter of the book were remembered better than those that repeated across chapters, 

especially for the less proficient readers. The authors speculate that this may be due to 

memory trace decay between repetitions, which may interfere with the development of lexical 

semantic representations and abstraction of a core meaning of a word. The fact that the more 

widely spaced repetitions were particularly detrimental for the lower proficiency learners is in 

line with the argument that memory trace survival is important. Retrieving the previous 

encounter with a word or processing an encounter with a given word as a repetition may be 

less likely to be successful if the process of L2 comprehension is a difficult task (Bui et al., 

2013).  

Similarly, Suzuki and DeKeyser (2017) found no advantage of practice separated by a 

week over practice repeated by a day for proceduralization of grammatical knowledge (and, in 

fact, found some benefit for the latter). The authors attribute this finding to the fact that the 

task used in their study was more complex compared to psychology experiments that have 

used simple tasks and showed large benefits of spacing. Indeed, optimal ISI is known to be 

shorter for more complex tasks (Donovan & Radosevich, 1999), which, again, makes sense if 

one assumes that the memory traces that are established need to be strong enough to survive 

longer lags and that any interference produced by a complex task that is performed in the 

interim may decrease the chances of retrieving prior encounters at a subsequent repetition 

(Bui et al., 2013; Verkoeijen et al., 2005).  

Detrimental effects of distributing second language study have also been obtained in 

the context of program evaluation. Such research has compared the effectiveness of intensive 

programs, where study sessions are massed closely together, with extensive programs, where 

study sessions are spread more widely over time. This research has consistently shown that 

36 

 

 

intensive programs are more effective (Collins et al., 1999; Serrano, 2011; Serrano & Munoz, 

2007; White & Turner, 2005), particularly for lower-proficiency learners (Serrano, 2011). 

This, again, is contrary to the widely observed benefits of distributing practice documented in 

the field of psychology and constitutes a finding of a reverse effect. Such a reverse finding 

suggests that effects of distributing practice may depend on variables that need to be taken 

into account and whose effects need to be known (Rogers, 2017). Some have attributed the 

failure to obtain a spacing effect in this research context to the simple fact that these studies 

did not use a delayed posttest (Bird, 2010; Serrano & Munoz, 2007), where the spacing effect 

usually manifests itself much more strongly (Rawson & Kintsch, 2005). Others have stressed 

that the type of knowledge targeted and the context of acquisition of this knowledge may be 

different or more complex in a language learning context than what is widely used in 

psychology experiments. It is argued, therefore, that applying findings from psychology 

studies to language learning contexts is not always straightforward (Bird, 2010, p. 640; 

Rogers, 2017). If we assume that processing repetitions as repetitions is important for learning 

from spaced practice, it may also be the case that detrimental effects of spacing on learning in 

extensive programs comes from the fact that it is more difficult to retrieve material presented 

in a previous session, or each new session may not have a high reminding potential of the 

previous session, when it is separated from the previous session by a longer time interval. 

While there may be some overlap between consecutive sessions, this may be more clearly felt 

when the sessions occur closely together than when they are separated by longer periods of 

time, allowing many of the details that might be used as cues for retrieval of previous 

encounters to fade to a greater extent.  

37 

 

 

SLA research has, thus far, focused mainly on the question of whether or not 

distributing practice produces superior learning outcomes for different aspects of a second 

language, without much direct investigation of the underlying mechanisms. The expectation 

that spaced practice should be beneficial for learning is based on the ubiquitous finding of 

benefits of spacing in psychological research. However, as discussed above, applying findings 

from psychology to L2 learning and teaching situations may not always be straightforward 

(Rogers, 2017). Further, it is widely believed today that beneficial effects of spacing study 

may rely on an interplay of different underlying mechanisms depending on the learning 

situation or target task (Gerbier &Toppino, 2015; Glenberg & Smith, 1981; Greene, 1989; 

Kornell & Bjork, 2008; Russo & Mammarella, 2002). The operation of these different 

mechanisms may further be affected by variables that characterize specific learning contexts 

(Verkoeijen et al., 2004; Verkoeijen et al., 2005). It is, therefore, important to investigate the 

process as well as the product of second language study under different levels of spacing. 

Only a few SLA studies have attempted an investigation of the process itself, however. 

Nakata and Suzuki (2019), for instance, measured learners’ retrieval success during the study 

phase through the task of overt L2-L1 translation. This methodology is similar to the one used 

by Maddox and Balota (2015), who asked their participants to retrieve the second member of 

a paired associate. In addition to using learning targets that are more relevant for SLA, an 

important difference that also makes Nakata and Suzuki’s study more relevant for L2 learning 

is that they provided feedback to the learners after each retrieval attempt. However, Nakata 

and Suzuki did not investigate posttest performance as a function of successful study-phase 

retrieval and, therefore, cannot inform as to the potential mediating effects of study-phase 

retrieval success. Further, in order to avoid a large number of unsuccessful retrieval attempts 

38 

 

 

by their participants, they broke down study of their 48 target words into two sets of 24, 

thereby avoiding a situation where the effects of study-phase retrieval failure on learning 

outcomes could be directly tested. Suzuki and DeKeyser (2017) included an ad hoc analysis 

of lexical retrieval performance during training on an element of L2 Japanese morphology. 

The distributed practice group, who practiced in two sessions separated by a week (versus one 

day, which was the case for the massed group), had more difficulty retrieving the vocabulary 

during the second session. The authors considered this variable ad hoc and speculated that 

ease and success of lexical retrieval may affect the nature of cognitive processes involved in 

distributed and massed learning.  

Another study that investigated the process as well as the product of learning under 

differential spacing is Koval (2019), in which I used eye-tracking methodology to test the 

deficient processing account of the spacing effect in L2 vocabulary learning from sentence 

reading. Two levels of ISI were used: the target words appeared either in consecutive 

sentences or in sentences that were separated by other sentences containing other target words 

plus a six-minute distractor math task. The choice of account was motivated by proposals in 

the field of SLA that attentional processing benefits learning of a second language in general 

(Gass, 1988; Robinson, 2003; Schmidt, 1990) and vocabulary learning success in particular 

(Godfroid et al., 2018; Godfroid, et al., 2013). I found that reading times on the target words 

decreased with repeated encounters for both spaced and massed repetitions (as had been found 

in other studies of L2 vocabulary learning from reading, Godfroid, et al., 2013) but did so 

more dramatically in the massed condition, resulting in less overt visual attention given to 

repeated encounters with the target vocabulary that occurred in consecutive sentences. I 

further found that attentional processing, as measured by total reading time, was a significant 

39 

 

 

mediator for the beneficial effects of spacing that were observed in the study, confirming that 

an attentional account of the spacing effect has relevance for contextual second language 

vocabulary learning. In this study, target words were embedded in different sentence contexts. 

Different contexts have previously been shown to benefit massed repetitions but to have a 

detrimental effect on learning from spaced repetitions (Verkoeijen et al., 2004). This finding 

has been explained in terms of a higher chance of failure to recognize a word as repeated 

(failure of study-phase retrieval) when it repeats in different contexts and the repetitions are 

widely spaced. To investigate whether differences in the sentence contexts may have 

detracted from learning in the spaced condition, I additionally investigated the downward 

trajectory in reading times in the spaced repetitions for evidence of a repetition effect (Joseph, 

Wonnacott, Forbes,  & Nation, 2014; Pellicer-Sánchez, 2016; Rayner & Duffy, 1986; Rayner, 

Raney, & Pollatsek, 1995). I used first exposures in the massed condition that occurred across 

the four blocks of the study phase as controls for potential effects of order or fatigue, thus 

isolating the effects of repetition from order effects. I found that there was significant 

facilitation in the total reading time measure that came with repeated encounters, suggesting 

that repeated encounters in the spaced condition were, in fact, mostly processed as repetitions 

despite differences in sentence contexts. Such an investigation of a repetition effect in terms 

of facilitation in reading times constitutes an indirect memory test (Richardson-Klavehn & 

Bjork, 1988), one in which participants are not asked to provide an overt retrieval response – 

as was the case in the explicit repetition detection judgments in studies such as Bellezza et al., 

(1975) and Maddox et al., (2018) or retrieval of the paired associate in Maddox & Balota 

(2015). In my study, intentionality of learning (Verkoeijen et al., 2005) combined with the 

relative ease of the intervening task (L1 sentence reading and simple math operations), which 

40 

 

 

means relatively low levels of interference (Bui et al., 2013), may have aided successful 

study-phase retrieval across the spaced encounters.  

More research is needed that explores the process as well as the product of learning 

second language material under different levels of temporal distribution of repetitions. SLA 

research needs to explore the potentially relevant mechanisms that may underly any effects of 

spacing as well as how the operation of the mechanisms may be affected by variables that are 

relevant for SLA contexts. The present study sets out to test the predictions of the dual-

mechanism reminding account (Benjamin & Tullis, 2010) by exploring the contribution of 

study-phase retrieval success and effort to learning L2 vocabulary from repeated exposures at 

three different levels of within-session ISI in a PAL format. The focus on a dual-process 

account that includes successful study-phase retrieval as an underlying mechanism for this 

investigation is motivated by the fact that current theories of the spacing effect include study-

phase retrieval as an important element in learning from repetition and a crucial precondition 

for observing beneficial effects of spacing. It is further motivated by the fact that a failure to 

process repeated encounters with target items as repetitions has been cited in SLA research as 

a potential explanation for failures to observe benefits of spacing (see, e.g., Elgort & Warren, 

2014; Serrano, 2011). The inclusion of the second element of effortful processing is motivated 

by the widely-held belief that attentional engagement and effort are beneficial for learning of 

second language vocabulary (Godfroid et al., 2013; Laufer & Hulstijn, 2001; Mohamed, 

2018; Schmitt, 2008) as well as my finding that deficient processing of massed encounters 

mediates the benefits of spacing in L2 vocabulary learning (Koval, 2019).  

Both success and effort of retrieval at repetition may depend on a number of factors. 

One such factor is likely to be the length of time a learner spends studying a word per 

41 

 

 

repetition. The more time a learner spends studying such a word the stronger the resulting 

encoding is likely to be, which may be more likely to survive longer ISIs (Verkoeijen & 

Bouwmeester, 2008) and thus promote retrieval success at a subsequent repetition. Further, 

the longer a word is studied with its meaning, the less effort may be required for retrieval of 

the meaning at a subsequent repetition. Thus, study time may have important effects on the 

operation of both underlying mechanisms tested in the present study.  

Research questions  

The aim of the present study is to test the contribution of the dual mechanism of effortful 

successful retrieval (Benjamin & Tullis, 2010) to any effects of lag on learning second 

language vocabulary from L2-L1 retrieval practice in a PAL format. Another aim is to test 

any effects of study time on the operation of the two proposed mechanisms as well as on 

learning outcomes. The present study is motivated by the following research questions: 

1.  Does the amount of lag between repeated retrieval attempts affect learning from 

retrieval practice in a PAL format, as measured by immediate and delayed form-

recognition and translation posttests? Does the amount of time given for study of an 

L2-L1 translation as feedback affect this relationship? 

2.  Does the amount of lag between repeated retrieval attempts affect study-phase 

retrieval effort and success? Does the amount of time given for study of an L2-L1 

translation as feedback affect this relationship? 

3.  Does the dual mechanism of successful effortful retrieval mediate effects of spacing? 

Is the operation of the two mechanisms affected by the amount of time a learner is 

given to study an L2-L1 translation pair per repetition and in total? 

 

42 

 

 

Participants 

CHAPTER 3 

METHOD 

Fifty-two native speakers of American English (healthy young adults) participated in 

the experiment. These were mostly undergraduate students in a wide variety of majors at 

Michigan State University who had responded to an ad about the study that had been placed 

through the Office of the Registrar. Twenty-two were male and 28 were female, ages 18-29 

(M = 20.04, SD = 2.08, Median = 20). Most of these students had studied at least one foreign 

language, with the number of foreign languages varying from one to four (M = 1.72, SD = 

0.81, Median = 2). Proficiencies ranged from 1 to 5 (M = 2.31, SD = 1.03, Median = 2) on the 

self-assessment question that ranged from 1 (lowest) to 5 (highest). Spanish French, and 

German were the most frequently indicated as languages studied by the participants. Other 

languages studied included Chinese, Japanese, Korean, Russian, Thai, ASL, Burmese, Italian, 

Arabic, Polish, and Greek. However, none of the participants were familiar with the Finnish 

language. One participant reported having travelled to Finland; however, he was not familiar 

with the language beyond one name of a dish, as he said. Another participant reported that his 

grandfather is from Finland. However, the participant reported to have no knowledge of the 

language. Participants’ responses to the first encounters with the target words will be further 

used in this study as a kind of pretest to ensure no prior knowledge of the target words. A 

question on the background questionnaire administered after the study phase will further 

explore participants pre-existing familiarity with the target words. 

 

 

43 

 

 

Materials and design 

I used a fully counterbalanced within-item within-participant design. The experiment 

consisted of a study phase, a distractor math task, 30-minitue delayed vocabulary posttests 

(referred to as immediate posttests) that measured form recognition of the target words as well 

as participants’ ability to produce and select their L1 English translations, one- to two-weeks 

delayed vocabulary posttests (referred to as delayed posttests) that were identical to the 

immediate posttests except for item order randomization within and between participants, and 

a linguistic background questionnaire. 

Study phase. I selected Finnish as the target language for the study. The use of an 

existing language, where each word is paired with its actual English translation, was deemed 

to be more ecologically valid. Finnish is a relatively uncommon language in the US, which 

minimizes the chance of prior exposure among American students (the target participant 

population). Being a language of the Finnic family, it also bears little resemblance to English 

or languages that are commonly studied by US students. Further, Finnish is written in the 

same alphabet as English, the participants’ L1, which allows to control for reading difficulty.  

Seventy-two simple generic Finnish nouns with all diacritic marks removed were 

chosen as the target words for this study. None of these nouns were cognates of their English 

translations. The 72 words were divided into two main lists (36 words each). The words on 

each list served as experimental repeated targets half of the time, and as once-presented 

controls the other half. The purpose of the unrepeated controls was to investigate the effects 

of retrieval practice in the three ISI conditions against a baseline of no practice beyond one 

study event. Within each of the two lists, the words were further divided into three ISI lists 

(12 words each), each to be used in each of the three levels of ISI (massed, short-spaced, and 

44 

 

 

long-spaced) when serving as experimental items. A rotation was performed on the items for 

counterbalancing. Each time the repeated items were changed from one ISI condition to the 

next, the control items changed place in terms of their order within the experimental 

sequence. This way, each control item got to appear at the beginning, in the middle, and 

toward the end of the experimental sequence. Each ISI list (12 words) was further divided in 

half for the two levels of study time (3 vs. 9 sec). This was done such that words in the two 

study time lists were matched on the number of letters. Thus, each of the two levels of study 

time was equated on the number of words and the number of letters per word; it also had each 

condition equally represented. I further counterbalanced the words in terms of study time. 

Four-five participants fell into each of the 12 resulting counterbalancing lists.  

The target words ranged in length from four to eight letters. On both lists, each ISI 

sublist contained two four-letter words, four five-letter words, three six-letter words, one 

seven-letter word, and two eight-letter words (see Appendix A for a list of the target Finnish 

words with their English translations). The N-Watch program (Davis, 2005) was used for 

information on frequency of the English translations. CELEX frequency and LOG 10 

frequency were used. In N-Watch, LOG 10 frequency is based on the CELEX English 

Linguistic Database (Baayen, Piepenbrock, & van Rijn, 1995). The reason for including LOG 

10 transformed indices is the fact that the relationship between word frequency and 

psycholinguistic measures such as lexical decision time is known to follow a logarithmic 

function (Davis, 2005). This refers to the fact that the frequency difference between any two 

low-frequency words has been found to have a larger effect on psycholinguistic measures 

such as reaction time than the same difference between two high-frequency words. Brysbaert, 

Warriner, and Kuperman’s (2014) database of concreteness ratings was used for indices of 

45 

 

 

concreteness. The English translations ranged from 0.43 to 2.63 on their LOG10 frequency 

and from 3.3 to 5 on their concreteness values. The target nouns were matched exactly on the 

number of letters between the two lists and also among the three ISI sublists within each such 

list. The resulting lists were further roughly matched on indices such as frequency and 

concreteness (see Appendix B for frequency and concreteness information for the English 

translations in the two main lists as well the three sublists within each list). Two hundred and 

ten additional Finnish words were selected to serve as practice and recency items as well as 

filler trials during the study phase. Some of these repeated and others were only presented 

once. Some of these were followed by their translations and others were not. The filler items 

were similar to the target items in terms of structure (the same overall length and orthotactic 

patterns, as would be expected among words from the same language).   

A practice block preceded the experimental sequence. A recency block followed the 

sixth experimental block. These blocks contained many of the same fillers that were used in 

the study phase. None of the target words were used in the practice block or the recency 

block. The purpose for the recency block was to minimize any recency or order effects on the 

30-min delayed (immediate) posttest for words that occurred later rather than earlier in the 

experimental sequence. Fillers that were associated with their L1 translations were not in any 

way different from the target words from the point of view of the participant. Further, these 

often repeated in a similar pattern to the target words, except that the number of repetitions 

and the pattern of repetition was different and more haphazard. This was done to prevent 

participants from anticipating a pattern of repetition for the target items. The practice block 

served to minimize any effects of primacy on the target items that were introduced at the 

beginning of the study phase as well as to familiarize the participant with the procedure.  

46 

 

 

During the experimental portion of the study phase, the target words were studied in 

six experimental blocks. The words in the massed condition repeated six times within each 

block. These were separated by 0-1 intervening trials (1 second in the case of zero intervening 

trials: here the interval refers to the time between the offset of the Finnish word presented 

with its translation and the onset of the next corresponding trial, where only the Finnish word 

is presented on the screen until a response is made; or, in the case of one intervening trial, 5-

21 seconds, depending on the speed of response to the filler item). The intervening Finnish 

words that separated massed repetitions were always fillers and were never accompanied by a 

translation in order to preserve the massed nature of study. The words in the short-spaced 

condition repeated six times over two consecutive blocks (three times per block) and were 

separated by 17-38 trials within a block and by 12-22 trials plus the 6 minute-distractor math 

task between two adjacent blocks (3-4 or 6-8 minutes between repetitions). The words in the 

long-spaced condition repeated once per block and were separated by 71-119 trials plus the 

six-minute intervening distractor math task (16 -19 minutes between repetitions). The average 

position across the experimental sequence was equated for the words in all four conditions 

(massed: 249.82; short-spaced: 249.97; long-spaced: 251.10; controls: 248.44) and was not 

different statistically, F(3, 13104) = .159, p = .924. Figure 1 presents graphically the 

conceptual pattern of repetition for one item in each of the three ISI conditions across the six 

blocks.  

47 

 

 

 

Figure 1: A conceptual illustration of the repetition pattern for one item  

Each experimental block started and ended with three filler items. Further, the 

conditions were equally represented at the beginnings and ends of blocks: blocks 2, 4, and 6 

began and ended with two control items; block 1 began and ended with a massed item (all six 

repetitions); block 3 began and ended with two short-spaced items (1 repetition); block 5 

began and ended with two long-spaced items (1 repetition). The reason for one item in the 

massed condition beginning and ending a block was because six repetitions had to be 

consecutive in this condition. It was hoped that using one repetition of two different items in 

the other conditions would offset this difference. Table 2 presents the variables used in the 

latency analysis.  

Distractor math task. A simple math task was performed for six minutes between the 

six blocks as well as between the final sixth block and the recency block. During this time, 

participants were given multiplication, addition, subtraction, and division tasks to perform. 

Participants did both mental math and math that they wrote out on paper to ensure variety in 

the activity and minimize boredom and fatigue.  

48 

 

 

 

Table 2: Variables used in the study-phase analyses 

 

 

Posttests. Three identical (except for item order randomization) sets of immediate and 

delayed paper and pencil posttests were used to measure learning gains. In each of the two 

administrations, Posttest 1 was a form-recognition test. Here, the 72 target words were 

presented among 156 new Finnish words (distractors) that had not occurred during the study 

phase. Participants were to underline words that they recognized as ones studied during the 

study phase (see Appendix C for the instructions for this test and Appendix D for the test 

sheet). An effort was made to ensure that the distractors that appeared on the posttests were 

not too similar in form to the target words and to the distractors that were encountered during 

49 

 

 

the study phase, particularly for distractors that appeared on the immediate posttest, as this 

posttest followed only 30 minutes after the study phase.  

In each of the two administrations, Posttest 2 was an L2-L1 translation test. Here, 

participants were to write the English translations next to the target Finnish words (on Sheet 

A) presented without distractors (see Appendix C for the instructions for this test and 

Appendix E for the test sheet).    

In each of the two administrations, Posttest 3 was a form-meaning matching test. Here, 

participants were presented with the English translations for all the target Finnish words 

(Sheet B). Participants were to add the number associated with each English translation on 

Sheet B next to the corresponding Finnish word on Sheet A, which had been used in Posttest 

2 (see Appendix C for the instruction for this test and Appendix F for the test sheet). The 

Finnish word sheet from Posttest 2 was used here instead of a new sheet because participants 

had at this point familiarized themselves with the layout of the Finnish words, resulting in 

more ease of location of the words. Presenting them with a new sheet of Finnish words would 

have added unnecessary search for the words. 

A different set of distractors was used for the immediate and delayed form-recognition 

tests. This was done to prevent participants from selecting an item on the delayed posttest due 

to the fact that they had seen it on the immediate posttest. All posttests were randomized in 

terms of order for each participant and also between the immediate and delayed 

administrations within each participant. Table 3 presents a summary of the variables used in 

the posttest analyses. Table 4 presents the variables for the mediation analyses. 

 

 

50 

 

 

Table 3: Variables used in the posttest analyses 

 

 

 

 

 

 

 

 

 

 

 

51 

 

 

Table 4: Variables used in the moderated mediation analyses 

Linguistic background questionnaire. A background questionnaire (see Appendix 

 

G) was used to collect information on participants’ age, sex, any foreign languages studied, 

and any other information that the participant felt was relevant. The questionnaire also asked 

the participants to indicate whether any of the studied words had struck them as familiar upon 

initial encounter and to elaborate if the answer was yes.   

 

52 

 

 

Instruments 

The DMDX software (Forster & Forster, 2003) was used on an HP lap top computer 

for stimulus presentation and recording of the response latencies. Two Transcend voice 

recorders were used to record participants’ oral responses. All posttests and the background 

questionnaire were on paper. Microsoft Office 365 Excel was used for building and rotating 

the study-phase scripts as well as for randomizing posttest item presentation order and coding 

of the auditory responses.  

Procedure 

The experimental procedure is summarized in Figure 2. 

 

Figure 2: A summary of the experimental procedure 

The entire experiment was approximately 3 hours 45 minutes in duration, over two 

sessions, per participant. Session one was about 3 hours and 10 minutes in duration. Session 

two was between 20 and 35 minutes in duration. Session one included the study phase, a 15-

minute break, the immediate posttests, and the background questionnaire. Session two 

included only the delayed posttests. The two sessions were separated, depending on 

53 

 

 

participant availability, by approximately one or two weeks. The experiment was conducted 

with each participant individually, in a small quiet lab. The researcher met with each 

participant at a time scheduled via email.  

The experimental sequence was as follows. First, the participant read and signed the 

consent form. They also asked any questions that they had during the reading of the consent 

form. This was followed by reading of the instructions for the study phase from the computer 

screen (Appendix H). After and during reading of the instructions, the participants were 

encouraged to ask any clarification questions. This was followed by the practice block, which 

consisted of 83 trials. After and during the practice block, the participants were encouraged to 

ask any further questions they may have. Following the completion of the practice block, the 

experimental blocks were completed in order, separated by 6-minute distractor tasks. Block 

one consisted of 110 trials. Each subsequent block consisted of 90 trials. Block one took 12 

minutes, on average, and each subsequent block took 11-12 minutes, on average, to complete.  

Figure 3 presents an example of an experimental study-phase trial sequence.  

Figure 3: An example of one experimental trial sequence 

 

Each trial started with the presentation of a row of hash marks (########) that stayed 

on the screen for 1 second and was replaced by a target Finnish word with a dash and an 

54 

 

 

underscore with a question mark (norsu --- _________?) prompting the participants to 

produce the English translation for the word. The participants were to say these translations 

aloud while their responses were audio-recorded. If the participant could not remember the 

translation or if they believed that they had never seen the translation for a given word, they 

were to say “I don’t know”. Response time was recorded through a button press by the 

researcher (as in Maddox & Balota, 2015), which initiated the next screen, on which the 

Finnish word was presented with its paired associate L1 translation (norsu --- elephant). The 

pair stayed on the screen for either 3 seconds or 9 seconds, depending on the level of exposure 

duration assigned to the word for the specific rotation version, after which the next trial 

began. Distractor words that were presented with translations followed the same sequence. If 

a distractor word was not presented with a translation, the button press initiated the next trial. 

However, the next trial did not begin until the distractor had been on the screen for 3 seconds, 

which was held constant across all distractors that were not followed by a translation. A line 

of hash marks (#########) preceded the presentation of each word. This was used to signal 

the beginning of a new trial and a new word that was about to be presented.   

At the end of each experimental block, the participants were asked whether they 

needed to step out. Whenever a participant indicated that they did, such as to use the 

bathroom or get a drink of water, they were allowed to do so before beginning the distractor 

math task. With these participants, the math task was cut a bit short, however, the break was a 

bit longer than 6 minutes to strike a balance between the loss in terms of the time spent on the 

cognitive activities involved in the math task and the gain in absolute time between the 

experimental blocks. Most participants never asked to step out but indicated that they could 

“keep going”, in which case the distractor math task began immediately after the experimental 

55 

 

 

block. The researcher asked the participants how they were feeling at the end of each block 

and, based on the observation during piloting, that most participants felt like it was difficult to 

remain seated for the entire duration of the study phase, after blocks 4, 5, and 6, the researcher 

suggested a walk outside the lab as part of the distractor math task. During the walk, 

participants performed mental math operations that the researcher asked them to perform. A 

few participants indicated that they did not feel like taking a walk – these participants 

performed the distractor math task in its entirety in the lab. 

The six experimental blocks were followed by the recency block, which was 

composed of 70 trials. After the recency block, participants were given a 15-min break, during 

which they were free to leave the lab. Upon their return to the lab, the participants performed 

Posttests 1, 2, and 3, in that sequence. Participants were given unlimited time to perform these 

tasks. This was done to make sure that any knowledge that they had was captured and not 

only that which they could produce within a limited time window. This also took into account 

the fact that participants may differ in how quickly they perform the tasks. The immediate 

posttests were followed by the completion of the background questionnaire. After this, 

participants received cash compensation for session one. 

Participants were asked to return for the second session two weeks after session one. 

However, not all participants were able to come back exactly two weeks after session one. For 

the participants who were not able to come back after two weeks, session two was mostly 

conducted with a shorter retention interval between the two sessions. Participants were not 

told anything about the content of the second session. Session two was identical in content to 

the immediate posttests. At the end of session two, participants were asked whether they had 

had any exposure to the targeted Finnish words outside of the lab between the two sessions. 

56 

 

 

This was noted by the researcher. All participants except one (whose delayed posttest data 

was removed from the analysis) stated that they had had no such exposure. At the end of 

session two, participants received cash compensation for the session.  

Analyses 

SPSS version 25 (IBM Corp., 2017) was used for all statistical analyses in this study. 

SPSS version 25, Microsoft Office 365 Excel and PowerPoint were used for data management 

and some of the graphics. Linear Mixed modeling and Moderated Mediation analyses were 

used. All statistical analyses are two-tailed and conducted at an alpha level of .05 except for 

cases where a Bonferroni correction is performed to adjust for multiple testing. 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

57 

 

 

CHAPTER 4 

RESULTS 

Background questionnaire  

 

See the Participants section for demographic information collected through the 

background questionnaire. Most participants noted that none of the words struck them as 

familiar. No participants were able to produce the correct translation upon initial encounter, 

indicating no prior knowledge. Six  participants noted that some or many of the words looked 

like Spanish words or words from other languages in terms of the spelling.  

Posttests results 

To answer the first research question, which asks whether the length of the interval 

between repeated retrieval events and the amount of time given for study of an L2-L1 

translation as feedback affect learning from retrieval practice in a PAL format, posttest results 

were examined as a function of ISI and study time. The no-practice condition was used as a 

baseline in some of the analyses to isolate more effectively the effects of retrieval practice at 

different levels of ISI.  

Reliability for the six posttests was as follows: immediate form-recognition test: α = 

.694; immediate L2-L1 translation test: Cronbach's α = .790; immediate form-meaning 

mapping test: Cronbach's α = .789; delayed form-recognition test: Cronbach's α = .779; 

delayed L2-L1 translation test: Cronbach's α = .724; delayed form-meaning mapping test: 

Cronbach's α = .882. Accuracy was acceptable for all participants (< 10%) except that for two 

participants on the immediate Posttest 1 and one participant on the delayed Posttest 1. These 

participants’ data were excluded for the corresponding tests.   

58 

 

 

 

Four participants did not come back for the delayed posttest. Therefore, these 

participants only provided immediate posttest data. The posttests were scored as follows: one 

point was awarded for each correct response and zero points were awarded for an incorrect 

response or no response (where participants did not underline a target word on Posttest 1 or 

did not attempt to write its translation on Posttest 2 or did not attempt to match it with a 

translation on Posttest 3). Not all participants were able to come back two weeks after session 

one; therefore, there is a number of levels of time of delayed test in the present data. 

Participants can be divided in to two groups: 21 participants who came back 6-8 days after 

session one and 26 participants who came back 11-16 days after session one. 

 

Posttest results: Descriptive statistics. Table 5 presents raw scores for the three 

immediate and delayed posttests in the experimental and control conditions separately for the 

shorter and longer study time duration conditions. Here, each score is out of 18 possible 

points (as there are 36 words in the experimental and in the control condition and half of each 

was presented under the short study time condition while the other half was presented under 

the long study time condition for any given participant). Cohen’s d effect sizes were 

calculated relative to the results in the short study time condition to investigate the effect of 

study time.  

The results show that there is a small effect of study time across the practice and no-practice 

conditions and across the two retention intervals (immediate vs. delayed test). There is further 

a positive effect of repetition in these numbers. Recall that in the control condition, no true 

retrieval attempts occurred for the target items, as here, the words were studied only once, 

while in the experimental condition participants were additionally given the opportunity for 

five true retrieval attempts and five additional restudy opportunities. 

59 

 

 

Table 5: Raw posttest scores in the practice and no-practice conditions  

 

60 

 

 

 

Table 6: Raw posttest scores across the three experimental conditions  

     

 

61 

 

 

Table 6 presents raw scores for the three immediate and delayed posttests in the three 

ISI conditions separately. The scores are out of six possible points. Effect sizes are calculated 

relative to the scores in the massed condition to explore any benefits of spacing practice. 

Table 6 shows a considerable difference between the massed and the two spaced conditions 

across the different test types and different levels of RI. The difference between the two 

spaced conditions is smaller and is not consistent. There appears to be a small lag effect, 

whereby the longer spaced condition produced slightly better scores, particularly in the 

delayed posttests. The numbers also show a small benefit of longer study time that is, again, 

quite consistent across the conditions, test types, and RIs.  

Tables 7-9 present the scores across the three ISI conditions and in the control 

condition as percentages. Percentages are presented because of the difference in the number 

of possible correct responses between the control condition and each ISI condition. The 

Cohen’s d effect sizes are calculated relative to the no-practice control condition, to 

investigate the effects of repetition in the three different repetition schedules.  

 

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

62 

 

 

Table 7: Percent correct in the massed practice and no-practice conditions 

 

 

 

 

 

 

 

 

 

63 

 

 

Table 8: Percent correct in the short-spaced practice and no-practice conditions 

 

 

 

 

 

 

 

 

64 

 

 

Table 9: Percent correct in the long-spaced practice and no-practice conditions 

 

This comparison shows that the beneficial effect of repetition is seen across the three 

ISI conditions, although it is much smaller in the massed condition than in the two spaced 

conditions. This suggests that massed retrieval practice may have little benefit over a single 

study event. In fact, median values for scores on some of the tests (particularly in the delayed 

tests) are zero in the massed condition, suggesting no knowledge gained from massed 

retrieval practice. Although increasing the time a learner spends studying an L2-L1 translation 

pair per repetition and in total seems to benefit learning, even when this is done through 

65 

 

 

simple maintenance rehearsal, spacing repeated retrieval practice appears to have a larger 

benefit than does increasing study duration. 

 

Figures 4-6 present the scores on the three immediate and delayed posttests across the 

three ISI conditions.  

Figure 4: Form-recognition scores in the three ISI conditions  

 

Figure 5: L2- L1 translation scores in the three ISI conditions  

 

66 

 

 

Figure 6: Form-meaning mapping scores in the three ISI conditions  

 

For each test type, there was a considerable increase in posttest scores between the 

massed and the short-spaced condition both on immediate and delayed test iterations. 

However, the difference between the short- and long-spaced conditions seems to differ across 

test iterations: there seems to be no difference between the two spaced conditions on the 

immediate posttests, however, there seems to be an increase in the scores from the short- to 

the long-spaced condition in the delayed posttests. The delayed posttests all show lower 

scores than the scores on the immediate posttests, with the difference being relatively smaller 

in the form-recognition posttests. The difference between the immediate and delayed posttest 

scores indicates a forgetting process. The pattern of results suggests a slower rate of forgetting 

in the longer spaced condition than in the shorter spaced condition.  

 

Participants differed with respect to time of delayed test. The time of delayed test will 

be taken into account in statistical tests. The different retention intervals center around one 

and two weeks. Further, based on the fact that there is a break in the continuity of RI lengths 

that mirrors that in forgetting slopes, differences in scores will be investigated descriptively 

between the resulting two groups of participants. Factor scores from a principle component 

67 

 

 

analysis were used here for a more succinct presentation of scores. Table 10 presents the 

correlations among the three test types as well as the results of the principal component 

analysis.  

Table 10: Correlations and loadings for each test on the extracted component  

 

The three posttests load quite highly on the extracted component and the variance 

explained by this component alone is quite high. Figure 7 presents the rate of forgetting in the 

two groups of participants that differ with respect to time of delayed test (one vs. two weeks).  

 

Figure 7: Posttest results in the three ISIs for the two groups of participants  

Figure 7 shows that the group that returned for the delayed posttests two weeks after 

the study phase had a steeper forgetting slope than the group that returned one week after the 

study-phase. However, it also shows that the former group had higher scores on most of the 

 

68 

 

 

immediate tests, suggesting that the two groups of participants differ with respect to 

knowledge gained and this difference is independent of time of delayed test administration. 

Figure 8 presents these scores separately in the two study time conditions. 

 

Figure 8: Effect of study time on scores in the two groups  

For both study duration conditions, there is a similar pattern of a steeper slope between 

the immediate and delayed posttests, but also higher scores on the immediate posttests, in the 

group that took the delayed posttests two weeks after the study phase, again suggesting a 

difference between the two groups that may be independent of time of delayed test. 

Posttest results: Inferential statistics. An omnibus test including the immediate and 

delayed scores in a long format was run for each of the three test types. I included ISI, RI, and 

study time as the independent variables. Linear mixed modeling was used to account for the 

nested structure of the data, as here multiple data points were contributed by each of the 

participants. Because participants varied in the time between the immediate and delayed 

posttests, which means that they likely differed in the forgetting slopes between the two tests 

(the RI variable), a random slope was included for this level-two variable to control for such 

69 

 

 

differences. The unstructured covariance type was selected as the most robust type. Because 

of the large number of independent variables, I used a simultaneous entry and Restricted 

Maximum Likelihood (REML) estimation. Due to high collinearity between the two variables 

of ISI and the variable that distinguishes experimental items from control items, these were 

collapsed into one variable that in these analyses will be called practice type. Thus, the 

practice type variable used in these analyses includes the three levels of temporal distribution 

of repeated encounters and one level of non-repeated control words.       

The form-recognition test. The residuals for the form-recognition test were close to 

normally distributed with 3 outliers beyond -3SD, which were removed. The removal of the 

outliers resulted in a normal distribution according to the Kolmogorov-Smirnov (p = .200) 

and Shapiro-Wilk (p = .832) tests of normality. The distribution further had skewness and 

kurtosis values within acceptable ranges (skewness = -.022, SEskewness = .089; kurtosis = -.025, 

SEkurtosis = .178). For this reason, no data transformation was performed and, instead, raw 

percent correct scores were used. The ICC was .059, suggesting that roughly 6% of the 

variance in the dependent variable was attributable to the effect of participant differences. 

While this, again, is a small value of ICC, I used multi-level modeling because the software 

used allowed such an analysis but also because a random slope was of interest in the present 

case.  

The omnibus analysis revealed a significant interaction between RI (immediate vs. 

delayed test) and practice type, F(3, 669.768) = 6.659, p < .001, but no other significant 

interactions (all ps > .05). There was further a significant main effect of practice type, F(3, 

669.768) = 675.566, p < .001, and a main effect of RI, F(1, 129.349) = 8.916, p = .003. Study time 

did not interact with any of the variables (all ps > .05) and also did not have a significant main 

70 

 

 

effect, β = 2.364, F(1, 669.768) = 1.550, p = .214. To investigate the RI by practice type 

interaction, separate linear mixed effects analyses were run for the immediate and delayed 

posttests with practice type as a four-level independent variable and time of delayed test as a 

covariate that should affect only scores on the delayed posttest. Parameter estimates were 

further examined with the no-practice condition and the short-spaced condition as the 

reference categories in two separate analyses. This allowed to compare all the levels of 

practice type with a minimum number of separate comparisons. The Bonferroni correction 

was used to adjust the alpha level for multiple testing: α = .05/3 = .016. Table 11 presents the 

omnibus test results separately for the immediate and delayed test with practice type as the 

independent variable.    

Table 11: Form-recognition omnibus test 

The results of the separate omnibus tests for the immediate and delayed posttests show 

 

a significant effect of practice type for both RIs. Further, the results show that there is actually 

a significant difference between the two groups of participants that differ with respect to time 

of delayed posttest in the immediate scores but not in the delayed scores, contrary to what 

should be observed. Further, this variable also interacts with practice type in the immediate 

71 

 

 

scores. Time of delayed test should not have an effect on the immediate scores and should not 

interact with other variables in these scores, as participants in the two groups do not differ 

with respect to time of the immediate test. This pattern of results confirms statistically the 

observation from Figure 7 that the two groups differ in their learning gains overall and that 

this difference may exist independently of when the delayed test is administered. For this 

reason, any difference between the delayed posttest scores in the two groups needs to be 

interpreted with caution.  

Table 12 presents parameter estimates for a comparison between the effect of practice 

under the three practice type conditions against the no-practice condition on the immediate 

and delayed form-recognition tests. Here, the estimates are all in raw percentages. The 

intercept respresents the mean score in the no-practice condition and each slope represents the 

mean difference between the no-practice condition and the corresponding practice schedule 

condition. The null hypothesis for the effect of intercept is that the mean of the scores in the 

no-practice condition is equal to zero. The null hypothesis for each slope is that the scores in 

the corresponding condition are not different from the scores in the no-practice condition, 

which is the reference category represented by the intercept. 

 
 
 
 
 
 
 
 
 
 
 
 
 
 

72 

 

 

Table 12: Form-recognition results against the no-practice condition 

The table  shows that there was a significant difference between results in the no-

 

practice condition and results in each of the practice type conditions on both immediate and 

delayed form-recognition tests. Further, the slopes are positive throughout, indicating that 

practice under each of the temporal distributions was significantly better than no retrieval 

practice at all and this is true of whether the learning gains are measured 30 minutes or a week 

or two after the study phase. However, the slopes are of different magnitudes. Thus, the effect 

of retrieval practice in the massed condition is considerably smaller than that in the two 

spaced conditions. This pattern holds for both immediate and delayed form-recognition tests. 

Table 13 presents parameter estimates for a comparison between scores in the short-spaced 

practice and the other practice schedules as well as the no-practice condition.  

 
 
 
 
 
 
 
 
 
 

73 

 

 

Table 13: Form-recognition results against the short-spaced practice condition 

In both the immediate and delayed form-recognition tests, the massed retrieval 

 

practice schedule produced significantly lower scores than did the short-spaced retrieval 

practice schedule. In both tests, there was no significant difference between the long-spaced 

retrieval practice schedule and the short-spaced retrieval practice schedule, with the former 

showing a very small nonsignificant negative slope relative to the latter, indicating a 

nonsignificant nonmonotonic function of lag. Further, as expected based on the previous 

comparisons, where the massed retrieval practice schedule was shown to produce higher 

scores than no practice, the no-practice condition produced significantly lower scores than the 

short-spaced retrieval practice schedule.  

The L2-L1 translation test. In the L2-L1 translation test, participants did not need to 

select target forms but were presented with them and were asked to recall their meanings. The 

residuals for this test were close to normally distributed with four outliers in the lower tail (2 

from the short-spaced and 2 from the long-spaced condition). After the removal of these 

outliers, the distribution was normal according to the Kolmogorov-Smirnov (p = .200) and 

Shapiro-Wilk (p = .606) tests of normality. The distribution further had skewness and kurtosis 

74 

 

 

values within acceptable ranges (skewness = -.116, SEskewness = .088; kurtosis = -.082, SEkurtosis 

= .176). For this reason, no data transformation was performed and, instead, raw percent 

correct scores were used. The ICC was .059, suggesting that roughly 6% of the variance in the 

dependent variable was attributable to the effect of participant. While this, again, is a small 

value of ICC, I used multi-level modeling because the software used allowed such an analysis 

but also because a random slope was of interest in the present case.  

The same independent variables were used as those in the form-recognition test 

presented above. The omnibus analysis revealed a significant interaction between RI and 

practice schedule, F(3, 690.080) = 47.297, p < .001, but no other significant interactions (all ps > 

.05), as in the results of the form-recognition test. Similarly to the results of the form-

recognition test presented above, there was also a significant main effect of practice type, F(3, 

690.080) = 636.334, p < .001 and a main effect of RI, F(1, 129.702) = 95.307, p < .001. Unlike the 

results of the form-recognition test, however, there was further a main effect of study time, β 

= 0.347, F(1, 690.080) = 14.176, p < .001. To investigate the RI by practice type interaction, 

separate linear mixed effects analyses were conducted for the immediate and delayed posttests 

with practice type as a four-level independent variable and time of delayed test as a covariate 

that should affect only scores on the delayed posttest. Parameter estimates were further 

examined with the no-practice condition and the short-spaced condition as the reference 

categories in two separate analyses. This allowed to compare all the levels of practice type 

with a minimum number of separate comparisons. The Bonferroni correction was used to 

adjust the alpha level for multiple testing: α = .05/3 = .016. Table 14 presents the results of 

this analysis. 

 
 

75 

 

 

Table 14: L2-L1 translation omnibus test  

There is a significant effect of practice type for both test iterations, indicating that 

 

there is a significant difference between at least two of the levels of practice in each of the 

two L2-L1 translation tests. Time of delayed test, the covariate, is significant for both the 

immediate test and the delayed test and it interacts, in both test iterations, with practice type. 

Because this variable cannot have an effect on the immediate scores, this again suggests that 

the two groups of participants that differ with regards to time of delayed test also differ in the 

level of knowledge gained, which in turn means that no firm conclusions can be made about 

the effect of time of delayed test on forgetting curves in the present case.  

Table 15 presents a comparison of the different practice schedules to the no-practice 

condition in terms of the immediate and delayed L2-L1 translation posttest scores in percent 

correct translations. Here, again, the estimates are all in raw percentages and the intercept 

respresents the scores in the no-practice condition while each slope represents the difference 

between the no-practice condition and the corresponding practice schedule condition. The null 

76 

 

 

hypothesis for the effect of intercept is that the scores in the no-practice condition are equal to 

zero. The null hypothesis for each slope is that the scores in the corresponding condition are 

not different from the scores in the no-practice condition, which is the reference category 

represented by the intercept.  

Table 15: L2-L1 translation results against the no-practice condition 

 

The results show that all practice schedules resulted in significantly higher scores 

relative to the no-practice condition on the immediate test, although the size of the benefit 

varied across the different practice schedules. On the delayed test, however, there was no 

significant difference between the massed retrieval practice condition and the no-practice 

condition at the corrected alpha level, while the significant benefits of the two spaced 

conditions persisted across time. The nonsignificant p-value associated with the score in the 

no-practice condition on both the immediate and the delayed tests in turn suggests that in this 

condition the learning gains were close to zero.  

Table 16 presents a comparison of the scores in the short-spaced retrieval practice 

condition against the scores in the other conditions, including the no-practice condition. The 

intercept respresents the scores in the short-spaced practice condition and each slope 

77 

 

 

represents the difference between this condition and the corresponding practice schedule 

condition or the no-practice condition. The null hypothesis for the effect of intercept is that 

the scores in the short-spaced practice condition are equal to zero. The null hypothesis for 

each slope is that the scores in the corresponding condition are not different from the scores in 

the short-spaced practice condition, which is the reference category represented by the 

intercept. 

Table 16: L2-L1 translation results against the short-spaced practice condition 

The scores in the massed condition are significantly lower than in the short-spaced 

 

condition, across the two posttests, indicating a spacing effect between these two conditions. 

The scores in the long-spaced condition are a tiny bit lower on the immediate posttest 

(showing a nonmonotonic function) though this difference is not statistically significant. 

However, the scores on the delayed posttest are 8% higher in the long-spaced condition than 

in the short-spaced condition and this difference is statistically significant, indicating a 

significant lag effect in the delayed L2-L1 translation posttest scores.    

The form-meaning matching test. On the form-meaning matching tests, participants 

were presented with the Finnish words and their translations and asked to match between the 

78 

 

 

two lists. The distribution of the residuals for the form-meaning matching test scores was 

close to normally distributed with one outlier above 3SD in the upper and one in the lower 

tails. These outliers were removed, which resulted in more nearly normal distribution 

(skewness = -.246, SEskewness = .087; kurtosis = .154, SEkurtosis = .174). Although the 

Kolmogorov-Smirnov and Shapiro-Wilk tests of normality were significant at the .05 alpha 

level, (p = .037 and .005, respectively), no data transformation was performed due to the fact 

that a .001 alpha level is recommended for these tests of normality because of how 

conservative they are and how sensitive they are (Field, 2013). Further, the distribution 

looked symmetrical and bell-shaped and the Normal Q-Q plot also did not show much 

deviation from the diagonal. The ICC was .059, suggesting that roughly 6% of the variance in 

the outcome was attributable to the effect of participant. While this, again, is a small value of 

ICC, I used multi-level modeling because the software used allowed such an analysis but also 

because a random slope was of interest in the present case.  

The omnibus test showed a significant interaction between RI and practice type, F(3, 

691.462) = 24.333, p < .001 but no other significant interactions (all ps > .05). There was further 

a main effect of practice type, F(3, 691.462) = 910.132, p < .001 and a main effect of RI, F(1, 

127.754) = 112.564, p < .001, as well as a significant main effect of study time, β = .810, F(1, 

691.462) = 16.456, p < .001.  

To investigate the RI by practice type interaction, separate linear mixed effects 

analyses were conducted for the immediate and delayed posttests with practice type as a four-

level independent variable and time of delayed test as a covariate that should affect only 

scores on the delayed posttest. Parameter estimates were further examined with the no-

practice condition and the short-spaced condition as the reference categories in two separate 

79 

 

 

analyses. This allowed to compare all the levels of practice type with a minimum number of 

separate comparisons. The Bonferroni correction was used to adjust the alpha level for 

multiple testing: α = .05/3 = .016. Table 17 presents the results of this analysis. 

Table 17: Form-meaning matching omnibus test  

The omnibus tests show that there is a significant difference between at least two of 

 

the four practice types on both the immediate and the delayed test. Further, contrary to the 

results of the previous two tests, here, as would logically be expected, the time of delayed test 

is significant only for the delayed test scores and it further significantly interacts with practice 

type in this test.  

Table 18 presents a comparison of the different practice schedules to the no-practice 

condition in terms of the immediate and delayed L2-L1 form-meaning matching posttest 

scores in percent correct matches. Here, again, the estimates are all in raw percentages and the 

80 

 

 

intercept respresents the scores in the no-practice condition while each slope represents the 

difference between the no-practice condition and the corresponding practice schedule 

condition. The null hypothesis for the effect of intercept is that the scores in the no-practice 

condition are equal to zero. The null hypothesis for each slope is that the scores in the 

corresponding condition are not different from the scores in the no-practice condition, which 

is the reference category represented by the intercept.  

Table 18: Form-meaning matching results against the no-practice condition 

In the immediate and delayed posttest scores, all three practice conditions show 

 

significantly higher scores than the no-practice condition. In the delayed posttest scores, 

however, the benefit of massed practice over no practice is much smaller in magnitude (only 

7%) while the benefits of the two spaced practice conditions remain quite large across time 

(49% and 58%). Table 19 presents a comparison between the scores in the short-spaced 

retrieval practice condition and the scores in all the other conditions, including the no-practice 

condition.  

 
 
 
 

81 

 

 

Table 19: Form-meaning matching results against the short-spaced practice condition 

There is a significant negative slope for the massed practice and no-practice conditions 

 

relative to the short-spaced practice condition on both the immediate and the delayed 

posttests. There is further a small negative slope for the long-spaced practice condition 

relative to the short-spaced practice condition in the immediate scores (a nonmonotonic 

function), though this effect is not statistically significant. On the delayed posttest, by 

contrast, the long-spaced practice condition shows a significant 8% benefit over the short-

spaced practice condition, indicating a significant lag effect. Thus, the pattern of the effect of 

lag is reversed between the immediate and delayed posttests. This, again, confirms the 

previous observation that the benefit of long-spaced practice seems to be more evident after 

more time, suggesting that long-spaced retrieval practice interferes more strongly with 

forgetting in the longer term than short-spaced or massed retrieval practice.  

Study-phase results 

The study phase produced quite a low percentage of errors (M = 2.4%, SD = 1.9%, 

Median = 1.8%, Min = 0.2%, Max = 8.8%). A cutoff point of 10% error rate was used 

because of the low chance of providing the correct translation for a target word by mistake 

82 

 

 

due to the large number of potential translations. Errors in this case were incorrect translations 

given by a participant in response to either a Finnish word they had not seen before 

(distractors) or one they had seen before. Therefore, all participants’ data were included in the 

study-phase analysis. Further, zero correct translations were given upon the first encounters 

with all the Finnish words, before a learner was given a chance to study the word, further 

confirming no prior knowledge of the target words. Effort indices were investigated in the 

three ISI conditions and also in the two study time conditions.  

Study-phase response latencies: Descriptive statistics. Table 20 presents the 

descriptive statistics for the study-phase response latencies across the four conditions. Recall 

that response latencies indicate the amount of time, in milliseconds, between the moment a 

Finnish word appears on the screen and the time when the participant either supplies its 

translation by saying it aloud or states aloud that they don’t know the translation. Recall, also, 

that the words in the experimental conditions repeated six times, while the controls were only 

presented once. Thus, the latencies for the control words indicate how much time or effort 

was spent on identifying a given word as one that had not been seen before or one for which a 

translation had not been seen before, or a vain search of one’s memory for a nonexistent 

representation not encoded on any level, as the translations for these words had not been 

presented yet. Effect sizes in the practice conditions were calculated relative to the massed 

condition in order to investigate the question of whether increasing ISI leads to greater effort. 

 
 
 
 
 
 
 
 

83 

 

 

Table 20: Response latencies across the practice conditions  

 

On average, the least retrieval effort was observed in the massed condition. The short-

spaced repetitions produced almost twice as much effort as the massed repetitions and the 

long-spaced repetitions produced more effort than the short-spaced repetitions though this 

difference is not as dramatic as that between the massed and the short-spaced repetitions. The 

reason for such a small difference is likely more frequent failure to recognize long-spaced 

repetitions as words for which a participant is able to produce the translation, which resulted 

in more quick “I don’t know” responses without an attempt at retrieval.  

Table 21 presents these statistics separately for the long and short study time, or 

presentation duration, conditions. Effect sizes here are calculated relative to the massed 

practice condition. No effect sizes are shown for the control condition because here 

presentation duration cannot impact response latencies as no words in this condition repeated.  

 
 
 
 
 
 
 
 
 
 
 
 
 
 

84 

 

 

Table 21: Response latencies in the two study time conditions 

 

 

This table shows that words that were presented for study with their translations for 9 

seconds received slightly less overall translation effort than did words that were presented for 

study only 3 seconds. However, this difference is very small. Further, the fact that the control 

condition appears to show the same pattern suggests that the difference is so small as to be 

easily obtained by chance. Statistical tests may help to adjudicate between these possibilities.  

Table 22 presents the response latencies separately for successful and unsuccessful 

retrieval attempts. The first encounters are excluded from these statistics for a pure effect of 

success/failure, where retrieval attempts actually represented a search of one’s memory for an 

existing memory trace. Note that the number of cases is different between these two 

conditions. This is because these were not set a priori by the researcher but rather were a 

function of participants’ ability to recall a given translation and were thus outside of direct 

experimental control. The effect sizes here were calculated relative to the massed condition.  

 
 
 
 
 
 

85 

 

 

Table 22: Response latencies in successful and unsuccessful retrieval attempts  

Here, overall, we see the same pattern of differences among the three ISI conditions. 

The table further shows more overall effort in the unsuccessful than in the successful retrieval 

 

attempts.  

Figure 9 presents a line graph of the study-phase response latencies across the six 

repetitions in the three ISI conditions. Median values are presented instead of means due to a 

significant positive skew in the raw response latency data. This figure shows overall response 

times, regardless of the correctness of the response. 

Figure 9: Median study-phase response latencies across the six repetitions  

 

86 

 

 

Overall, response latencies show a decrease across the repeated encounters; however, 

response times in the massed condition decrease quite dramatically in the early retrieval 

attempts, after which they do not decrease much because of a kind of a floor effect. In fact, 

the first true retrieval attempt, which occurs at repetition two, already exhibits a very low 

effort value in this condition. Responses in the two spaced conditions show much longer 

latencies across the repetitions, with the long-spaced condition continuing to elicit longer 

response latencies than those in the short-spaced condition until the very last repetition.  

Figure 10 presents these response latencies separately for the long and short study 

duration conditions.  

 

Figure 10: Response latencies in the short and long study duration conditions 

The two study duration conditions appear to have produced very similar response 

latencies in the three conditions across the six repetitions. Figure 11 presents response 

latencies separately for successful and unsuccessful retrieval attempts. It also presents the 

number of cases at each repetition in each ISI condition for successful and unsuccessful 

retrieval attempts. These numbers must be kept in mind when interpreting the trends in Figure 

11. In this figure, the first encounter is excluded because it cannot be included in one of the 

87 

 

 

graphs (the success graph, as all retrieval attempts for repetition one were, as expected, 

unsuccessful). This was done for ease of comparison across the two graphs. Further, two 

separate graphs are presented instead of a single graph because there is considerable overlap 

in the lines between the two study time conditions. Thus, Figure 11 shows latencies only for 

true retrieval attempts, where the participants had studied each word before and therefore 

retrieval was possible, excluding retrieval attempts where the relevant translation had not yet 

been seen.  

 

Figure 11: Study-phase latencies in successful and unsuccessful retrieval attempts  

The figure shows overall longer latencies for the unsuccessful retrieval attempts than 

for successful retrieval attempts. Further, while the latencies decrease quite steadily across 

repetitions in the successful retrieval attempts, though in a bit of a quadratic trend, latencies 

for the unsuccessful retrieval attempts do not seem to decrease across repetitions except in the 

massed condition, where the total number of unsuccessful retrieval attempts is very small. 

Repetition two seems to have produced similar effort between the two spaced conditions in 

88 

 

 

the successful retrieval attempts at around two and a half seconds, however, the number of 

successful retrieval attempts in the long-spaced condition here is relatively small. The 

successful retrieval latencies further show a considerable difference between the massed 

condition and the two spaced conditions, the latter conditions producing considerably longer 

response times.  

Study-phase response latencies: Inferential statistics. Growth curve modelling was 

used for initial exploration of changes in effort in the three conditions across repetitions as 

well as of how this may be affected by the time participants are allowed to study each word 

with its translation at each repetition. For this analysis, only experimental items were used 

because the control items did not repeat and thus cannot have a growth process. Further, 

latencies for the first encounters were removed from this analysis as well, as these do not 

represent true retrieval attempts (here, a search of the memory is performed in vain, as the 

translation for a given Finnish word has not been seen yet). Figure 12 shows the growth 

trajectories across the three ISI conditions with the use of median values due to the positive 

skew in the distribution of latencies. Note that these latencies contain both correct and 

incorrect responses.  

Figure 12: Growth in the latencies in the three conditions across repetitions  

 

89 

 

 

The distribution of residuals in response latencies was positively skewed (skewness = 

5.265, SEskewness = .025) and leptokurtic (kurtosis = 49.488, SEkurtosis = .051). A natural log 

transformation was used to bring residuals to approximate more closely a normal distribution. 

Further, outliers above 3SD were removed (138 cases: 5 from the massed condition, 50 from 

the short-spaced condition, and 81 from the long-spaced condition). There were no outliers 

below 3SD. The resulting distribution was bell-shaped and followed the diagonal of the Q-Q 

plot quite closely (skewness = .562, SEskewness = .026; kurtosis = .795, SEkurtosis = .051). 

A linear mixed-effects growth curve modeling analysis was used to explore change 

processes in retrieval effort across the five repeated retrieval attempts as well as how the trend 

may differ depending on retrieval success, study time duration, and ISI. However, because an 

initial analysis showed that there was a significant interaction between retrieval success and a 

linear and quadratic growth trajectories, χ(1) = 65.839, p < .001 and χ(1) = 4.767, p = .029, 

respectively), the latencies for the successful and unsuccessful retrievals were investigated 

separately. This also makes theoretical sense. In the field of psychology, only correct 

responses are usually investigated (e.g., Maddox et al., 2018), though this may be due to the 

fact that no feedback is usually provided in such studies and, consequently, effortful search of 

one’s memory that is unsuccessful still results in probable forgetting of the item in question in 

the absence of feedback. This may be different for effortful unsuccessful retrieval attempts 

that are followed by feedback (Kornell et al., 2009). However, while an investigation of 

latencies may be important even in the unsuccessful retrieval attempts in the present case, any 

growth processes will only be investigated in the successful retrieval attempts. The main 

reason for this is that in addition to capturing latencies in cases where a participant thought 

long and hard and still failed to produce the correct translation, the latencies in incorrect 

90 

 

 

responses in the present experiment also capture situations where a participant did not 

recognize a word as repeated at all or did not attempt retrieval due to a quick estimation of 

how low the likelihood of success was. Here, some of the latencies at longer ISIs may actually 

often be shorter due to this and not to any effort processes while other latencies may be longer 

due to the ISI and its effect on effort processes, the two effects pulling in opposite directions. 

Thus, for instance, the upward growth in the long-spaced condition between repetitions two 

and three (see Figure 12) likely indicates that while upon the second repetition, which was 

separated from the initial encounter by a considerable amount of time and number of other 

items, many participants may not have recognized a given word as one they had studied 

before (or, if they did recognize it, did not attempt retrieval of its translation), upon repetition 

three, they may have been more likely to recognize the word and take the time to try to 

remember its translation, unsuccessful as this attempt may have been. Therefore, the amount 

of effort here depends on whether a retrieval attempt was undertaken at all as well as the 

actual effort of a search for the translation in one’s memory. As it is impossible to disentangle 

these effects, analyzing latencies as a growth process may not be useful here. Additionally, 

one of the ISI conditions has too few observations to be useful in this analysis.  

A multi-level framework was adopted to adjust for the nested structure of the data, as 

multiple data points were contributed by each of the participants. The intraclass correlation 

coefficient (ICC) for the effect of participant was .065, which indicates that roughly 6.5% of 

the variability in reading times can be attributed to the differences among the participants 

(Hayes, 2006). While this is a relatively small ICC, including the second level may still be 

safer than ignoring any, however small, dependency in the data, particularly since the 

software used allows such an analysis (Hayes, 2006). The fact that multiple encounters with 

91 

 

 

the same Finnish words occurred both within and between participants further makes the 

words a potential level two variable within which encounters are nested. Finnish words 

produced an ICC of the same magnitude as participants (ICC = .065). The inclusion of both 

participants and words as random intercepts significantly improved model fit, χ(1) = 2171.373, 

p < .001. Both random intercepts were included. Model fit improvement was used as a 

measure of significance with Full Maximum Likelihood Estimation.  

The inclusion of  repetition as an independent variable significantly improved model 

fit, χ(1) = 787.039, p < .001. The addition of a quadratic term further significantly improved 

model fit, χ(1) = 4.439, p = .035, suggesting that the growth trajectory may not be linear. 

However, there was further a significant interaction between the quadratic term and condition, 

χ(1) = 473.149, p < .001, suggesting that the shape of the trajectory may differ depending on 

condition. Duration did not add a significant effect, χ(1) = .098, p = .754; there were further no 

other significant interactions, all ps > .05.  

Because there was a significant condition by trend interaction, growth in the three 

conditions was examined separately. In each condition, there was a significant quadratic trend 

(massed: χ(1) = 10.231, p = .001; short-spaced: χ(1) = 66.809, p < .001; long-spaced: χ(1) = 

26.698, p < .001). However, the conditions differed in terms of a cubic trend: while the 

massed condition exhibited a cubic trend (χ(1) = 9.844, p = .002), the other two did not (all ps 

> .05). However, the cubic trend in the massed condition might be an artefact of distractors 

being presented always after the second and, often, the third repetition, which may have 

produced slight amounts of forgetting between the respective repetitions in the massed 

condition, therefore, this trend may not be a reliable indication of changes in effort with 

repetition per se. Let us now turn to an investigation of the difference, at each repetition, in 

92 

 

 

the amount of effort that a successful retrieval required. As can be seen in Figure 12, despite 

the quadratic trends, within the five true retrieval attempts, the effort in the long-spaced 

condition never decreased to the point of being equal to that in the short-spaced condition, 

which, in turn, never decreased to the point of being equal to the massed condition. This 

suggests that retrieval continued to be more effortful in the longer spaced condition than in the 

shorter spaced condition, even in later repetitions. Figure 13 presents a growth curve that 

contains only those words for which successful retrieval attempts occurred at repetition two. 

Because it was nearly always the case that once a translation was correctly retrieved it 

continued to be correctly retrieved across later repetitions, Figure 13 is a more pure 

illustration of how retrieval effort changed across the five repeated successful retrieval events 

in the three conditions.  

 

Figure 13: Response latencies across five successful retrieval attempts  

No statistical analysis will be performed on the differences in these trajectories 

because of the considerable differences in the number of cases across the ISI conditions. Core 

statistical analyses will, instead, focus on the amount of effort, collapsed across repetitions, 

induced by the different levels of ISI as well as how this may interact with retrieval success 

93 

 

 

rate and exposure duration, a variable that did not seem to affect the change across repetitions 

in the growth curve analysis latencies.  

The sum of latencies across the five true retrieval attempts were used as the outcome 

variable to investigate any effects of ISI and presentation duration on response latencies 

during the study phase. The distribution of the residuals in the dependent variable was not 

normally distributed, (skewness = 2.183, SEskewness = .057, kurtosis = 9.998, SEkurtosis = 

.113). The natural log transformation was used to bring the distribution closer to a normal 

distribution. The resulting distribution of residuals was more nearly normal (skewness = .458, 

SEskewness = .057, kurtosis = .868, SEkurtosis = .113). The ICC for the effect of participant 

was .099, suggesting that roughly 10% of the variability in the dependent variable can be 

attributed to the differences between participants (Hayes, 2006). The ICC for the effect of the 

target words was .013, suggesting that roughly 1% of the variability in the dependent variable 

can be attributed to the target words. The inclusion of words as a random effect did not 

improve model fit and interfered with convergence, therefore this random effect was not 

included. The addition of participants as a random intercept significantly improved model fit, 

χ(1) = 109.681, p < .001. The addition of the number of correct retrieval attempts as a 

covariate significantly improved model fit, χ(1) = 1124.907, p < .001, indicating that there is, 

in fact, statistically significant difference in latencies between the successful and unsuccessful 

retrieval attempts. The addition of condition significantly improved model fit, χ(1) = 697.172, 

p < .001. However, there was also a significant interaction between condition and the number 

of correct retrieval attempts, χ(1) = 19.141, p < .001. Figure 14 presents this interaction 

graphically.  

94 

 

 

 

n

i
 
s
e
i
c
n
e
t
a

l
 

e
s
n
o
p
s
e
r
 
n
a
d
e
M

i

 
s
d
n
o
c
e
s
i
l
l
i

m

 

Figure 14: Response latencies as a function of condition and success of retrieval  

Here, we see that failed retrieval attempts produced longer latencies overall and that 

spacing produced longer latencies as well. However, while effort appeared to grow 

monotonically across the levels of spacing in successful retrieval attempts, in unsuccessful 

retrieval attempts, the short-spaced condition appears to have produced slightly longer 

response latencies than the long-spaced condition. This might be due to the fact that more 

words were recognized as repeated in the short-spaced condition than in the long-spaced 

condition, in which case, more retrieval attempts (though unsuccessful in this case) were 

undertaken in the short-spaced condition, which shows up as more effort overall. Restricted 

Maximum Likelihood estimation was used to investigate these interactions due to the 

complexity of the model. Separate analyses were done for successful and unsuccessful 

retrieval attempts. The analyses showed a significant effect of ISI in both success conditions, 

all ps < .001. Parameter estimates with the long-spaced condition as the intercept were 

examined for more detailed information on how the three ISI conditions differed among 

themselves. This analysis showed that, in the successful responses, the massed and short-

spaced conditions both significantly differed from the long-spaced condition (massed: t(6761) = 

-54.127, p < .001; short-spaced: t(6749) = -9.406, p < .001), the negative t values suggesting 

95 

 

 

that both the massed and the short-spaced conditions received less effort than the long-spaced 

condition. However, in the unsuccessful attempts, the latencies in the short-spaced condition 

were significantly longer than those in the long-spaced condition (t(2378) = 4.722, p < .001). 

Further, the massed condition was not significantly different from the long-spaced condition 

(t(2377) = -1.673, p = .094). Thus, in unsuccessful  attempts, retrieval effort was greatest in the 

shorter spaced condition while in the long-spaced condition, retrieval effort was almost of the 

same magnitude as that in the massed condition. This pattern may be explained in the same 

terms as the pattern in the graph: when a participant quickly estimated that they would not be 

able to produce a translation for a word they had not seen for a long time – or when they did 

not even recognize it as one they had studied before – they often gave a very quick “I don’t 

know” response. Thus, because in the short-spaced condition, the previous encounter was 

always a shorter time ago, here more retrieval attempts were undertaken (which means some 

effort was put into them) even if they were ultimately unsuccessful. A further analysis 

revealed that in the successful attempts, the massed condition produced significantly less 

effort than the short-spaced condition (t(6000) = -53.086, p < .001). Thus, the analysis of 

latencies has revealed a significant effect of lag on latencies in successful retrieval attempts, 

whereby retrieval effort increased with longer ISIs.  

Study-phase retrieval success: Descriptive statistics. Table 23 presents the mean 

and median numbers of correct retrieval events during study phase in the three experimental 

conditions. The effect sizes here are calculated relative to the massed condition. Recall that 

there were a total of six repetitions per word and that upon the first repetition the word had 

not been presented before, therefore, "I don’t know” was the correct response. Thus, there 

were a total of five correct retrieval events possible out of the six total repetitions.  

96 

 

 

Table 23: Correct retrieval events per experimental condition 

 

The retrieval attempts in the massed condition were almost always successful. The 

average number of successful retrieval attempts decreased with spacing such that in the short-

spaced condition there were fewer successful retrieval events and in the long-spaced condition 

these were even fewer. The median values show a linear decrease in retrieval success across 

the spacing intervals while the means exhibit a bit of a quadratic trend, where the difference 

between the massed and short-spaced conditions is larger than that between the short- and 

long-spaced conditions. Table 24 presents these statistics separately in the two study-duration 

conditions (3 seconds vs. 9 seconds of studying a Finnish word with its translation). The 

effect sizes here represent differences between the short and long study time conditions. 

Table 24: Study-phase retrieval success in the short and long study time conditions 

 

 

 

97 

 

 

The number of retrieval successes show a small benefit of longer study time, with this 

difference becoming larger the longer the spacing between repetitions.  

Figure 15 presents a line graph that shows the growth in retrieval success across the 

repetitions in the three ISI conditions.  

 

Figure 15: Successful retrievals at each repetition in the three conditions 

The graph in Figure 15 shows positive growth in the median number of successful 

retrieval attempts across the repetitions in the two spaced conditions; however, the conditions 

differ in the rate of such positive growth. In the massed condition, on the other hand, the 

median success value is at 100% from the very first retrieval attempt. In the short-spaced 

condition the median success value reached 100% only upon the last repetition and in the 

long-spaced condition the median success value never reached 100% within the five retrieval 

attempts. Further, the growth in success in the two spaced conditions looks to be almost 

parallel, with the short-spaced condition exhibiting a higher rate of success across the 

repetitions. Figure 16 presents the growth in retrieval success separately in the two study time 

conditions. Two graphs are presented side by side due to a considerable overlap in the lines.  

98 

 

 

 

Figure 16: Growth in retrieval successes in the two study time conditions  

This graph shows that longer study time was beneficial for both spaced conditions; 

however, it looks to be a bit more beneficial for the long-spaced condition.  

Study-phase retrieval success: Inferential statistics. To investigate how retrieval 

success changes with repetition in the three ISI conditions and whether this is affected by 

presentation duration, the number of successful retrieval attempts across repetitions was used 

as the dependent variable in a growth curve analysis. While the number of successes was a 

count variable consisting of 5 possible values (the lowest acceptable number for doing linear 

analyses on count data), residuals presented almost a normal distribution (skewness = -.162, 

SEskewness = .062; kurtosis = .256, SEkurtosis = .124) with the exception of 4 outliers in the lower 

values that were above 3SD. These outliers were removed and the distribution became even 

more nearly normal (skewness = -.065, SEskewness = .062; kurtosis = -.027, SEkurtosis = .124). 

The Q-Q plot further showed that the data closely followed the diagonal. Further, the 

histogram was bell-shaped as well, suggesting that a linear analysis was an acceptable option. 

A linear mixed effects growth curve model was fitted. The ICC for the effect of participant 

was .087, suggesting that roughly 9% of the variance in the dependent variable was due to the 

effect of participant differences. The addition of participants as random intercepts 

99 

 

 

significantly improved model fit,  χ(1) = 68.252, p < .001. For these reasons, participants were 

included as random effects in the analysis. Full Maximum Likelihood Estimation was used to 

investigate these growth processes. 

The inclusion of repetition as an independent variable significantly improved model 

fit,  χ(1) = 431.771, p < .001. The trend was further significantly quadratic, χ(1) = 50.808, p < 

.001. The addition of a cubic term negatively affected model fit. Therefore, only the quadratic 

term was retained. The addition of condition as an independent variable significantly 

improved model fit, χ(1) = 1035.410, p < .001. There was further a significant interaction 

between condition and repetition χ(1) = 551.684, p < .001, as well as between condition and 

the quadratic trend χ(1) = 85.926, p < .001, suggesting that in addition to a difference in slopes, 

or the rate of growth, the conditions also differed in the shape of these trajectories. For this 

reason, the growth trajectories as well as any effects of study on these trajectories were 

examined separately in the three ISI conditions. The alpha level was adjusted accordingly for 

all subsequent analyses: α = .05/8 = .006. In the massed condition, there was a significant 

positive slope for the effect of repetition, χ(1) = 13.571, p < .001, however, the addition of a 

quadratic term did not improve the model,  χ(1) = 1.094, p > .05. There was, further, no effect 

of presentation duration, χ(1) = 2.135, p = .144. There was further no significant interaction 

between study time and the linear and quadratic trends, all ps > .05.  In the short-spaced 

condition, there was also a significant positive slope, χ(1) = 482.042, p < .001, and there was a 

significant quadratic trend, χ(1) = 128.631, p < .001 but no cubic trend, χ(1) = 2.415, p > .05. 

There was, further, a significant positive effect of presentation duration, χ(1) = 13.521, p < 

.001 but no significant interaction between presentation duration and the linear trend, χ(1) = 

4.637, p = .031, nor the quadratic trend, χ(1) = .429, p > .05. An examination of the parameter 

100 

 

 

estimates suggests that, on average, longer study time produced learning of .78 more words 

than shorter study time in this condition. Recall that raw numbers of words are used as the 

dependent variable in this analysis, therefore, the slope can be easily interpreted as the mean 

difference in the number of words successfully translated. In the long-spaced condition, the 

addition of repetition significantly improved model fit, χ(1) = 503.060, p < .001, and there was 

a significant quadratic trend, χ(1) = 83.011, p < .001 but no cubic trend, χ(1) = 1.228, p > .05. 

There was, further, a significant positive effect of study time, χ(1) = 33.987, p < .001, but no 

significant interaction between study time and the linear trend, χ(1) = .223, p > .05, or the 

quadratic trend, χ(1) = .982, p > .05. An examination of the parameter estimates suggests that, 

on average, longer study time produced learning of .48 more words than the shorter study 

time in this ISI condition.  

The analyses presented above suggest that there was overall positive growth in the 

number of successful retrieval attempts across the repetitions and also that the steepness of 

this positive slope depended on the ISI condition. To investigate the effect of ISI condition on 

retrieval success, each repetition was investigated separately. Five omnibus analyses were run 

– one for each repetition – and the parameter estimates were investigated. Linear mixed 

effects modeling was used with REML due to the complexity of the model. The short-spaced 

condition was the reference category against which the effects of the other two conditions 

were tested for significance. Table 25 presents the results of the omnibus tests across the 

repetitions.  

 
 
 
 
 
 

101 

 

 

Table 25: The effect of ISI on retrieval success at the five repetitions 

Here we see that there was a significant difference between at least two of the groups 

 

(the alternative hypothesis for the omnibus test) in each repetition, all ps < .001. Table 26 

presents the parameter estimates that provide information about differences among the three 

conditions. Here, the short-spaced condition is used as the reference category and, therefore, 

all comparisons are made against this condition. Recall that the data represent raw counts of 

words that were correctly retrieved during the study phase at each repetition in the three 

conditions. For this reason, the intercept can be interpreted in terms of the number of 

translations correctly retrieved in the short-spaced condition (the reference category) and each 

slope can be interpreted in terms of the difference, in raw numbers of words correctly 

retrieved, between the short-spaced condition and each of the other two conditions.  

 
 
 
 
 
 
 

102 

 

 

Table 26: Parameter estimates for the effect of ISI on study-phase retrieval success 

 

Table 26 shows that there were significantly more successes in the massed condition 

 

than in the short-spaced condition at each repetition. It also shows that there were 

significantly fewer successes in the long-spaced condition than in the short-spaced condition 

at each repetition. This pattern of results obviates the need for a separate comparison between 

the massed and the long-spaced conditions as these conditions have significant slopes in 

opposite directions from the intermediate short-spaced condition and, therefore, we can 

conclude that they, too, are significantly different from each other.  

103 

 

 

Thus, the analyses presented above have shown that the number of successes grew 

across the repetitions throughout the study phase, although the rate of this growth slowed in 

later repetitions (the quadratic trend), that growth in the three ISI conditions differed 

significantly in the number of successful retrievals, and that this difference did not disappear 

with repeated encounters throughout the study phase. The analyses further showed a 

significant positive effect of presentation duration on study-phase retrieval success in the two 

spaced conditions but not in the massed condition.     

Moderated mediation analyses  

The results of the previous analyses have shown that spacing repeated retrieval 

practice more widely results in superior learning outcomes. It further makes the study-phase 

retrieval process more effortful but also less successful. To answer the third research question 

that asks whether the dual mechanism of successful effortful retrieval underlies the effects of 

ISI on learning outcomes and whether study time moderates this relationship, two moderated 

mediation analyses were performed with the SPSS PROCESS 4.3 macro (Hayes, 2018).  

Moderated parallel mediation analyses. Because learning outcomes were measured 

with multiple posttests, data reduction was performed to reduce the six tests to fewer 

dependent variables. Based on correlations, theoretical reasons, and principle component 

analyses, three dependent variables emerged. These combined together (1) the immediate and 

delayed form- recognition tests, (2) the two immediate meaning tests, and (3) the two delayed 

meaning tests. The three resulting tests will be named, respectively, the form-recognition 

tests, the immediate meaning tests, and the delayed meaning tests. Tables 27-29 present the 

bivariate two-tailed correlations between each member of a pair as well as loadings of each 

pair of tests on their corresponding extracted component.  

104 

 

 

Table 27: Correlation coefficients and loadings for form-recognition tests 

Table 28: Correlation coefficients and loadings for immediate meaning tests 

Table 29: Correlation coefficients and loadings for delayed meaning tests 

 

105 

 

 

 

 

 

Each table shows quite high loadings, suggesting that the corresponding test pair likely 

measures the same underlying construct. Because multiple models were run on the same or 

related data, the alpha level was corrected accordingly. Further, robust tests were run to 

ensure against any violations of normality. Thus, bootstrapped 99% confidence intervals were 

used with 10,000 bootstrap samples. An initial model investigated whether the two 

mechanisms of success and effort underlie any effects of lag in the present results as well as 

whether the operation of these two mechanisms as a function of ISI is affected by study time. 

The moderated parallel mediation included study-phase retrieval effort and success and the 

two mediators and study time as the moderator of the relationship between ISI and the two 

mediators (model 7). This model was tested with each of the three tests (each of which 

combined a pair of tests as discussed above). Further, time of delayed test was included as a 

covariate in the form-recognition test and the delayed meaning test because each of these two 

tests contained scores from a delayed test. Figure 17 presents the conceptual structure of this 

analysis with obtained coefficients for each of the three tests. 

106 

 

 

 

**p < .01, ***p < .001 

 

Figure 17: Conceptual structure for the moderated parallel mediation analysis 

The form-recognition test. The coefficients for the form-recognition test show that, as 

found in earlier analyses, ISI had a significant positive effect on learning outcomes as well as 

a significant positive effect on effort and a significant negative effect on study-phase retrieval 

success. Additionally, the coefficients for the effect of study time show that this variable does 

not have a significant effect on the relationship between ISI and the two mediators of retrieval 

effort and success. This means that effort increases and success decreases across the three 

levels of ISI and these trends are not significantly affected by how much time a learner is 

given for study of a given Finnish word with its translation. The coefficients further show a 

107 

 

 

significant positive effect of successful retrieval of a word’s meaning during study on form-

recognition posttest scores and a significant positive effect of effort on these same scores. 

This means that both retrieval effort and retrieval success positively affect learning, which is 

in line with the predictions of the reminding account. Note that both effort and success are 

modeled here as main effects. However, the effect of one may depend on the level of the 

other, thus, the effect of effort on learning may depend on whether or not retrieval is 

successful, as proposed by the dual mechanism account under investigation. Whether this is 

the case will be explored in a subsequent analysis.  

The tests of the indirect effects showed significant mediation by retrieval success as a 

negative effect across the two levels of study time: β = -.3463, bootstrapped standard error = 

.0820, 99% bootstrapped confidence interval [-.5754, -.1420] for short presentation duration; 

and β = -.2957, bootstrapped standard error = .0754, 99% bootstrapped confidence interval [-

.5180, -.1224] for long presentation duration. This suggests that, despite the fact that there 

was no nonmonotonicity in the form-recognition scores as a function of lag in the present 

results, a negative effect of longer ISI was still present and operated through consequent lower 

study-phase retrieval success, which was true for both levels of presentation duration. Thus, 

lower levels of study-phase retrieval success significantly mediated negative effects on 

learning of wider spacing between repetitions, regardless of how long a given word was 

studied for. 

The tests of the indirect effects further showed significant mediation by retrieval effort 

as a positive effect across the two levels of study time: β = .1689, bootstrapped standard error 

= .0528, 99% bootstrapped confidence interval [.0474, .3194] for short presentation duration; 

and β = 1624, bootstrapped standard error = .0484, 99% bootstrapped confidence interval 

108 

 

 

[.0472, .3018] for long presentation duration. This means that increased effort that resulted 

from spacing retrieval attempts more widely was beneficial for learning to recognize the 

target words. However, there was no significant overall moderated mediation process, Index 

of Moderated Mediation = -.0065, bootstrapped standard error = .0232, 99% bootstrapped 

confidence interval [-.0780, .0542], indicating that the operation of the two underlying 

mechanisms of retrieval effort and success did not depend on whether the Finnish words and 

their English translations were presented for 3 or 9 seconds after each retrieval attempt. 

The immediate meaning test. The coefficients in Figure 17 show a significant positive 

direct effect of ISI on the immediate meaning scores. The tests of the simple effects of ISI on 

effort and success as well as the moderating effects of duration on these variables are not 

affected by what outcome test is the dependent variable in any given model and will, 

therefore, be similar for the immediate meaning scores to those presented in the previous 

analysis of form-recognition scores as well as in the following analysis of the delayed 

meaning posttest scores. However, the entire model needs to be tested for each of the outcome 

tests because of the complexity of the underlying relationships. Therefore, despite being 

almost redundant, coefficients for the entire model are presented in Figure 17, for 

consistency, for each of the three outcome tests. These coefficients may look slightly different 

among the three outcome tests due to bootstrapping. However, the difference should be very 

small and should not affect interpretation. The effects of effort and success, however, will be 

different, as we have a different outcome variable. These coefficients show a small but 

significant positive effect of retrieval effort on the immediate meaning scores as well as a 

significant positive effect of successful retrieval on these scores. This means that for the 

109 

 

 

immediate meaning scores, both retrieval effort and success have a significant positive effect, 

again in line with the predictions of the dual mechanism account.  

The tests of the indirect effects on immediate meaning scores showed significant 

mediation by retrieval success as a negative effect across the two levels of study time: β = -

.4972, bootstrapped standard error = .0794, 99% bootstrapped confidence interval [-.7238, -

.3085] for short presentation duration and β = -.4204, bootstrapped standard error = .0713, 

99% bootstrapped confidence interval [-.6220, -.2576] for long presentation duration. This 

suggests that, despite the fact that there was no nonmonotonicity in the immediate meaning 

scores as a function of lag in the present experiment, a negative effect of longer ISI was still 

present and operated through consequent lower study-phase retrieval success, which was true 

for both levels of presentation duration. Thus, lower levels of study-phase retrieval success 

significantly mediated negative effects on learning of wider spacing between repetitions, 

regardless of how long a given word was studied for. 

The test of the indirect effects further showed significant mediation by retrieval effort 

as a positive effect across the two levels of study time: β = .1703, bootstrapped standard error 

= .0470, 99% bootstrapped confidence interval [.0568, .3038] for short presentation duration 

and β = 1588, bootstrapped standard error = .0414, 99% bootstrapped confidence interval 

[.0541, .2734] for long presentation duration. However, as with the form-recognition results, 

there was no significant overall moderated mediation process, Index of Moderated Mediation 

= -.0115, bootstrapped standard error = .0210, 99% bootstrapped confidence interval [-.0756, 

.0406], indicating that the operation of the two underlying mechanisms of retrieval effort and 

success did not depend on whether the Finnish words and their English translations were 

presented for 3 or 9 seconds after each retrieval attempt. 

110 

 

 

The delayed meaning test. The coefficients in Figure 17 show a significant positive 

direct effect of ISI on the delayed meaning scores. There is also a significant positive effect of 

study-phase retrieval effort and success on these scores. The tests of the indirect effects on the 

delayed meaning scores showed significant mediation by retrieval success as a negative effect 

across the two levels of study time: β = -.7032, bootstrapped standard error = .1047, 99% 

bootstrapped confidence interval [-.9909, -.4530] for short presentation duration and β = -

.6135, bootstrapped standard error = .0930, 99% bootstrapped confidence interval [-.8675, -

.4021] for long presentation duration. Here, again, despite the fact that there was no 

nonmonotonicity in the delayed meaning scores as a function of lag in the present results – in 

fact, the delayed posttests showed a lag effect, whereby scores in the long-spaced condition 

were actually significantly higher than scores in the short-spaced condition – a negative effect 

of longer ISI was still present and operated through lower study-phase retrieval success, 

across the two levels of presentation duration. Thus, here again, lower levels of study-phase 

retrieval success significantly mediated negative effects on learning of wider spacing between 

repetitions, regardless of how long a given word was studied for. 

The test of the indirect effects did not show significant mediation by retrieval effort 

and this was true across the two levels of study time: β = .1249, bootstrapped standard error = 

.0496, 99% bootstrapped confidence interval [.-0026, .2572] for short presentation duration 

and β = 1177, bootstrapped standard error = .0505, 99% bootstrapped confidence interval [-

.0023, .2617] for long presentation duration. Further, as in the previous two tests, there was no 

significant overall moderated mediation process, Index of Moderated Mediation = .0896, 

bootstrapped standard error = .0508, 99% bootstrapped confidence interval [-.0293, .2311], 

indicating that the operation of the two mechanisms of retrieval effort and success was not 

111 

 

 

affected by whether the Finnish words and their English translations were presented for 3 or 9 

seconds after each retrieval attempt. 

The moderated parallel mediation analyses showed no significant moderated 

mediation in any of the three sets of vocabulary scores, suggesting that study time did not 

affect the operation of the investigated underlying mechanisms of retrieval effort and success 

in the present study. In all three tests, retrieval success significantly mediated negative effects 

of ISI on learning outcomes. Thus, despite a failure to capture a nonmonotonic function of lag 

in learning outcomes in the present study, increasing the ISI did, in fact, have a negative effect 

on learning outcomes and this effect operated through a lower rate of study-phase retrieval 

success.  

Study-phase retrieval effort did not have a significant main effect on learning, nor did 

it mediate the benefits of ISI, in the delayed meaning test scores, although it had both effects 

on the other two tests. On the surface, this latter finding is surprising and seems to suggest 

that higher amounts of effort are not beneficial for learning meanings of L2 words in the long 

term. However, because the proposed underlying mechanism is essentially an interaction 

between retrieval effort and success – that is, what underlies benefits of ISI is a mechanism of 

effortful successful retrieval – the main effect of effort may not be a stable effect and may, 

therefore, depend on the level of retrieval effort. The question whether the positive effects of 

retrieval effort are conditional on the level of retrieval success will be tested in the following 

moderated mediation analysis.         

Mediation by retrieval effort moderated by retrieval success (a moderated 

mediation analysis). Retrieval effort was chosen as the mediator of the relationship between 

spacing and learning. Retrieval success was chosen as a moderator of this mediation. The 

112 

 

 

reason for the choice of the mediator was theoretical. Because retrieval effort is known to 

promote word learning (Pyc & Rawson, 2009) and the amount of attentional engagement has 

been shown to mediate the benefits of spacing study of L2 vocabulary learning (Koval, 2019), 

it is an interesting question whether the benefits of increased effort that results from longer 

ISIs in retrieval practice are conditional on higher levels of retrieval success. It is further 

interesting to know whether this holds in the presence of feedback that follows each retrieval 

attempt. Provision of feedback after each retrieval attempt is a more usual situation for second 

language vocabulary learning. The moderated parallel mediation analysis showed that despite 

the fact that a nonmonotonic function was not observed in the learning outcomes, failure of 

study-phase retrieval that resulted from spacing retrieval attempts more widely still had a 

negative effect on learning. It is an important question whether retrieval success rate 

moderates beneficial effects of retrieval effort on learning and may thus constitute a limitation 

on how widely we may space retrieval practice even in the presence of feedback. Significant 

mediation in the present case would mean that spacing retrieval practice more widely 

positively affects learning outcomes because it increases retrieval effort, which, in turn, leads 

to better learning. Significant moderation of this mediation by retrieval success would mean 

that the benefits of effort (the mediator) may be conditional on retrieval success (the 

moderator) and, therefore, repetitions should not be spaced so widely that it negatively affects 

retrieval success, even in the presence of feedback.  

Because study time was shown not to moderate the relationship between ISI and 

study-phase retrieval effort and success, participants’ scores were collapsed across the levels 

of this variable for this analysis. Tables 30-32 present the bivariate two-tailed correlations 

113 

 

 

between the members of each pair of tests as well as loadings of each pair of tests on their 

corresponding extracted component.  

Table 30: Correlation coefficients and loadings for form-recognition tests 

Table 31: Correlation coefficients and loadings for immediate meaning tests 

Table 32: Correlation coefficients and loadings for delayed meaning tests 

 

114 

 

 

 

 

 

Not surprisingly, each table for the collapsed scores shows quite high loadings, as in 

the previous analysis, suggesting that in the scores that are collapsed across the two levels of 

study time the corresponding test pairs likely measure the same underlying construct.  

Figure 18 presents the conceptual structure of the moderated mediation analysis 

(Model 14) as well as the obtained coefficients in the three factor analytic test scores.

Figure 18: Conceptual structure for the moderated mediation analysis 

 

 The coefficients show a similar pattern for all three sets of vocabulary scores. The 

coefficients show a positive effect of ISI on study-phase retrieval effort and also on the 

learning outcomes. Effort is shown to actually have a negative effect on learning in each of 

the three sets of vocabulary scores. Study-phase retrieval success, however, has a positive 

effect on the relationship between effort and learning.  

The form-recognition scores. The test of the indirect effects showed significant 

moderated mediation, Index of Moderated Mediation = .3579, bootstrapped standard error = 

.0805, 99% bootstrapped confidence interval [.1935, .5992]. This means that the effect of 

retrieval effort on form-recognition posttest scores significantly depends on retrieval success. 

115 

 

 

To investigate more in depth the moderated mediation process, the effect of the mediator was 

tested at different levels of the moderator variable, in this case, using the 16th, 50th, and 84th 

percentiles. This analysis is the default in the software used. Table 33 presents the effect of 

study-phase retrieval effort on form-recognition scores at the three levels of study-phase 

retrieval success represented by the three percentiles.  

Table 33: Effect of effort at three levels of success for form-recognition 

This table shows that effort has a small nonsignificant negative effect for words whose 

 

translations were least often successfully retrieved during the study phase (the 16th percentile 

in retrieval success rate) and a small significant positive effect for words that received an 

average number of successful retrieval attempts during the study phase (the 50th percentile). 

For words that received the highest number of successful retrieval attempts (the 84th 

percentile), however, the effect of effort was larger and significantly positive. Thus, the 

beneficial effects of effort were shown to be contingent on higher retrieval success in this 

moderated mediation analysis.  

The immediate meaning scores. The test of the indirect effects also showed 

significant moderated mediation, Index of Moderated Mediation = .3887, bootstrapped 

standard error = .0588, 99% bootstrapped confidence interval [.2643, .5605]. To investigate 

more in depth the moderated mediation process, the effect of the mediator was again tested at 

the 16th, 50th, and 84th percentile levels of the moderator variable. Table 34 presents the effect 

of study-phase retrieval effort on immediate meaning scores separately at each of the three 

levels of study-phase retrieval success represented by the three percentiles.  

116 

 

 

Table 34: Effect of effort at three levels of success for immediate meaning tests 

A similar pattern is seen for the immediate meaning scores as that for the form-

 

recognition scores discussed earlier, with the exception of a significant negative effect of 

effort at the lowest level of retrieval success. This latter finding is puzzling because it would 

suggest that spending more effort on a search of one’s memory for the target translation 

actually hurts memory for the word in question if it is not successfully retrieved. This does not 

seem to make intuitive sense. One possibility may be item difficulty: a word that a participant 

has a hard time remembering may lead them to think hard in an effort to retrieve it and still 

fail to do so. This same word may further be hard to get right on the subsequent test. Thus, my 

proposed explanation of the obtained pattern of results is not a negative effect of effort on 

memory but rather the effect of item difficulty on study-phase retrieval effort. The overall 

pattern, however, is again in line with the predictions of the dual mechanism account under 

investigation.  

The delayed meaning scores. The test of the indirect effects showed significant 

moderated mediation, Index of Moderated Mediation = .2545, bootstrapped standard error = 

.0767, 99% bootstrapped confidence interval [.1018, .4860]. This means that the effect of 

retrieval effort on delayed meaning scores also significantly depends on retrieval success. To 

investigate more in depth the moderated mediation process, the effect of the mediator was 

tested at different levels of the moderator variable, in this case, using the 16th, 50th, and 84th 

percentile. Table 35 presents the effect of study-phase retrieval effort on immediate meaning 

117 

 

 

scores separately at each of the three levels of study-phase retrieval success represented by the 

three percentiles. 

Table 35: Effect of effort at three levels of success for delayed meaning tests 

In the delayed meaning scores, similarly to the results of the previous two tests, 

 

retrieval effort was shown to only be beneficial with higher levels of retrieval success. 

However, here retrieval effort is only beneficial at the percentile of success. This is different 

from the previous two tests, where medium study-phase success percentile also showed a 

smaller though still significant benefit of effort. Recall that on the delayed meaning tests 

retrieval effort was shown not to significantly mediate beneficial effects of spacing as a main 

effect.  

The results of the moderated mediation analyses are in line with the predictions of the 

reminding account, which posits successful effortful retrieval as the mechanism underlying 

the effects of spacing. At least with regard to overt L2-L1 translation retrieval practice for L2 

vocabulary learning in a PAL format, the results show that beneficial effects of effort are 

conditional on a high level of retrieval success. While a nonmonotonic learning function of 

lag was not obtained in the posttest scores, the complex underlying relationships included a 

detrimental effect of spacing that operated through a lower rate of study-phase retrieval 

success at the longest ISI tested in the present study. This may have affected the magnitude of 

the lag effect in the present results. In general, while spacing effects are usually found to be 

large, lag effects are often found to be quite small and inconsistent (Maddox et al., 2018). The 

fact that the chance of successful retrieval may decrease the longer the lag between repeated 

118 

 

 

encounters or retrieval attempts might be one reason why increases in learning outcomes 

become smaller the longer the lag.       

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

119 

 

 

CHAPTER 5 

DISCUSSION 

The present research examined the contribution of the dual-mechanism of successful 

effortful retrieval to the effects of spacing overt L2-L1 translation retrieval practice on 

learning novel L2 vocabulary in a PAL format with immediate feedback study. It further 

investigated any effects of the amount of time a learner is given, per encounter and in total, 

for studying each Finnish word with its translation (presented as feedback after each retrieval 

attempt) on learning outcomes as well as on the operation of the two mechanisms of retrieval 

effort and success that are proposed to underlie effects of spacing or lag (Benjamin & Tullis, 

2010). Participants (L1 speakers of English) studied 72 novel simple and generic words (half 

repeating targets and half non-repeating controls) in a language that was completely novel to 

them (Finnish) in a PAL format, where they were required to attempt to produce the L1 

English translation for each L2 Finnish word before being presented with both members of 

the translation pair for study for either 3 or 9 seconds. The experimental targets repeated 

based on three repetition schedules: a massed schedule, a short-spaced schedule, and a long-

spaced schedule. Participants’ study-phase response latency and accuracy were recorded. 

Three immediate (30-min RI) and delayed (1-2 weeks RI) posttests measured participants’ 

learning gains in terms of form recognition ability and ability to produce and select L1 

translations for the 72 studied L2 Finnish words.  

The first research question asked whether spacing L2-L1 translation retrieval practice 

more widely has an effect on learning outcomes as measured by immediate and delayed form-

recognition and translation posttests and whether the time (3 vs. 9 seconds) learners are given 

for study of a Finnish-English translation pair immediately after each retrieval attempt makes 

120 

 

 

a difference for these scores. The results showed a spacing effect of a considerable size across 

the posttest types and RIs – in other words, the scores for words that were practiced in a 

massed fashion throughout the study phase were considerably lower than for those in the two 

spaced practice conditions, regardless of the type or time of test. Importantly, the difference 

between the massed practice condition and the no-practice control condition was very small, 

particularly in terms of the long-term gains, where, on the most challenging L2-L1 translation 

test, scores in the massed practice condition were not significantly different from those in the 

no-practice condition. Using a no-practice condition in the present study allowed to compare 

the effects of massed retrieval practice to no retrieval practice as well as to retrieval practice 

spaced at different intervals. The results suggest that despite the fact that retrieval practice is 

known to be beneficial for learning, massed practice is not an effective learning tool 

(sometimes producing learning equivalent to no practice at all), even if it involves retrieval. 

The present findings are in line with proposals that retrieval from short term memory may not 

involve processes that make retrieval beneficial for memory (Glover, 1989).  

The present study included three levels of lag within the same within-participant 

experiment. The results showed a significant lag effect (advantage of long-spaced practice 

over short-spaced practice) on the delayed meaning posttests but not on the immediate 

meaning posttests, where, on the latter, the longer-spaced condition actually produced slightly 

less learning than the short-spaced condition (a small non-significant nonmonotonic function 

of lag). No lag effect (but only a spacing effect) was observed in either of the two form-

recognition posttests, where the scores in the short- and long- spaced conditions were very 

similar. This pattern is in line with previous findings of more pronounced beneficial effects of 

lag the more challenging the task (Maddox, 2016). Further, the delayed posttests speak to 

121 

 

 

forgetting rates in the different conditions and the present findings are in line with previous 

psychology research showing that effects of spacing become more pronounced when 

knowledge is tested after a longer period of time (Bahrick, 1979; Cepeda, Vul, Rohrer, 

Wixted, & Pashler, 2008; Küpper-Tetzel & Erdfelder, 2012; Rawson & Kintsch, 2005; 

Rohrer, 2015; Serrano, & Huang, 2018). This suggests that longer spaced practice slows 

forgetting more effectively than does shorter spaced or massed practice. Thus, despite the fact 

that retrieval is beneficial for learning, the temporal distribution of retrieval practice may be 

crucial: massed practice may be not much better than no practice at all and longer intervals 

between repetitions may produce more robust knowledge that is forgotten more slowly than if 

the interval between repetitions is shorter. However, see below for the results of the mediation 

analysis that will show that there is a limit to how widely we can space repeated retrieval or 

study events before this begins to have a negative effect.  

The present results showed that longer study time has a small overall significantly 

positive effect on the posttest scores, particularly for knowledge of meaning. In the present 

study, longer study time refers to more time given for the participants to look at and 

maintenance rehearse the L2-L1 translation pair. Psychology studies have shown that 

maintenance rehearsal may not be effective for improving memory (e.g., Craik & Watkins, 

1873). It is likely true that an important difference between the present findings and those of 

such psychology studies is that looking at and maintenance rehearsing a novel L2 word form 

paired with its L1 translation may involve different mechanisms than rehearsing information 

such as well-known L1 words for a subsequent free recall test, which is the usual learning 

target in psychology experiments. According to the present results, the time participants are 

122 

 

 

allowed to study a foreign word with its meaning at each presentation in a PAL format might 

have a small beneficial effect on learning, particularly with spaced practice.  

Research has shown that learners are not effective at pacing their own study (Rundus, 

1971), often devoting more study time to items that they currently believe to be more difficult, 

such as to spaced rather than massed repetitions, when this impression may not always be 

accurate. It was an interesting question whether longer study time that is imposed externally 

can counteract negative consequences of massing repetitions. The obtained small size of the 

effect is different, however, from findings from prior SLA research that has shown 

considerable learning benefits of more attentional processing of L2 words (e.g., Godfroid et 

al., 2018; Koval, 2019). An important difference may be that in such prior research, learners 

were free to self-pace their study. This suggests that when longer study time is imposed 

externally it may not have benefits of the same magnitude as when a participant chooses to 

devote longer study time to a target word. This, in turn, suggests that the processes that 

underlie self-regulated and other-imposed longer study time are likely qualitatively different. 

Recall that the time participants were given for studying a word in the longer study time 

condition was three times longer than that in the short study time condition. However, the 

effect of longer study time was dramatically smaller than that of spacing practice, whereby 

posttest scores resulting from longer study time massed practice were dramatically lower than 

the scores resulting from shorter study time spaced practice, suggesting that spacing retrieval 

practice is a more powerful learning tool than externally imposing longer study time.   

The second research question asked whether increasing the interval between repeated 

retrieval events affects study-phase retrieval effort and success, as well as whether the amount 

of study time that is allowed per encounter affects the relationship between ISI and study-

123 

 

 

phase retrieval effort and success. The results of the study-phase latency analyses showed that 

increasing levels of ISI lead to increasing retrieval effort. Retrieval effort decreased slightly 

from repetition to repetition. The three ISI conditions showed a parallel decrease, with 

latencies in the long-spaced condition remaining longer than those in the short-spaced 

condition until the last repetition and latencies in the short-spaced condition, in turn, 

remaining longer than those in the massed condition until the last repetition. Thus, by 

increasing ISI, I was able to induce increasingly more retrieval effort, which is known to be 

beneficial for learning (Roediger & Karpicke, 2006). The amount of time allowed for study of 

the paired associates per repetition had a very small negative effect on the latencies that was 

not statistically significant.  

The results of study-phase retrieval success analyses showed that retrieval success rate 

increased with repetition in both spaced conditions, which showed parallel growth with a 

consistent higher number of successful retrieval attempts across all five repetitions in the 

short-spaced condition than that in the long-spaced condition. The results showed that (a) in 

the massed condition, retrieval was almost always successful, (b) in the short-spaced 

condition retrieval success was significantly less frequent than in the massed condition, (c) in 

the long-spaced condition retrieval success was significantly less frequent than in the short-

spaced condition, indicating that the longer the intervals are between retrieval attempts that 

are followed by feedback the less successful the retrieval is at respective subsequent retrieval 

attempts. A growth analysis further showed that there was a small significant positive effect 

of longer study time in the two spaced conditions but not in the massed condition.   

The third research question asked whether the effect of ISI on learning outcomes is 

mediated by the dual mechanism of successful effortful retrieval and whether the amount of 

124 

 

 

study time allowed per encounter moderates this relationship. The results of moderated 

mediation analyses showed that the amount of time allowed for study of feedback did not 

affect the operation of the two proposed underlying mechanisms. They further showed that 

despite the fact that an overall nonmonotonic function of lag was not obtained in the present 

learning outcomes, a negative effect of increasing ISI was still present and operated through a 

lower rate of study-phase retrieval success. Further, it was shown that retrieval success 

significantly moderated the beneficial effects of more effort that was induced by longer 

intervals between repetitions on all learning measures used in the present experiment. This 

confirms the predictions of the dual mechanism of retrieval effort and success that is proposed 

to underlie effects of spacing on learning by the reminding account of the spacing effect 

(Benjamin & Tullis, 2010). Recall that, according to this account, retrieval must be effortful 

yet successful. The present results showed that retrieval effort only had beneficial effects on 

learning when it was successful.  

It is surprising that similar results to those obtained from the L2-L1 translation and 

form-meaning matching posttests held for the form-recognition tests as well – that is, retrieval 

effort that is induced through wider spacing of repetitions is only beneficial for form 

recognition ability when retrieval is mostly successful. While longer retrieval effort should 

well benefit subsequent ability to recognize target L2 forms due to the fact that longer 

latencies represent here longer time spent visually processing the L2 form while participants 

searched their memory for its meaning (Kintsch & van Dijk, 1978), the finding of a benefit of 

successful retrieval and the finding that longer effort during retrieval attempts was only 

beneficial when retrieval was successful are quite puzzling. One way that this finding may be 

explained is that learning is known to be facilitated the more meaningful the stimulus (Marks 

125 

 

 

& Miller, 1964; Schulman, 1974). It may be that successfully retrieving a meaning associated 

with an L2 form affects learning of the form because it involves more meaningful processing 

of the L2 form during the process of retrieval.  

Despite the fact that in the present study each retrieval attempt was followed by 

feedback in the form of the target L2-L1 translation pair, failed retrieval attempts did not 

benefit from more effort. This is surprising as one would expect a more effortful search of 

one’s memory to result in higher quality processing of subsequently presented feedback, 

which should, in turn, benefit learning (Izawa, 1970; Kornell, Hays, & Bjork, 2009). Further, 

there have been proposals that a failed retrieval attempt that is followed by the presentation of 

feedback in the form of the target searched-for information should have no less learning 

potential (or even greater learning potential) as does a successful retrieval attempt (Bahrick & 

Hall 2005; Pashler, Zarow, & Triplett, 2003). The present results showed that lower study-

phase retrieval success rate that resulted from spacing retrieval attempts more widely had a 

negative effect on learning even in the presence of feedback. Further, lower rate of retrieval 

success that resulted from spacing interfered with beneficial effects of retrieval effort, even in 

the presence of immediate feedback following each retrieval attempt. This may be due to the 

fact that failed retrieval attempts that are followed by feedback do not constitute true retrieval 

events but only constitute input processing that may, nonetheless, be enhanced by the 

preceding retrieval attempt. 

In the present study, learning gains followed a monotonic function of lag, at least for 

the longer-term gains and for the more challenging tasks of L2-L1 translation and form-

meaning matching. This was despite the negative effect of ISI that operated through a lower 

rate of study-phase retrieval success. One reason for the monotonic function may be the fact 

126 

 

 

that the study-phase task used involves retrieval, which may produce stronger memory traces 

at each repetition that are more likely to survive longer ISIs (Verkoeijen et al., 2005). This 

may have prevented dramatic study-phase retrieval failure with the longest ISI used, which in 

turn failed to have a dramatic negative impact on learning.  

Studies investigating effects of equal versus expanding spacing schedules on learning 

have mostly found an advantage of equally-spaced schedules over expanding schedules 

(Balota et al., 2006; Carpenter & DeLosh, 2005; Logan & Balota, 2008; Storm et al., 2010). 

Recall that the main purpose of an expanding schedule is to ensure study-phase retrieval 

success that can be achieved in this case with progressively longer ISIs. Less learning in the 

expanding schedules is often attributed to such higher rate of study-phase retrieval success 

that is promoted through such schedules. Thus, it is argued by some that more failure during 

study phase may be beneficial. A number of other studies have shown that more study-phase 

performance failures result in superior learning outcomes (Bahrick & Hall, 2005; Pashler et 

al., 2003). In the present study, on the surface, the same pattern seems to hold: the long-

spaced condition produced the lowest study-phase performance success but learning in this 

condition was superior in the long term. However, the moderated mediation analyses showed 

that study-phase performance failure still had a negative effect on learning outcomes. Further, 

effort put into retrieval attempts that were mostly unsuccessful did not have a positive effect 

on learning as it should be expected to have for learning from retrieval practice (Bjork, 1975; 

Glover, 1989; Whitten & Bjork, 1977). The present results suggest that a balance must be 

struck between study-phase performance success and effort: It appears that the higher effort 

that is produced by longer ISIs has a powerful beneficial effect on learning providing that 

retrieval is successful, even when feedback follows the retrieval attempt. Differences in 

127 

 

 

learning gains are therefore likely to be due to the fact that the words that are retrieved 

successfully though with difficulty are remembered better than those that are not retrieved or 

are retrieved with minimal effort.  

Pedagogical implications 

 

 

The findings of the present research have important implications for second language 

vocabulary teaching and learning. The present findings indicate, first of all, that despite the 

fact that retrieval practice is believed to promote learning in and of itself, how closely 

together or widely apart retrieval events occur has very important consequences for L2 

vocabulary learning outcomes. Using a control condition in the present study allowed me to 

evaluate the contribution of time spent on retrieval practice under different levels of lag 

against no practice at all and only a single study event. If retrieval events occur consecutively 

or in very close succession, such practice may have little to no positive effect on learning, 

particularly in the long term. Despite the fact that study in the control condition did not 

involve any true retrieval attempts and only involved one study event that was 3 or 9 seconds 

in duration, whereas massed practice involved five true (and predominantly successful) 

retrieval events and six times longer total study of a translation pair, the difference in learning 

outcomes between these two conditions was very small in the short term and not statistically 

different from zero on some measures in the long term. This finding suggests that increasing 

the number of retrieval-restudy events that occur consecutively or closely together (even if 

this is increased from zero to five retrieval attempts) does not improve learning gains by much 

and may not be a good way to use study time. Learners are known to often engage in such 

self-drilling, whereby they repeat a given word with its translation for a considerable length of 

time, believing that the longer they rehearse it the better it will be remembered; or test 

128 

 

 

themselves on an item that was very recently seen and while retrieval is still very easy 

because the information still resides in working memory. The present research shows no 

benefit of such drilling or massed retrieval practice over a single short study event, which, in 

turn, may produce results that are not significantly different from zero learning gains in terms 

of long-term retention. To use time effectively, I recommend, therefore, to space retrieval-

restudy events. The present results confirm arguments in prior research that spacing practice 

can help save time: spacing repeated retrieval practice does not require much additional study 

time beyond the longer time it takes to retrieve the target information; however, it results in 

far superior learning gains that are more robust to forgetting. With massed practice, it might 

take much more study to achieve the same learning outcomes (Maddox & Balota, 2015). For 

learners, I would recommend adopting a more spaced schedule for self-testing and to attempt 

retrieval of the studied material only once they feel that some, though not complete, forgetting 

of the target information has occurred. This can be done by interleaving retrieval-restudy of 

different information rather than using blocked study. Thus, for example, if a learner is 

studying 20 words with their translations, they may wish to go through the entire list before 

revisiting any given item rather than devoting a number of consecutive retrieval-restudy 

events to the same item before moving on to the next item. To use time more efficiently, the 

learner may also wish to cut study of the same item short as soon as they feel that it has been 

encoded in memory, without engaging in rehearsal, if the information is to be revisited 

repeatedly. 

Longer intervals between retrieval attempts can be used to enhance learning from 

retrieval practice and slow forgetting of learned material. The higher retrieval effort that 

results from longer intervals between repetitions underlies these benefits of more widely 

129 

 

 

spaced retrieval practice. However, the benefit of increased retrieval effort is conditional on 

retrieval success. This means that, while wider spacing of repeated retrieval is beneficial, 

retrieval attempts must be scheduled such that retrieval is still mostly successful, which means 

that retrieval attempts should be spaced but not too widely spaced so that retrieval fails, even 

when feedback is provided after each such retrieval attempt. The provision of feedback after 

each retrieval attempt did not cancel out the negative effects of study-phase retrieval failure, 

suggesting that study-phase retrieval success is important for learning of L2 words with their 

meaning and its absence cannot be offset by the presentation of the target information as 

feedback immediately following a retrieval attempt. I recommend that intervals used in 

retrieval practice, such as those determined by various computer vocabulary learning 

programs that present words in a format such as PAL or the flashcard method and that use 

immediate presentation of feedback, need to be spaced rather than massed in order to make 

retrieval practice more effortful. However, they should not be spaced so widely as to lead to 

dramatic levels of retrieval failure, as this may cancel out the positive effects of retrieval 

effort and lead to diminished learning. 

When selecting a retrieval practice schedule, we need to take into account the 

probability of successful retrieval given our specific circumstances and learner variables. 

Thus, we need to ensure that while increasing intervals between repeated retrieval events 

produces higher amounts of effort these should not be spaced so widely as to lead to failed 

retrieval during study, as in such a situation effort may no longer have its positive effects. 

Many different variables may affect study-phase retrieval success. These may be the difficulty 

of the studied information (the more difficult it is, the lower the chance of successful retrieval 

after considerable time), the age group and memory ability of our target population, the 

130 

 

 

complexity and interference potential of the intervening material or activity (which may 

produce more forgetting, resulting in a lower chance of successful retrieval). Thus, for 

example, the complexities of more naturalistic contexts, such as those found in classroom 

learning are likely to decrease the probability of successful study-phase retrieval, interfering 

with benefits of spacing study more widely (Rogers & Cheung, 2018; Suzuki & DeKeyser, 

2017).  

Increasing the time, per encounter and in total, that a learner is given to study an L2 

word presented with its meaning, such as longer presentation rate in PAL software, has a 

small overall beneficial effect on memory for a target word and its meaning and also increases 

the chance of successful retrieval in overt L2-L1 retrieval practice, which was shown in the 

present study to be important for learning outcomes. Increasing study time does not, however, 

counteract the negative effects of massing practice, even if such practice involves retrieval. 

Previous research showed that more attentional processing of the target words leads to more 

learning (Godfroid et al., 2018; Godfroid, et al., 2013) and may be the reason spacing 

repeated study results in greatly superior learning outcomes (Koval, 2019; Rundus, 1971). 

The present results suggest that large benefits of longer study time may be limited to self-

regulated learner choice to allocate more attention or effort and may not have benefits of the 

same size when longer duration is externally imposed on the learner. Therefore, our efforts 

should be aimed at getting learners to choose to allocate more attention/study time/effort to 

target forms, such as, for example by using spacing (Koval, 2019) rather than imposing longer 

study time externally. Computer programs that present immediate feedback after each 

retrieval attempt need not make feedback presentation longer than is reasonably enough for 

successful encoding of the information (without additional time to simply rehearse), as doing 

131 

 

 

so appears not to have much benefit and may, therefore, not represent efficient use of study 

time.  

Finally, The results suggest that if there is a chance that a learner may be able to 

retrieve a given target piece of information from memory, they should be allowed to take the 

time they need to do so rather than being presented with the information before the retrieval 

process is complete. It is often tempting, in the interest of time, to present information that a 

learner might take a long time to retrieve on their own. However, if we rush to present the 

target information before a learner completes a potentially successful retrieval attempt, this 

may constitute a less powerful learning event than if the information is fully retrieved from 

memory.  

Limitations and suggestions for future research 

The present study has a number of limitations. One of them is the fact that response 

latencies were measured through a button press, which is not as precise a method as voice-

activated recording of latencies, for example. Further, the modality was different between 

study and test: oral translation was the task during the study phase (and it was timed) but the 

posttests were in the written modality and participants were given unlimited time to provide 

their written responses.  

The present study investigated the contribution of the dual mechanism of successful 

effortful retrieval to lag effects in L2 vocabulary learning. Retrieval was operationalized as 

overt retrieval of the L1 translations for target L2 words in a paired-associate learning format. 

The results confirmed an important contribution of successful effortful overt L2-L1 

translation retrieval to L2 vocabulary learning benefits that come from spacing retrieval 

practice more widely. It is important to note, however, that overt L2-L1 translation retrieval is 

132 

 

 

only one type of retrieval practice and only one type of retrieval. This type of retrieval is 

pedagogically interesting primarily because it can be observed. It is an important question 

whether we need to schedule repeated retrieval events such that they are effortful but still 

successful in overt retrieval practice, a question that leads to very straightforward pedagogical 

recommendations. Future research also needs to supplement the present results with an 

investigation of L1-L2 retrieval practice. Such an investigation is also likely to result in very 

important pedagogical recommendations that can be applied with relative ease. Based on the 

findings of the present research, it is quite likely that L1-L2 practice might show a very 

different pattern in terms of the effect of ISI on learning. The reason for such an expectation is 

due to the fact that L1-L2 translation, particularly with novel words, is a more challenging 

task, which is likely to result in dramatically less retrieval success at longer ISIs, which was 

shown in the present experiment to have a negative effect on learning and also to interfere 

with beneficial effects of retrieval effort. Shorter ISIs may be found to be more beneficial. 

Such an investigation may further capture a nonmonotonic function of lag in learning 

outcomes, which was not observed in the present experiment.  

The underlying mechanism of the effects of spaced repeated study of L2 material in a 

learning situation that does not involve overt retrieval may still depend on a covert retrieval 

process. Future studies need to explore the contribution of covert retrieval to any effects of 

spacing study more widely in such learning tasks as well. Such covert retrieval can be 

observed through tests of simple recognition or through indirect memory tests such as 

facilitation, or speed-up, in task performance. In Koval (2019), for example, I examined 

facilitation in reading times on L2 words in my spaced condition with the help of eye-

tracking. Significant facilitation was observed in the spaced condition that could not be 

133 

 

 

attributed to simple effects of time. I concluded that such facilitation indicates a study-phase 

retrieval process in my spaced condition, which likely contributed the considerable beneficial 

effects of spaced study in my experiment. However, I did not intentionally attempt to vary 

study-phase retrieval success, but only explored this post hoc. Future studies should attempt to 

induce study-phase covert retrieval failure through the use of wider spacing to investigate the 

mechanisms underlying spaced study in learning situations that do not involve overt retrieval.  

Although the present study captured negative effects of study-phase retrieval failure, it 

did not capture a nonmonotonic lag function in learning outcomes. This is most likely due to 

the fact that while longer ISIs did produce more study-phase retrieval failure, the failure rate 

was not dramatic. One reason for this may be the fact retrieving an L1 translation for an L2 

word is not as difficult a task as L1-L2 translation, for example. Further, because retrieval 

practice produces stronger memory traces at each repetition, the type of practice used in the 

present experiment may have further promoted stronger memory traces at each repetition, 

resulting in a higher rate of study-phase retrieval success. Future research will need to 

investigate the dual mechanism proposed by the reminding account within a task that may not 

establish such strong memory traces at each repetition, such as incidental learning of 

vocabulary from reading comprehension activities (Verkoeijen, et al., 2005). Another reason 

for not capturing a nonmonotonic function may simply be the fact that the interval between 

the encounters was not long enough or the intervening activities did not produce enough 

interference to have sufficient negative impact on study-phase retrieval success and 

consequently on learning from repeated encounters in this experiment. Future research needs 

to test the dual mechanism of study-phase retrieval effort and success with longer ISIs. 

134 

 

 

The present results showed that study-phase retrieval failure had a negative effect on 

learning outcomes and also cancelled any positive effects of retrieval effort. This is contrary 

to what has been argued in some proposals in psychology research. Thus, for example, higher 

rate of study-phase retrieval success is argued to be the reason for expanding schedules 

producing less learning than do equally-spaced schedules (Bahrick & Hall, 2005; Pashler et 

al., 2003). Future studies investigating the effects of expanding schedules need to measure 

effort as well as retrieval success during the study phase in order to be able to make stronger 

arguments about the complex interplay of retrieval effort and success that may underlie any 

effects of differentially-intervalled schedules. It may be that the performance success that is 

supported by such an expanding schedule also has the effect of decreasing effort, 

counteracting any effects of longer ISIs on study-phase retrieval effort. 

The present research investigated the effects of study time at each repetition, which 

was externally imposed and held at two levels of 3 and 9 seconds. Future research would need 

to investigate whether the same pattern is observed with activities that involve elaborative 

rather than maintenance rehearsal (Stoff & Eagle, 1971). Further, study time is only one 

potentially relevant variable that may affect the operation of the underlying mechanisms of 

retrieval effort and success. Other relevant variables are numerous. An investigation of their 

effects on the underlying mechanisms is an important direction for future research. Such 

research may provide a fuller picture of the conditions under which various amounts of 

spacing may be beneficial or detrimental for L2 learning outcomes. Further, in the present 

study, participants studied novel L2 words that represented simple and generic concepts, in a 

completely novel language, from six repeated L1 translation retrieval attempts that were 

followed by feedback, within one study session. Future research needs to examine other tasks 

135 

 

 

and learning contexts and other learning targets, as well as other learner proficiencies. It will 

be important also to test the effects of different numbers of repetitions to explore the effects of 

relevant variables on the relationship between ISI and learning rate or speed: it may be that 

fewer repetitions will be needed with spaced practice (Maddox & Balota, 2015) although this 

may, in turn, depend on other relevant variables and their effects on the mechanisms that 

underlie learning from different levels of ISI.  

 
 
 
 
 
 
 
 
 
 

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

136 

 

 

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

APPENDICES 

 
 
 
 
 
 
 

137 

 

 

 

rakennus  
lehtien  
sulka 
sanky 
solmio 
muna 
pusero 
vasara 
ruoka  
sormi 
lelu 
maaseutu 
verho 
avain 
taskuun 
lahja 
lippu 
orja 
kyna 
hammas 
hiekka 
keitto 
ajoneuvo 
toimisto  
savuke 
lumi 
katu 
kengat 
omena 
siipi 
parveke 
kalastaa 
metsa 
mekko 
lompakko 
tehdas 

 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  
 =  

APPENDIX A 

 

Target Finnish words with their English translations 

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

building 
leaf 
feather 
bed 
tie 
egg 
shirt 
hammer 
food 
finger 
toy 
village 
curtain 
key 
pocket 
gift 
flag 
worker 
pen 
tooth 
sand 
soup 
car 
office 
cigarette 
snow 
street 
shoe 
apple 
wing 
balcony 
fish 
forest 
dress 
wallet 
factory 

 

bag 
fruit 
butterfly 
air 
bridge 
tail 
glove 
child 
dog 
arrow 
yard 
perfume 
sky 
teacher 
town 
bread 
table 
story 
cow 
cloud 
sun 
grain 
brother 
shower 
juice 
library 
pumpkin 
scarf 
bird 

 = 
laukku 
 = 
hedelma 
perhonen    = 
 = 
ilma 
silta 
 = 
 = 
pyrsto 
 = 
kasine 
 = 
lapsi 
koira 
 = 
 = 
nuoli 
 = 
piha 
hajuvesi 
 = 
 = 
taivas 
 = 
opettaja 
kaupunki 
 = 
 = 
leipa 
 = 
poyta 
 = 
tarina 
lehma 
 = 
 = 
pilvi 
 = 
aurinko 
jyva 
 = 
 = 
veli 
 = 
suihku 
mehu 
 = 
 = 
kirjasto 
 = 
kurpitsa 
 = 
huivi 
lintu 
 = 
 =  woman 
nainen 
 = 
veitsi 
pelia  
 = 
 = 
norsu 
 = 
lusikka 
lattia  
 = 
 = 
kuva 

knife 
game 
elephant 
spoon 
floor 
picture 

138 

 

 

APPENDIX B 

 

Information on the English translations 

 

Table 36: Frequency and concreteness indices for the English translations for the target 
words 

 

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

139 

 

 

APPENDIX C 

 

Instructions for vocabulary posttests 

 

Form-Recognition Test: Please underline the Finnish words that you recognize as ones you 
have studied during the study phase in this experiment. 

L2-L1 Translation Test: Please write the English translation next to each Finnish word 
below. 

Form-meaning Matching Test: Please write the number of each of the English translations 
below next to its Finnish word on sheet A if you were unable to produce its translation from 
memory. 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

140 

 

 

APPENDIX D 

 

The form recognition test 

Please underline the Finnish words that you recognize as ones you have studied during the 
study phase in this experiment: 

sisavuoren  

silta 

syyta  

sulka 

nalka 

vasara 

akuutin  

rakennus  

kaveri 

lintu 

valissa 

terve 

arvostettu 

ihmisen  

taskuun 

lapsi 

jaatelo 

ostokset 

vuohi 

joihin  

opetuksen  

veitsi 

sormi 

tilannetta 

vihdoin  

sianliha 

jyva 

tikkua 

kuumeesta 

intohimo 

lusikka 

piha 

nykyinen  

tehdas 

pelia  

verho 

taikausko  

kuva 

141 

 

pusero 

ehka 

leijona 

perhonen  

naen  

sana 

rasva 

kuivempi  

vanhasta  

lasnaol 

ohittaa  

lippu 

ilma 

taso 

palvelu 

rento 

kartano  

valmis 

pakastin 

 

Appendix D (cont’d) 

kyseessa  

lompakko 

verisia  

lehtien  

keitto 

osamisen 

hakijaa 

suihku 

hiekka 

kuten  

ampari 

kahta  

avuton 

lattia  

neste 

kansio 

nummi 

pojan  

opettaja 

paistoi  

toimisto  

taivas 

esitys  

sanky 

puhelin 

uhattuna  

vahvuus 

orja 

oikea 

lopuksi  

hyppasi  

samoin 

siipi 

huulet 

etsia  

tapahtu  

parveke 

laukku 

nelja  

aion  

eniten  

sitruuna 

loyhia  

muna 

mahtava  

142 

 

eivat  

kaiuttim 

yllatys 

savuke 

aurinko 

voileipa 

hammas 

herne 

yhdeksan 

saimme  

metsa 

mutkai 

loput  

luonnolli 

maaseutu 

ajaksi  

muodossa  

ongelma  

lehma 

iloinen  

jalka 

kasine 

haaste 

 

Appendix D (cont’d) 

jainen  

lahja 

vihaani  

hauska 

koyha  

mekko 

kierrosta  

selostus  

solmio 

ottivat  

upea 

paras  

kertoo  

pysyy  

lelu 

nainen 

pyrsto 

uhkaa  

aasi 

katu 

hiljaa  

epatoivo 

erittain  

ajoneuvo 

huivi 

naytto 

kunnes  

teltta 

alueella  

olka 

antoi  

peto 

sohva 

leipa 

ruoka  

pekoni 

kehui  

summia  

tehneet  

kalastaa 

seka 

ankkuria  

rotko 

riskin 

pilvi 

143 

 

kirjasto 

harvat 

tarve 

kutsua  

portaat 

asettua  

hallussa 

hedelma 

levisi  

nuoli 

kolme  

rullaa  

tykonsa  

viinissa 

takki 

otat  

voimalle 

poyta 

veli 

uudesta 

mehu 

anteeksi  

tapa 

 

kurpitsa 

ikioma  

suojaus 

omena 

menestys 

lumi 

asteikko 

kaupunki 

masennus 

yskimaan 

runous 

heista  

puhtaus 

koira 

nosti  

kyna 

avain 

papu 

peruna 

varpaat 

kengat 

sataa 

vehna

norsu 

vainajan  

laastarin 

rohkein 

etelaisen  

uusi  

syvenee  

tarina 

sopeutua 

lounas 

hajuvesi 

olennaisia  

144 

 

 

APPENDIX E 

 

The L2-L1 translation test 

Please write the English translation next to each Finnish word below:  

taskuun 

lompakko 

katu 

veitsi 

omena 

pilvi 

mekko 

ruoka  

toimisto  

hajuvesi 

sanky 

ajoneuvo 

lelu 

lusikka 

taivas 

kaupunki 

pelia  

lumi 

tehdas 

leipa 

huivi 

silta 

nuoli 

lattia  

savuke 

orja 

lintu 

muna 

pusero 

kasine 

mehu 

veli 

hedelma 

parveke 

hammas 

opettaja 

lapsi 

lahja 

145 

 

 

Appendix E (cont’d) 

koira 

kengat 

norsu 

sormi 

piha 

kurpitsa 

jyva 

sulka 

lehma 

ilma 

pyrsto 

rakennus  

keitto 

poyta 

avain 

aurinko 

lehtien  

kuva 

maaseutu 

verho 

siipi 

kalastaa 

vasara 

suihku 

hiekka 

kyna 

nainen 

tarina 

kirjasto 

metsa 

perhonen  

laukku 

solmio 

lippu 

146 

 

 

APPENDIX F 

 

The form-meaning matching test 

Please write the number of each of the English translations below next to its Finnish word on 
sheet A if you were unable to produce its translation from memory:  

1. 

2. 

3. 

4. 

5. 

6. 

7. 

8. 

9. 

teacher 

scarf 

shirt 

worker 

town 

street 

table 

elephant 

hammer 

10. 

gift 

11. 

sand 

12. 

forest 

13. 

bed 

14. 

apple 

15. 

sky 

16. 

tie 

17. 

curtain 

18. 

building 

19. 

toy 

20. 

wing 

21. 

tooth 

22. 

balcony 

23. 

bag 

24. 

snow 

25. 

grain 

26. 

picture 

27. 

egg 

28. 

wallet 

29. 

pumpkin 

30. 

bread 

31. 

story 

32. 

yard 

33. 

brother 

34. 

bridge 

35. 

knife 

36. 

tail 

147 

 

 

Appendix F (cont’d) 

37. 

sun 

38. 

game 

39. 

shoe 

40. 

feather 

41. 

key 

42. 

butterfly 

43. 

cow 

44. 

glove 

45. 

car 

46. 

library 

47. 

office 

48. 

perfume 

49. 

cigarette 

50. 

air 

51. 

child 

52. 

fruit 

53. 

soup 

54. 

shower 

55. 

factory 

56. 

arrow 

57. 

food 

58. 

leaf 

59. 

juice 

60. 

pocket 

61. 

dress 

62. 

dog 

63. 

village 

64. 

flag 

65. 

fish 

66. 

woman 

67. 

spoon 

68. 

pen 

69. 

finger 

70. 

bird 

71. 

cloud 

72. 

floor 

148 

 

 

APPENDIX G 

 

Linguistic Background Questionnaire 

Background Questionnaire  

 
1. Participant Number ____________    2. Gender:   M__    F__       3. Age:  _____ 
4. Native Language(s)   ______________ 
5. Home country or countries:  ______________________________________ 
6. What languages have you studied? 

Language 

How long 
have you 
been 
studying it? 

Age at which 
you began 
studying the 
language 

                      
                     Proficiency 
 
Very poor 

 
 
 
Excellent 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

        1         2           3          4          5 

        1         2           3          4          5 

        1         2           3          4          5 

        1         2           3          4          5 

        1         2           3          4          5 

 
7. Is there anything else you would like to tell us about your language background? If so, 
please write it here (you can also use the back of this sheet): 
 
 
8. Did any of the words that you studied in the experiment strike you as familiar upon initial 
encounter?  ___ YES       ____ NO 
9. If you indicated YES above, please explain: 
 

 
 
 
 
 
 
 
 
 
 

149 

 

 

APPENDIX H 

 

Study-phase instructions 

In this experiment, you will study Finnish words with their translations.  
 
You will see words from the Finnish language appear one at a time in the middle of the screen 
with a question mark prompting you to provide its English translation (jipt -- _____ ?). If you 
believe that you have studied the word with its translation, please provide the translation by 
saying it aloud. If you do not believe that you have studied the translation for a given word or 
if you cannot remember the translation, please say “I don’t know”. Your response time and 
accuracy will be recorded.  
 
Please try to recall the English translation even if it requires you to think longer.  
 
Please do not try to guess by simply saying translations you have seen if you do not, in fact, 
consider that it is associated with the word in question.  
 
Sometimes a Finnish word will be presented with its English translation (jipt -- courage). 
When this happens, please study the word and its translation for as long as it remains on the 
screen. Please study the Finnish word with its translation each time until it disappears from 
the screen. Even if you feel that you know the word well while it is still shown, please 
continue studying it until it disappears. 
 
As soon as a word disappears from the screen and a new word appears, please switch your 
attention to the new word at once and focus only on the word that is currently being 
presented. 
There will be a test on your knowledge of the Finnish words and their translations after this 
study phase.   
 
 
 
 
 
 
 
 

 

 

 

150 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 
 

REFERENCES 

151 

 

 

REFERENCES 

 

Allen, G.A., Mahler, W.A., & Estes, W.K. (1969). Effects of recall tests on long-term 

retention of paired associates. Journal of Verbal Learning & Verbal Behavior, 8, 463–
470. 

Arnold, K. M., & McDermott, K. B. (2013). Test-potentiated learning: Distinguishing 

between direct and indirect effects of tests. Journal of Experimental Psychology: 
Learning, Memory, & Cognition, 39(3), 940. 

Appleton-Knapp, S. L., Bjork, R. A., & Wickens, T. D. (2005). Examining the spacing effect 
in advertising: Encoding variability, retrieval processes, and their interaction. Journal 
of Consumer Research, 32(2), 266–276. 

Atkinson, R.C., & Shiffrin, R.M. (1968). Human memory: A proposed system and its control 

processes. In K.W. Spence & J.T. Spence (Eds.), The psychology of learning and 
motivation: Advances in research and theory (Vol. 1, pp. 90-195). New York: 
Academic Press. 

Auble, P. M., & Franks, J. J. (1978). The effects of effort toward comprehension on recall. 

Memory & Cognition, 6(1), 20–25. 

Baayen, R. H., Piepenbrock, R., & van Rijn, H. (1995). The CELEX Lexical Database. 
Release 2 [CD-ROM]. Linguistic Data Consortium, University of Pennsylvania, 
Philadelphia. 

Bahrick, H. P. (1979). Maintenance of knowledge: Questions about memory we forgot to ask. 

Journal of Experimental Psychology: General, 108(3), 296. 

Bahrick, H. P., Bahrick, L. E., Bahrick, A. S., & Bahrick, P. E. (1993). Maintenance of 

foreign language vocabulary and the spacing effect. Psychological Science, 4, 316–
321. 

Bahrick, H. P., & Phelphs, E. (1987). Retention of Spanish vocabulary over 8 years. Journal 

of Experimental Psychology: Learning, Memory, & Cognition, 13(2), 344. 

Balota, D. A., Duchek, J. M., & Paullin, R. (1989). Age-related differences in the impact of 

spacing, lag and retention interval. Psychology & Aging, 4, 3–9. 

Balota, D. A., Duchek, J. M., Sergent-Marshall, S. D., & Roediger, H. L., III. (2006). Does 

expanded retrieval produce benefits over equal-interval spacing? Explorations of 
spacing effects in healthy aging and early stage Alzheimer’s disease. Psychology & 
Aging, 21, 19–31. 

152 

 

 

Bahrick, H. P., & Hall, L. K. (2005). The importance of retrieval failures to long-term 

retention: A metacognitive explanation of the spacing effect. Journal of Memory & 
Language, 52(4), 566–577. 

Barcroft, J. (2007). Effects of opportunities for word retrieval during second language 

vocabulary learning. Language Learning, 57, 35–56.  

Batchelder, W. H., & Riefer, D. M. (1980). Separation of storage and retrieval factors in free 

recall of clusterable pairs. Psychological Review, 87, 375–397. 

Begg, I., & Green, C. (1988). Repetition and trace interaction: Super-additivity. Memory & 

Cognition, 16(3), 232–242. 

Bellezza, F. S., Winkler, H. B., & Andrasik, F. (1975). Encoding processes and the spacing 

effect. Memory & Cognition, 3(4), 451–457. 

Benjamin, A.S., Bjork, R.A., & Schwartz, B.L. (1998). The mismeasure of memory: When 
retrieval fluency is misleading as a metamnemonic index. Journal of Experimental 
Psychology: General, 127, 55–68. 

Benjamin, A. S., & Ross, B. H. (2010). The causes and consequences of reminding. In A. S. 
Benjamin (Ed.), Successful remembering and successful forgetting: A Festschrift in 
honor of Robert A. Bjork (pp. 71–87). New York: Psychology Press. 

Benjamin, A. S., & Tullis, J. (2010). What makes distributed practice effective? Cognitive 

Psychology, 61(3), 228–247. 

Bird, S. (2010). Effects of distributed practice on the acquisition of second  language English 

syntax. Applied Psycholinguistics, 31, 635–650. 

Birnbaum, I. M., & Eichner, J. T. (1971). Study versus test trials and long-term retention in 

free-recall learning. Journal of Verbal Learning & Verbal Behavior, 10, 516–521. 

Bjork, R. A. (1975). Retrieval as a memory modifier. In R. Solso (Ed.), Information 

processing and cognition: The Loyola Symposium (pp. 123–144). Hillsdale, NJ: 
Erlbaum. 

Bjork, R. A. (1994). Memory and metamemory considerations in the training of human 

beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about 
knowing (pp. 185–205). Cambridge, MA: MIT Press. 

Bjork, R. A. (1999). Assessing our own competence: Heuristics and illusions. In D. Gopher, 
& A. Koriat (Eds.), Attention and performance (pp. 435–459). Cambridge, MA, US: 
The MIT Press. 

Bjork, R. A. (2013). Desirable difficulties perspective on learning. In H. Pashler (Ed.), 

Encyclopedia of the mind (pp. 242–244). Thousand Oaks, CA: Sage. 

153 

 

 

Bjork, R. A., & Allen, T. W. (1970). The spacing effect: Consolidation or differential 

encoding? Journal of Verbal Learning & Verbal Behavior, 9(5), 567–572. 

Bloom, K.C., & Shuell, T.J. (1981). Effects of massed and distributed practice on the learning 

and retention of second-language vocabulary. The Journal of Educational Research, 
74, 245–248. 

Bower, G. H. (1972). Stimulus-sampling theory of encoding variability. Coding Processes in 

Human Memory, 3, 85–123. 

Braun, K., & Rubin, D. C. (1998). The spacing effect depends on an encoding deficit, 
retrieval, and time in working memory: Evidence from once-presented words. 
Memory, 6, 37–65.  

Bruce, D., & Weaver, G. E. (1973). Retroactive facilitation in short-term retention of 

minimally learned paired associates. Journal of Experimental Psychology, 100, 9–17. 

Brysbaert, M., Warriner, A. B., & Kuperman, V. (2014). Concreteness ratings for 40 thousand 
generally known English word lemmas. Behavior Research Methods, 46(3), 904–911. 

Bugelski, B. R. (1962). Presentation time, total time, and mediation in paired-associate 

learning. Journal of Experimental Psychology, 63(4), 409–412.   

Bui, D. C., Maddox, G. B., & Balota, D. A. (2013). The roles of working memory and 

intervening task difficulty in determining the benefits of repetition. Psychonomic 
Bulletin & Review, 20(2), 341–347. 

Calkins, M.W. (1894). Association: I. Psychological Review, 1, 476– 483. 

Callan, D., & Schweighofer, N. (2010). Neural correlates of the spacing effect in explicit 

verbal semantic encoding support the deficient-processing theory. Human Brain 
Mapping, 31(4), 645–659. 

Carpenter, S. K., Cepeda, N. J., Rohrer, D., Kang, S. H., & Pashler, H. (2012). Using spacing 

to enhance diverse forms of learning: Review of recent research and implications for 
instruction. Educational Psychology Review, 24(3), 369–378. 

Carpenter, S. K., & DeLosh, E. L. (2005). Application of the testing and spacing effects to 

name learning. Applied Cognitive Psychology: The Official Journal of the Society for 
Applied Research in Memory & Cognition, 19(5), 619–636. 

Carpenter, S. K., Pashler, H., & Vul, E. (2006). What types of learning are enhanced by a 

cued recall test?. Psychonomic Bulletin & Review, 13(5), 826–830. 

Carrier, M., & Pashler, H. (1992). The influence of retrieval on retention. Memory & 

Cognition, 20(6), 633–642. 

154 

 

 

Cepeda, N. J., Coburn, N., Rohrer, D., Wixted, J. T., Mozer, M. C., & Pashler, H. (2009). 

Optimizing distributed practice: Theoretical analysis and practical implications. 
Experimental Psychology, 56(4), 236–246. 

Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in 

verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin, 
132(3), 354. 

Cepeda, N. J., Vul, E., Rohrer, D., Wixted, J. T., & Pashler, H. (2008). Spacing effects in 

learning: A temporal ridgeline of optimal retention. Psychological Science, 19(11), 
1095–1102. 

Challis, B. H. (1993). Spacing effects on cued-memory tests depend on level of processing. 

Journal of Experimental Psychology: Learning, Memory, & Cognition, 19, 389–396. 

Collins, L., Halter, R. H., Lightbown, P. M., & Spada, N. (1999). Time and the distribution of 

time in L2 instruction. TESOL Quarterly, 33(4), 655–680. 

Commins, S., Cunningham, L., Harvey, D., & Walsh, D. (2003). Massed but not spaced 

training impairs spatial memory. Behavioural Brain Research, 139(1-2), 215–223. 

Craik, F.I.M. (1970). The fate of primary memory items in free recall. Journal of Verbal 

Learning & Verbal Behavior, 9, 143–148. 

Craik, F. I., & Lockhart, R. S. (1972). Levels of processing: A framework for memory 

research. Journal of Verbal Learning & Verbal Behavior, 11(6), 671–684. 

Craik, F. I., & Watkins, M. J. (1973). The role of rehearsal in short-term memory. Journal of 

Verbal Learning & Verbal Behavior, 12(6), 599–607. 

Crowder, R.G. (1976). Principles of learning and memory. Hillsdale, NJ: Erlbaum. 

Cuddy, L.J., & Jacoby, L.L. (1982). When forgetting helps memory: An analysis of repetition 

effects. Journal of Verbal Learning & Verbal Behavior, 21, 451–467.  

Cull, W.L. (2000). Untangling the benefits of multiple study opportunities and repeated 

testing for cued recall. Applied Cognitive Psychology, 14, 215–235. 

Cull, W.L., Shaughnessy, J.J., & Zechmeister, E.B. (1996). Expanding understanding of the 

expanding-pattern-of-retrieval mnemonic: Toward confidence in applicability. Journal 
of Experimental Psychology: Applied, 2, 365–378. 

D'Agostino, P. R., & DeRemer, P. (1973). Repetition effects as a function of rehearsal and 

encoding variability. Journal of Verbal Learning & Verbal Behavior, 12(1), 108–113. 

Davis, C. J. (2005). N-Watch: A program for deriving neighborhood size and other 

psycholinguistic statistics. Behavior Research Methods, 37(1), 65–70. 

155 

 

 

de Jonge, M., Tabbers, H. K., Pecher, D., & Zeelenberg, R. (2012). The effect of study time 

distribution on learning and retention: A Goldilocks principle for presentation rate. 
Journal of Experimental Psychology: Learning, Memory, & Cognition, 38(2), 405. 

Delaney, P. F., Godbole, N. R., Holden, L. R., & Chang, Y. (2018). Working memory 

capacity and the spacing effect in cued recall. Memory, 26(6), 784–797. 

Deisig, N., Sandoz, J-C,Giurfa, M., Lachnit, H. (2007). The trial-spacing effect on olfactory 

patterning discriminations in honeybees. Behavioural Brain Research, 176(2), 314–
322.  

Delaney, P.F., Verkoeijen, P., & Spirgel, A. (2010). Spacing and testing effects: A deeply 

critical, lengthy, and at times discursive review of the literature. In B. H. Ross (Ed.), 
Psychology of learning and motivation: Advances in research and theory (pp. 63–
147). San Diego: Elsevier Academic Press Inc. 

Dellarosa, D., & Bourne, L. E. (1985). Surface form and the spacing effect. Memory & 

Cognition, 13, 529-537. 

Dempster, F.N. (1988). The spacing effect: A case study in the failure to apply the results of 

psychological research. American Psychologist, 43, 627–634. 

Dempster, F. N. (1989). Spacing effects and their implications for theory and practice. 

Educational Psychology Review, 1, 309–330. 

Dempster, F. N. (1996). Distributing and Managing the Conditions of Encoding and Practice. 
In E.L. Bjork & R. A. Bjork (Eds.), Handbook of Perception and Cognition: Memory 
(Vol. 10 pp. 317–344). New York: Academic Press. 

Donaldson, W. (1971). Output effects in multi-trial free recall. Journal of Verbal Learning & 

Verbal Behavior, 10, 577–585. 

Donovan, J. J., & Radosevich, D. J. (1999). A meta-analytic review of the distribution of 

practice effect: Now you see it, now you don’t. Journal of Applied Psychology, 84, 
795–805. 

Ebbinghaus, H. (1964). Memory: A contribution to experimental psychology (H.A. Ruger & 

C.E. Bussenius, Trans.). New York, NY: Dover. (Original work published 1885) 

Elgort, I., & Warren, P. (2014). L2 Vocabulary learning from reading: Explicit and tacit 

lexical knowledge and the role of learner and item variables. Language Learning, 64, 
365–414. 

Estes, W. K. (1955). Statistical theory of distributional phenomena in learning. Psychological 

Review, 62, 369–377. 

Estes, W.K. (1960). Learning theory and the new ‘mental chemistry’. Psychological Review, 

67, 207–223. 

156 

 

 

Field, A. (2013). Discovering statistics using IBM SPSS statistics. Sage. 

Forster, K. I., & Forster, J. (2003). DMDX: A windows display program with millisecond 
accuracy. Behavioral Research Methods, Instruments & Computers, 35, 116–124. 

Gardiner, J.M., Craik, F.I.M., & Bleasdale, F.A. (1973). Retrieval difficulty and subsequent 

recall. Memory & Cognition, 1, 213–216. 

Gass, S. (1988). Integrating research areas: A framework for second language studies. Applied 

Linguistics, 9, 198–217. 

Gerbier, E., & Toppino, T. C. (2015). The effect of distributed practice: Neuroscience, 

cognition, and education. Trends in Neuroscience & Education, 4(3), 49–59. 

Glanzer, M. (1969). Distance between related words in free recall: Trace of the STS. Journal 

of Verbal Learning & Verbal Behavior, 8, 105–111. 

Glenberg, A. M. (1979). Component-levels theory of the effects of spacing of repetitions on 

recall and recognition. Memory & Cognition, 7, 95–112. 

Glenberg, A. M., & Smith, S. M. (1981). Spacing repetitions and solving problems are not the 

same. Journal of Verbal Learning & Verbal Behavior, 20(1), 110–119. 

Glover, J. A. (1989). The" testing" phenomenon: Not gone but nearly forgotten. Journal of 

Educational Psychology, 81(3), 392. 

Godfroid, A., Ahn, J., Choi, I., Ballard, L., Cui, Y., Johnston, S., ... & Yoon, H. J. (2018). 
Incidental vocabulary learning in a natural reading context: An eye-tracking study. 
Bilingualism: Language & Cognition, 21(3), 563-584. 

Godfroid, A., Boers, F., & Housen, A. (2013). An eye for words: Gauging the role of attention 

in incidental L2 vocabulary acquisition by means of eye tracking. Studies in Second 
Language Acquisition, 35, 483–517. 

Green, J. L., Weston, T., Wiseheart, M., & Rosenbaum, R. S. (2014). Long-term spacing 

effect benefits in developmental amnesia: Case experiments in rehabilitation. 
Neuropsychology, 28, 685–694. 

Greene, R. L. (1989). Spacing effects in memory: Evidence for a two-process account. 

Journal of Experimental Psychology: Learning, Memory, & Cognition, 15(3), 371–
377. 

Greene, R. L. (1990). Spacing effects on implicit memory tests. Journal of Experimental 

Psychology: Learning, Memory, & Cognition, 16, 1004–1011. 

Greeno, J. G. (1964). Paired-associate learning with massed and distributed repetitions of 

items. Journal of Experimental Psychology, 67(3), 286. 

157 

 

 

Hayes, A. F. (2006). A primer on multilevel modeling. Human Communication Research, 32, 

385–410. 

Hayes, A. F. (2018). Introduction to mediation, moderation, and conditional process analysis 

(2nd ed.). New York: Guilford Press. 

Hays, M. J., Kornell, N., & Bjork, R. A. (2013). When and why a failed test potentiates the 
effectiveness of subsequent study. Journal of Experimental Psychology: Learning, 
Memory, & Cognition, 39(1), 290. 

Hillary, F. G., Schultheis, M. T., Challis, B. H., Millis, S. R., Carnevale, G. J., Galshi, T., & 

DeLuca, J. (2003). Spacing of repetitions improves learning and memory after 
moderate and severe TBI. Journal of Clinical & Experimental Neuropsychology, 25, 
49–58. 

Hintzman, D. L. (1974). Theoretical implications of the spacing effect. In R. L. Solso (Ed.), 

Theories in cognitive psychology: The Loyola Symposium (pp. 77–99). Potomac, MD: 
Erlbaum. 

Hintzman, D. L. (1976). Repetition and memory. In G. H. Bower (Ed.), The psychology of 

learning and memory (pp. 47–91). New York: Academic Press. 

Hintzman, D. L. (2004). Judgment of frequency versus recognition confidence: Repetition and 

recursive reminding. Memory & Cognition, 32(2), 336–350. 

Hintzman, D. L. (2010). How does repetition affect memory? Evidence from judgments of 

recency. Memory & Cognition, 38(1), 102–115. 

Hintzman, D. L., Summers, J. J., & Block, R. A. (1975). Spacing judgments as an index of 

study-phase retrieval. Journal of Experimental Psychology: Human Learning & 
Memory, 1(1), 31. 

Hogan, R.M., & Kintsch,W. (1971). Differential effects of study and test trials on long-term 
recognition and recall. Journal of Verbal Learning & Verbal Behavior, 10, 562–567. 

IBM Corp. Released 2017. IBM SPSS Statistics for Windows, Version 25.0. Armonk, NY: 

IBM Corp. 

Izawa, C. (1970). Optimal potentiating effects and forgetting-prevention effects of tests in 

paired-associate learning. Journal of Experimental Psychology, 83, 340–344. 

Izawa,C. (1971). The test trial potentiating model. Journal of Mathematical Psychology, 8, 

200–224. 

Izawa, C. (l985b). A test of the differences between anticipation and study-test methods of 

paired-associate learning. Journal of Experimental Psychology: Learning, Memory, & 
Cognition, 11, 165–184. 

158 

 

 

Jacoby, L. L. (1974). The role of mental contiguity in memory: Registration and retrieval 

effects. Journal of Memory & Language, 13(5), 483–496. 

Jacoby, L. L. (1978). On interpreting the effects of repetition: Solving a problem versus 

remembering a solution. Journal of Verbal Learning & Verbal Behavior, 17(6), 649–
667. 

Jacoby, L. L., Bjork, R. A., & Kelley, C. M. (1994). Illusions of comprehension and 
competence. In D. Druckman & R. A. Bjork (Eds.), Learning, remembering, 
believing: Enhancing team and individual performance (pp. 57–80). Washington, DC: 
National Academy Press. 

Jacoby, L. L., & Wahlheim, C. N. (2013). On the importance of looking back: The role of 

recursive remindings in recency judgments and cued recall. Memory &Cognition, 41, 
625–637. 

Johnson, N. F. (1964). The functional relationship between amount learned and frequency vs. 

rate vs. total time of exposure of verbal materials. Journal of Verbal Learning & 
Verbal Behavior, 3(6), 502–504. 

Johnston, W. A., & Uhl, C. N. (1976). The contributions of encoding effort and variability to 

the spacing effect on free recall. Journal of Experimental Psychology: Human 
Learning & Memory, 2, 153–160. 

Joseph, H., Wonnacott, E., Forbes, P., & Nation, K. (2014). Becoming a written word: Eye- 
movements reveal order of acquisition effects following incidental exposure to new 
words during silent reading. Cognition, 133, 238–248.  

Jost, A. (1897). Die Assoziationsfestigkeit in ihrer Abhӓngigkeit von der Verteilung der 

Wiederholungen [The strength of associations in their dependence on the distribution 
of repetitions]. Zeitschrift für Psychologie und Physiologie der Sinnesorgane, 14, 
436–472. 

Kahana, M. J., & Howard, M. W. (2005). Spacing and lag effects in free recall of pure lists. 

Psychonomic Bulletin & Review, 12(1), 159–164. 

Kang, S. (2016). Spaced repetition promotes efficient and effective learning: Policy 

implications for instruction. Policy Insights from the Behavioral & Brain Sciences, 
3(1), 12–19. 

Kang, S., Lindsey, R. V., Mozer, M. C., & Pashler, H. (2014). Retrieval practice over the long 

term: Should spacing be expanding or equal-interval? Psychonomic Bulletin & 
Review, 21, 1544–1550. 

Karpicke, J. D., & Roediger, H. L., III. (2007). Expanding retrieval practice promotes short-
term retention, but equally spaced retrieval promotes long-term retention. Journal of 
Experimental Psychology Learning Memory & Cognition, 33, 704–719. 

159 

 

 

Karpicke, J. D., & Roediger, H. L. (2008). The critical importance of retrieval for learning. 

Science, 319, 966–968. 

Kasprowicz, R. E., Marsden, E., & Sephton, N. (2019). Investigating distribution of practice 

effects for the learning of foreign language verb morphology in the young learner 
classroom. The Modern Language Journal, 103(3), 580–606. 

Kiliç, A., Hoyer, W. J., & Howard, M. W. (2013). Effects of spacing of item repetitions in 

continuous recognition memory: Does item retrieval difficulty promote item retention 
in older adults?. Experimental Aging Research, 39(3), 322–341. 

Kintsch, W., & van Dijk, T.A. (1978). Toward a model of text comprehension. Psychological 

Review, 85, 363–394.  

Kolers, P.A., & Roediger, H.L., III. (1984). Procedures of mind. Journal of Verbal Learning 

& Verbal Behavior, 23, 425–449. 

Kornell, N., & Bjork, R. A. (2007). The promise and perils of self-regulated study. 

Psychonomic Bulletin & Review, 14(2), 219–224. 

Kornell, N., & Bjork, R. A. (2008). Learning concepts and categories: Is spacing the “enemy 

of induction”? Psychological Science, 19, 585–592. 

Kornell, N., Hays, M. J., & Bjork, R. A. (2009). Unsuccessful retrieval attempts enhance 
subsequent learning. Journal of Experimental Psychology: Learning, Memory, & 
Cognition, 35(4), 989. 

Koval, N.G. (2019). Testing the deficient processing account of the spacing effect in L2 
vocabulary learning: Evidence from eye-tracking. Applied Psycholinguistics, 40, 
1103–1139. 

Kuo, T., & Hirshman, E. (1996). Investigations of the testing effect. American Journal of 

Psychology, 109, 451–464. 

Küpper-Tetzel, C. E., & Erdfelder, E. (2012). Encoding, maintenance, and retrieval processes 

in the lag effect: A multinomial processing tree analysis. Memory, 20, 37–47. 

Landauer, T. K. (1969). Reinforcement as consolidation. Psychological Review 76(1), 82–96. 

Landauer, T.K., & Bjork, R.A. (1978). Optimum rehearsal patterns and name learning. In 

M.M. Gruneberg, P.E. Morris, & R.N. Sykes (Eds.), Practical aspects of memory (pp. 
625–632). London: Academic Press. 

Laufer, B., & Hulstijn, J. H. (2001). Incidental vocabulary acquisition in a second language: 

The construct of task-induced involvement. Applied Linguistics, 22, 1–26. 

160 

 

 

Linderholm, T., Dobson, J., & Yarbrough, M. B. (2016). The benefit of self-testing and 

interleaving for synthesizing concepts across multiple physiology texts. Advances in 
Physiology Education, 40(3), 329–334. 

Logan, J. M., & Balota, D. A. (2008). Expanded versus equal interval spaced retrieval 

practice: Exploring different schedules of spacing and retention interval in younger 
and older adults. Aging, Neuropsychology, & Cognition, 15, 257–280. 

Maddox, G. B. (2016). Understanding the underlying mechanism of the spacing effect in 

verbal learning: A case for encoding variability and study-phase retrieval. Journal of 
Cognitive Psychology, 28(6), 684–706. 

Maddox, G. B., & Balota, D. A. (2015). Retrieval practice and spacing effects in young and 

older adults: An examination of the benefits of desirable difficulty. Memory & 
Cognition, 43(5), 760–774. 

Maddox, G. B., Balota, D. A., Coane, J. H., & Duchek, J. M. (2011). The role of forgetting 

rate in producing a benefit of expanded over equal spaced retrieval in young and older 
adults. Psychology & Aging, 26(3), 661. 

Maddox, G. B., Pyc, M. A., Kauffman, Z. S., Gatewood, J. D., & Schonhoff, A. M. (2018). 

Examining the contributions of desirable difficulty and reminding to the spacing 
effect. Memory & Cognition, 46(8), 1376–1388. 

Madigan, S. A. (1969). Intraserial repetition and coding processes in free recall. Journal of 

Verbal Learning & Verbal Behavior, 8(6), 828–835. 

Mammarella, N., Avons, S. E., & Russo, R. (2004). A short-term perceptual priming account 

of spacing effects in explicit cued-memory tasks for unfamiliar stimuli. European 
Journal of Cognitive Psychology, 16(3), 387–402. 

Marks, L. E., & Miller, G. A. (1964). The role of semantic and syntactic constraints in the 

memorization of English sentences. Journal of Verbal Learning & Verbal Behavior, 
3(1), 1–5. 

McCabe, D. P., Roediger III, H. L., McDaniel, M. A., Balota, D. A., & Hambrick, D. Z. 

(2010). The relationship between working memory capacity and executive 
functioning: Evidence for a common executive attention construct. Neuropsychology, 
24(2), 222. 

McDaniel, M. A., & Fisher, R. P. (1991). Tests and test feedback as learning sources. 

Contemporary Educational Psychology, 16, 192–201. 

McDaniel, M.A., Friedman, A., & Bourne, L.E. (1978). Remembering the levels of 

information in words. Memory & Cognition, 6, 156–164. 

161 

 

 

McDaniel, M.A., & Masson, M.E.J. (1985). Altering memory representations through 

retrieval. Journal of Experimental Psychology: Learning, Memory, & Cognition, 11, 
371–385. 

McKinley, G. L., Ross, B. H., & Benjamin, A. S. (2019). The role of retrieval during study: 

Evidence of reminding from self-paced study time. Memory & Cognition, 47(5), 877–
892. 

Medin, D. L., & Schaffer, M. M. (1978). A context theory of classification learning. 

Psychological Review, 85, 207–238. 

Miles, S. W. (2014). Spaced vs. massed distribution instruction for L2 grammar learning. 

System, 42, 412–428. 

Miles, S., & Kwon, C. J. (2008). Benefits of using CALL vocabulary programs to provide 

systematic word recycling. English Teaching, 63(1), 199–216. 

Modigliani, V. (1976). Effects on a later recall by delaying initial recall. Journal of 

Experimental Psychology: Human Learning & Memory, 2, 609–622. 

Mohamed, A. A. (2018). Exposure frequency in L2 reading: An eye-movement perspective of 

incidental vocabulary learning. Studies in Second Language Acquisition, 40(2), 269–
293. 

Melton, A.W. (1970). The situation with respect to the spacing of repetitions and memory. 

Journal of Verbal Learning & Verbal Behavior, 9, 596–606. 

Morris, C.D., Bransford, J.D., & Franks, J.J. (1977). Levels of processing versus transfer-

appropriate processing. Journal of Verbal Learning & Verbal Behavior, 16, 519–533. 

Murdock, B. B. (1960). The immediate retention of unrelated words. Journal of Experimental 

Psychology, 60, 222–234. 

Myers, G. C. (1914). Recall in relation to retention. Journal of Educational Psychology, 5, 

119–130. 

Nakata, T. (2011). Computer-assisted second language vocabulary learning in a paired-

associate paradigm: A critical investigation of flashcard software. Computer Assisted 
Language Learning, 24(1), 17–38. 

Nakata, T. (2015). Effects of expanding and equal spacing on second language vocabulary 
learning does gradually increasing spacing increase vocabulary learning? Studies in 
Second Language Acquisition, 37(4), 677–711. 

Nakata, T., & Suzuki, Y. (2019). Effects of massing and spacing on the learning of 

semantically related and unrelated words. Studies in Second Language Acquisition, 
41(2), 287–311. 

162 

 

 

Nakata, T., & Webb, S. (2016). Does studying vocabulary in smaller sets increase learning?: 

The effects of part and whole learning on second language vocabulary acquisition. 
Studies in Second Language Acquisition, 38(3), 523–552. 

Nelson, T. O., & Dunlosky, J. (1994). Norms of paired-associate recall during multi-trial 

learning of Swahili-English translation equivalents. Memory, 2(3), 325–335.  

Nelson, T. O., Leonesio, J., Shimamura, A. P., Landwehr, R. F., & Narens, L. (1982). 

Overlearning and the feeling of knowing. Journal of Experimental Psychology: 
Learning, Memory, & Cognition, 8(4), 279. 

Oberauer, K. (2005). Control of the contents of working memory – a comparison of two 
paradigms and two age groups. Journal of Experimental Psychology: Learning, 
Memory, & Cognition, 31, 714–728. 

Pashler, H., Cepeda, N.J., Wixted, J., & Rohrer, D. (2005). When does feedback facilitate 

learning of words? Journal of Experimental Psychology: Learning, Memory, & 
Cognition, 31, 3–8. 

Pashler, H., Zarow, G., & Triplett, B. (2003). Is temporal spacing of tests helpful even when it 

inflates error rates?. Journal of Experimental Psychology: Learning, Memory, & 
Cognition, 29(6), 1051. 

Pavlik Jr, P. I., & Anderson, J. R. (2005). Practice and forgetting effects on vocabulary 

memory: An activation‐based model of the spacing effect. Cognitive Science, 29(4), 
559–586. 

Pellicer-Sánchez, A. (2016). Incidental L2 vocabulary acquisition from and while reading: An 

eye-tracking study. Studies in Second Language Acquisition, 38(1), 97−130. 

Peterson, L. R., Wampler, R., Kirkpatrick, M., & Saltzman, D. (1963). Effect of spacing 

presentations on retention of a paired associate over short intervals. Journal of 
Experimental Psychology, 66(2), 206. 

Pyc, M. A., & Rawson, K. A. (2009). Testing the retrieval effort hypothesis: Does greater 

difficulty correctly recalling information lead to higher levels of memory?. Journal of 
Memory & Language, 60(4), 437–447. 

Raaijmakers, J. G. (2003). Spacing and repetition effects in human memory: Application of 

the SAM model. Cognitive Science, 27(3), 431–452. 

Rawson, K. A., & Kintsch, W. (2005). Rereading effects depend on time of test. Journal of 

Educational Psychology, 97(1), 70. 

Rayner, K., & Duffy, S. A. (1986). Lexical complexity and fixation times in reading: Effects 
of word frequency, verb complexity, and lexical ambiguity. Memory & Cognition, 14, 
191–201. 

163 

 

 

Rayner, K., Raney, G. E., & Pollatsek, A. (1995). Eye movements and discourse processing. 

In R. F. Lorch & E. J. O’Brien (Eds.), Sources of coherence in reading (pp. 9–36). 
Hillsdale, NJ: Lawrence Erlbaum Associates. 

Richardson-Klavehn, A., & Bjork, R. A. (1988). Measures of memory. Annual Review of 

Psychology, 39, 475–543. 

Robbins, D., & Bray, J. F. (1974). Repetition effects and retroactive facilitation: Immediate 

and delayed test performance. Bulletin of the Psychonomic Society, 3, 347–349. 

Robinson, P. (2003). Attention and memory during SLA. In C. Doughty, & M. H. Long 

(Eds.), The handbook of second language acquisition (pp. 631–678). Oxford: 
Blackwell. 

Roediger III, H. L., & Butler, A. C. (2011). The critical role of retrieval practice in long-term 

retention. Trends in Cognitive Sciences, 15(1), 20–27. 

Roediger III, H. L., & Karpicke, J. D. (2006a). The power of testing memory: Basic research 

and implications for educational practice. Perspectives on Psychological Science, 1(3), 
181–210. 

Roediger, H.L., III, & Karpicke, J.D. (2006b). Test enhanced learning: Taking memory tests 

improves long-term retention. Psychological Science, 17, 249–255. 

Rogers, J. (2015). Learning second language syntax under massed and distributed conditions. 

TESOL Quarterly, 49, 857–866. 

Rogers, J. (2017). The spacing effect and its relevance to second language acquisition. 

Applied Linguistics, 38, 906–911. 

Rohrer, D. (2015). Student instruction should be distributed over long time periods. 

Educational Psychology Review, 27(4), 635–643. 

Rohrer, D., & Pashler, H. (2007). Increasing retention without increasing study time. Current 

Directions in Psychological Science, 16(4), 183–186. 

Rohrer, D., Taylor, K., Pashler, H., Wixted, J. T., & Cepeda, N. J. (2005). The effect of 

overlearning on long‐term retention. Applied Cognitive Psychology, 19(3), 361–374. 

Rose, R. J. (1984). Processing time for repetitions and the spacing effect. Canadian Journal of 

Psychology/Revue Canadienne De Psychologie, 38(4), 537–550. 

Rose, R. J., & Rowe, E. J. (1976). Effects of orienting task and spacing of repetitions on 
frequency judgments. Journal of Experimental Psychology: Human Learning & 
Memory, 2(2), 142. 

Ross, B. H. (1984). Remindings and their effects in learning a cognitive skill. Cognitive 

Psychology, 16, 371–416. 

164 

 

 

Ross, B. H., & Bradshaw, G. L. (1994). Encoding effects of remindings. Memory & 

Cognition, 22, 591–605. 

Ross, B. H., & Landauer, T. K. (1978). Memory for at least one of two items: Test and failure 
of several theories of spacing effects. Journal of Verbal Learning & Verbal Behavior, 
17(6), 669–680. 

Ross, B. H., Perkins, S. J., & Tenpenny, P. L. (1990). Reminding-based category learning. 

Cognitive Psychology, 22, 460–492. 

Rowland, C. A. (2014). The effect of testing versus restudy on retention: A meta-analytic 

review of the testing effect. Psychological Bulletin, 140(6), 1432. 

Rundus, D. (1971). Analysis of rehearsal processes in free recall. Journal of Experimental 

Psychology, 89, 63–77. 

Rundus, D., & Atkinson, R. C. (1970). Rehearsal processes in free recall: A procedure for 
direct observation. Journal of Verbal Learning & Verbal Behavior, 9(1), 99–105. 

Runquist, W.N. (1986). Changes in the rate of forgetting produced by recall tests. Canadian 

Journal of Psychology, 40, 282–289. 

Russo, R., & Mammarella, N. (2002). Spacing effects in recognition memory: When meaning 

matters. European Journal of Cognitive Psychology, 14(1), 49–59. 

Russo, R., Parkin, A. J., Taylor, S. R., & Wilks, J. (1998). Revising current two-process 

accounts of spacing effects in memory. Journal of Experimental Psychology: 
Learning, Memory, & Cognition, 24, 161–172. 

Schmidt, R. (1990). The role of consciousness in second language learning. Applied 

Linguistics, 11,  29–158. 

Schmidt, R. (2001). Attention. In P. Robinson (Ed.), Cognition and second language 

instruction (pp. 3–32). New York: Cambridge University Press. 

Schmidt, R. A., & Bjork, R. A. (1992). New conceptualizations of practice: Common 
principles in three paradigms suggest new concepts for training. Psychological 
Science, 3(4), 207–218. 

Schmitt, N. (2008). Review article: Instructed second language vocabulary learning. 

Language Teaching Research, 12(3), 329−363. 

Schulman, A. I. (1974). Memory for words recently classified. Memory & Cognition, 2(1), 

47–52. 

Serrano, R. (2011). The time factor in EFL classroom practice. Language Learning, 61, 117–

145. 

165 

 

 

Serrano, R., & Huang, H. Y. (2018). Learning vocabulary through assisted repeated reading: 

How much time should there be between repetitions of the same text? TESOL 
Quarterly, 52(4), 971–994. 

Serrano, R., & Muñoz, C. (2007). Same hours, different time distribution: Any difference in 

EFL? System, 35, 305–321. 

Shaughnessy, J. J., Zimmerman, J., & Underwood, B. J. (1972). Further evidence on the MP-

DP effect in free-recall learning. Journal of Verbal Learning & Verbal Behavior, 
11(1), 1–12. 

Siegel, L. L., & Kahana, M. J. (2014). A retrieved context account of spacing and repetition 

effects in free recall. Journal of Experimental Psychology: Learning, Memory, & 
Cognition, 40, 755–764. 

Soderstrom, N. C., Kerr, T. K., & Bjork, R. A. (2016). The critical importance of retrieval—

and spacing—for learning. Psychological Science, 27(2), 223–230. 

Spitzer, H.F. (1939). Studies in retention. Journal of Educational Psychology, 30, 641–656. 

Storm, B. C., Bjork, R. A., & Storm, J. C. (2010). Optimizing retrieval as a learning event: 

When and why expanding retrieval practice enhances long-term retention. Memory & 
Cognition, 38(2), 244–253. 

Suzuki, Y. (2017). The optimal distribution of practice for the acquisition of L2 morphology: 

A conceptual replication and extension. Language Learning, 67(3), 512–545. 

Suzuki, Y., & DeKeyser, R. (2017). Effects of distributed practice on the proceduralization of 

morphology. Language Teaching Research, 21(2), 166–188. 

Suzuki, Y., & Sunada, M. (2020). Dynamic interplay between practice type and practice 

schedule in a second language: The potential and limits of skill transfer and practice 
schedule. Studies in Second Language Acquisition, 42(1), 169–197. 

Thios, S. J., & D'Agostino, P. R. (1976). Effects of repetition as a function of study-phase 

retrieval. Journal of Verbal Learning & Verbal Behavior, 15(5), 529–536. 

Thompson, C.P., Wenger, S.K., & Bartling, C.A. (1978). How recall facilitates subsequent 

recall: A reappraisal. Journal of Experimental Psychology: Human Learning & 
Memory, 4, 210–221. 

Toppino T. C., & Bloom, L. C. (2002). The spacing effect, free recall, and two-process 

theory: A closer look. Journal of Experimental Psychology: Learning, Memory, & 
Cognition, 28, 437–444. 

Toppino, T. C., & Gracen, T. F. (1985). The lag effect and differential organization theory: 

Nine failures to replicate. Journal of Experimental Psychology: Learning, Memory, & 
Cognition, 11(1), 185. 

166 

 

 

Tullis, J. G., Benjamin, A. S., & Ross, B. H. (2014). The reminding effect: Presentation of 

associates enhances memory for related words in a list. Journal of Experimental 
Psychology: General, 143(4), 1526–1540. 

Tullis, J. G., Braverman, M., Ross, B. H., & Benjamin, A. S. (2014). Remindings influence 

the interpretation of ambiguous stimuli. Psychonomic Bulletin & Review, 21, 107–113. 

Tulving, E., & Thomson, D. M. (1971). Retrieval processes in recognition memory: Effects of 

associative context. Journal of Experimental Psychology, 87(1), 116. 

Van Strien, J. W., Verkoeijen, P. P. J. L., Van der Meer, N., & Franken, I. H. A. (2007). 

Electrophysiological correlates of word repetition spacing: ERP and induced band 
power old/new effects with massed and spaced repetitions. International Journal of 
Psychophysiology, 66(3), 205–214. 

Verkoeijen, P., & Bouwmeester, S. (2008). Using latent class modeling to detect bimodality 

in spacing effect data. Journal of Memory & Language, 59, 545–555. 

Verkoeijen, P., Rikers, R., & Schmidt, H. (2004). Detrimental influence of contextual change 

on spacing effects in free recall. Journal of Experimental Psychology: Learning, 
Memory, & Cognition, 30(4), 796–800. 

Verkoeijen, P., Rikers, R., & Schmidt, H. (2005). Limitations to the spacing effect: 

Demonstration of an inverted u-shaped relationship between inter-repetition spacing 
and free recall. Experimental Psychology, 52(4), 257–263. 

Wahlheim, C. N., Maddox, G. B., & Jacoby, L. L. (2014). The role of reminding in the effects 

of spaced repetitions on cued recall: Sufficient but not necessary. Journal of 
Experimental Psychology: Learning, Memory, & Cognition, 40(1), 94. 

Watkins, M. J., & Kerkar, S. P. (1985). Recall of a twice-presented item without recall of 
either presentation: Generic memory for events. Journal of Memory & Language, 
24(6), 666–678. 

Waugh, N. C. (1963). Immediate memory as a function of repetition. Journal of Memory & 

Language, 2(1), 107. 

Waugh, N. C., & Norman, D. A. (1965). Primary memory. Psychological Review, 72(2), 89. 

Wenger, S. K., Thompson, C. P., & Bartling, C. A. (1980). Recall facilitates subsequent 

recognition. Journal of Experimental Psychology: Human Learning & Memory, 6, 
590–598. 

Wheeler, M.A., & Roediger, H.L., III. (1992). Disparate effects of repeated testing: 

Reconciling Ballard’s (1913) and Bartlett’s (1932) results. Psychological Science, 3, 
240–245. 

167 

 

 

Wheeler, M.A., Ewers, M., & Buonanno, J.F. (2003). Different rates of forgetting following 

study versus test trials. Memory, 11, 571– 580. 

Whitten, W.B., & Bjork, R.A. (1977). Learning from tests: Effects of spacing. Journal of 

Verbal Learning & Verbal Behavior, 16, 465– 478. 

White, J., & Turner, C. (2005). Comparing children's oral ability in two ESL programs. 

Canadian Modern Language Review, 61(4), 491–517. 

Xue, G., Mei, L., Chen, C., Lu, Z., Poldrack, R., & Dong, Q. (2011). Spaced learning 

enhances subsequent recognition memory by reducing neural repetition suppression. 
Journal of Cognitive Neuroscience, 23(7), 1624–1633. 

Yin, J. C. P., Del Vecchio, M., Zhou, H., & Tully, T. (1995). CREB as a memory modulator: 

Induced expression of a dCREB2 activator isoform enhances long-term memory in 
drosophila. Cell, 81(1), 107–115. 

Yonelinas, A. P., & Jacoby, L. L. (2012). The process-dissociation approach two decades 

later: Convergence, boundary conditions, and new directions. Memory & Cognition, 
40(5), 663–680. 

Young, J. L. (1971). Reinforcement-test intervals in paired-associate learning. Journal of 

Mathematical Psychology, 8(1), 58–81. 

Zechmeister, E. B., & Shaughnessy, J. J. (1980). When you know that you know and when 
you think that you know but you don’t. Bulletin of the Psychonomic Society, 15(1), 
41–44. 

Zeelenberg, R., de Jonge, M., Tabbers, H. K., & Pecher, D. (2015). The effect of presentation 
rate on foreign-language vocabulary learning. The Quarterly Journal of Experimental 
Psychology, 68(6), 1101–1115. 

Zimmerman, J. (1975). Free recall after self-paced study: A test of the attention explanation of 

the spacing effect. American Journal of Psychology, 88, 277–291. 

 
 

168