This is to certify that the dissertation entitled ACQUISITION OF GEMINATE CONSONANTS IN JAPANESE BY AMERICAN ENGLISH SPEAKERS, presented by MIKI MOTOHASHI, has been accepted towards fulfillment of the requirements for the Ph.D. degree in Linguistics and Germanic, Asian and African Languages.

ACQUISITION OF GEMINATE CONSONANTS IN JAPANESE BY AMERICAN ENGLISH SPEAKERS

By

Miki Motohashi

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Linguistics and Germanic, Asian and African Languages

2007

ABSTRACT

It has been pointed out that English-speaking learners of Japanese often exhibit timing problems in the perception and production of geminate consonants, since durational contrast is novel for English speakers. The present study reports data on the perception and production of geminate consonants in Japanese by American learners. Based on these data, an effective way to train learners to identify geminate consonants was developed and tested.

Four experiments were conducted. Two experiments collected perception and production data on geminate consonants from American learners of Japanese, to investigate how the learners perceived and produced geminate consonants and to examine whether particular phonetic contexts and consonant identities were especially difficult for learners.
The conditions considered were consonant type (/s/, /t/, and /k/), preconsonantal segment (/sa/ and /a/), postconsonantal segment (/u/ and /a/), and presentation of words in isolation versus in carrier sentences. The results showed that the learners’ performance was affected by the phonetic context and identity of the geminate consonant. Specifically, the combination of the fricative geminate consonant /s/ and the low-sonority vowel /u/ was found to be the most difficult to perceive, while there was no such tendency for production.

The other two experiments considered a method of training to improve perception of such difficult geminate consonants. A further issue considered was the modality of training. The training program was developed to investigate whether audio-visual (AV) training would be more beneficial than auditory-only (A-only) training in improving the learners’ perception of geminate consonants. Previous training studies generally used auditory-modality cues; however, Hardison’s studies (e.g., 2003, 2004) reported that L2 learners benefit from visual cues as well as auditory cues in perception training. The present study used visual displays of waveforms of geminate consonants as aids for learners to identify the difference in mora weight between singleton and geminate consonants. The results indicated that AV training was superior to A-only training in producing perception improvement. Further, the AV-training group data showed a transfer effect from perception training to improvement in production. This result suggests that there is a close link between the development of perception and the development of production.

Further, the present study emphasizes the importance of collecting data from learners’ performances and aims to develop an effective training program. The stimuli used for the training were selected based on the findings from the data collected.
The effectiveness of high-variability stimuli demonstrated in the present study is compatible with previous studies (e.g., Pisoni et al., 1999), which supported a multiple-trace memory theory in which each event or input is encoded in memory as a trace, rather than as a prototype. Through the training, all attended perceptual details are stored in memory and modify an attention weighting scheme for perceiving distinctive features in the L2. It is assumed that the bimodal training used in the present study would facilitate this process.

Copyright by MIKI MOTOHASHI 2007

ACKNOWLEDGMENTS

My deepest appreciation goes to my dissertation committee members. Since I left the United States three years ago to pursue my teaching career in Japan, communication became difficult. I am grateful that the committee members always, and very patiently, answered my questions promptly by e-mail. I would like to thank my co-chairs, Dr. Dennis Preston and Dr. Susan Gass. Dr. Preston always encouraged me to go on. His academic advice (as well as his sense of humor) was always helpful. Dr. Gass always gave me helpful feedback and patiently read my dissertation.

The influence of other committee members is also appreciated. I thank Dr. Barbara Abbott for being generous with her time whenever I had questions. Dr. Mutsuko Endo Hudson has been supportive by advising me to complete this dissertation, as well as giving me precious opportunities to expand my teaching career. I thank Dr. Shawn Roewen for his willingness to read my work, ask important questions, and provide feedback and helpful suggestions. Last but not least, I am grateful to Dr. Debra Hardison for her patience and a tremendous amount of feedback. The topic of this dissertation originated from a project for her class. She helped me with the entire process of revising the dissertation by giving me very specific comments on many points (not only about SLA, but also about phonology, statistics, thesis format, etc.) in a very timely manner.
I want to thank all of my friends with whom I studied at the University of Wisconsin-Madison and Michigan State University. I am especially grateful to Emiko Magnani for her patience in reading my dissertation and giving me very helpful feedback. I also extend many thanks to my students and colleagues at Kansai Gaidai University in Japan for their support in collecting data. I would especially like to thank Dr. Jeffrey Rasch for patiently proofreading my dissertation and giving me many suggestions and comments as a linguist, which contributed to my preparation for the defense.

I thank Hideki Saigo, my colleague, best friend, and husband, for sharing the difficult time during the last years of completing the dissertation, giving me advice about finishing a dissertation based on his own experience, and doing all the household chores (especially cooking). I appreciate his patience and support.

Finally, I want to thank my parents and grandmother for always helping and encouraging me to continue my study in the US. This work is dedicated to them.
TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
CHAPTER 2 BACKGROUND
  2.1 Previous studies of geminate consonants
    2.1.1 Mora in Japanese
    2.1.2 Research on native speakers’ production of geminate consonants
    2.1.3 Research on native speakers’ perception of geminate consonants
    2.1.4 Research on nonnative speakers’ production of geminate consonants
    2.1.5 Research on nonnative speakers’ perception of geminate consonants
  2.2 Development of speech perception and production
  2.3 Training studies
  2.4 Making visual information available to L2 learners
    2.4.1 Electronic Visual Feedback (EVF)
    2.4.2 Effects of instruction on production of segments and suprasegmental features
    2.4.3 Experimental studies of EVF
  2.5 Research questions and hypotheses
CHAPTER 3 Experiment I
  3.1 Overview of the experiments
  3.2 Objectives of Experiment I
  3.3 Method
    3.3.1 Participants
    3.3.2 Materials
    3.3.3 Procedure
  3.4 Results
  3.5 Discussion
CHAPTER 4 Experiment II
  4.1 Objectives of Experiment II
  4.2 Experimental design
  4.3 Method
    4.3.1 Participants
    4.3.2 Materials
      4.3.2.1 Pretest and posttest
      4.3.2.2 Instruction by electronic visual input
    4.3.3 Procedure
  4.4 Results
  4.5 Discussion
CHAPTER 5 Experiment III
  5.1 Objectives of Experiment III
  5.2 Method
    5.2.1 Participants
    5.2.2 Materials
    5.2.3 Recording procedure
    5.2.4 Judgment of production
  5.3 Results
    5.3.1 Results by phonetic conditions
    5.3.2 Error patterns
  5.4 Discussion
CHAPTER 6 Experiment IV
  6.1 Objectives of Experiment IV
  6.2 Method
    6.2.1 Participants
    6.2.2 Pretest
      6.2.2.1 Production test
      6.2.2.2 Perception test
        6.2.2.2.1 Materials
        6.2.2.2.2 Procedure
    6.2.3 Perception Training
      6.2.3.1 Training materials
      6.2.3.2 Training procedure
    6.2.4 Posttest
      6.2.4.1 Perception test and production test
      6.2.4.2 Generalization test
      6.2.4.3 Follow-up interview
  6.3 Results
    6.3.1 Comparison of training types
    6.3.2 Results of generalization test
    6.3.3 Results of the AV-group
      6.3.3.1 Word level vs. sentence level
      6.3.3.2 Word-level perception
      6.3.3.3 Sentence-level perception
    6.3.4 Error pattern (AV group)
    6.3.5 Production test
      6.3.5.1 Judgment of production
      6.3.5.2 Results
    6.3.6 Individual development for the AV training group
    6.3.7 Follow-up interview responses
  6.4 Discussion
CHAPTER 7 CONCLUSIONS
  7.1 Summary of findings
  7.2 General discussion
  7.3 Limitations of the study and further study suggestions
APPENDICES
  APPENDIX A: Perception test
  APPENDIX B: Production test
REFERENCES

LIST OF TABLES

Table 2.1 Japanese special morae “tokushu-haku”
Table 2.2 Stop closure and frication duration of single/geminate consonant pairs
Table 3.1 Examples of test items by phonetic structures
Table 5.1 Examples of the test items
Table 6.1 Examples of the test items
Table 6.2 Examples of the geminate consonants
Table 6.3 Mean percentage of perception accuracy in pretest and posttest and increase rate of AV-group
Table 6.4 Mean percentage of production accuracy in pretest and posttest and increase rate of AV-group

LIST OF FIGURES

Figure 2.1 Duration of words in Japanese and English
Figure 3.1 Sonority hierarchy
Figure 3.2 Sonority index
Figure 3.3 CVC word
Figure 3.4 CVCCV word
Figure 3.5 Mean percent correct identification by level of proficiency
Figure 3.6 Mean percent correct identification by item condition
Figure 3.7 Mean percent correct identification at sentence-level by item condition
Figure 3.8 201 students’ perception of /CC+u/ at word-level
Figure 3.9 201 students’ perception of /CC+u/ at sentence-level
Figure 3.10 Waveform of /akku/
Figure 3.11 Waveform of /assu/
Figure 3.12 (a) The syllabification pattern of a geminate consonant when it is perceived correctly; (b) The pattern when a geminate is misperceived as a singleton
Figure 3.13 Syllabification pattern of a geminate consonant when the word is misperceived as containing a long vowel
Figure 4.1 Mean percent correct identification of /ss/ geminate consonants
Figure 5.1 Comparison of judges’ mean scores between word level and sentence level for single and geminate production
Figure 5.2 Mean production scores by consonants
Figure 5.3 Mean production scores by vowels following geminate consonants
Figure 5.4 Mean production scores by preceding segment types
Figure 5.5 Ratio of errors: production of geminate consonants /CC+u/
Figure 6.1 Waveform display
Figure 6.2 Exercise questions display
Figure 6.3 Feedback display
Figure 6.4 Mean percent correct identification for perception pretest and posttest
Figure 6.5 Mean percent correct identification for posttest and generalization test
Figure 6.6 Mean percent correct identification pretest and posttest for word-level and sentence-level geminate consonants
Figure 6.7 Mean percent correct identification for each consonant (/s/, /t/, /k/) and each post-consonantal vowel (/a/, /u/) in pretest and posttest for word-level geminate consonants
Figure 6.8 Mean percent correct identification for each consonant (/s/, /t/, /k/) and each post-consonantal vowel (/a/, /u/) in pretest and posttest for sentence-level geminate consonants
Figure 6.9 Mean percent misperception of geminate consonants as long vowels for each consonant type in pretest and posttest
Figure 6.10 Mean ratings for the production test
Figure 6.11 Mean production rating for each consonant type in pretest and posttest for the AV training group

CHAPTER 1 INTRODUCTION

The present study will address issues in the acquisition of second language (L2) phonology by English speakers, focusing on geminate consonants in Japanese.
Specifically, the main experiment that provides data for the present study was conducted to investigate whether adult learners of Japanese could be trained to perceive and produce geminate consonants accurately. In order to determine effective training materials and methods, the present study also examined what kinds of geminate consonants were most difficult for learners to perceive and produce.

The rationale for choosing geminate consonants is the notorious difficulty which many learners of Japanese as a second language have with durational contrasts. Japanese is a mora-timed language in which duration is contrastive, while such contrasts do not exist in English. It has been reported that learners whose first language (L1) is English often have problems with the perception and production of geminate consonants, as well as with other timing morae in Japanese (Toda, 2003). These problems often lead to miscommunication.

In the present study, participants were given perception training in order to examine whether such training could improve their L2 performance. A number of studies have reported that intensive laboratory training resulted in improvement in learners’ performance (e.g., Jamison & Morosan, 1986, 1989; Logan, Lively & Pisoni, 1991; Pisoni, Aslin, Perey & Hennesy, 1982; Yamada, Akahane-Yamada & Strange, 1995; Hardison, 2003, 2004). Further, it has also been reported that the effects of training in perception were transferred to ability in production (e.g., Hardison, 2003; Bradlow, Pisoni, Akahane-Yamada & Tohkura, 1997; Catford & Pisoni, 1970; Rochet, 1995). Through investigation of the effect of perception training in modifying foreign-accented production, we will also examine the relationship between perception and production.
The present study also emphasizes the importance of collecting data from learners’ performances and aims to develop an effective training program to train English-speaking learners’ perception as well as production of geminate consonants in Japanese. Many of the previous studies of geminate consonants in Japanese limited their focus to stop consonants, and few studies have referred to the types of phonetic contexts in which geminates occur. Therefore, it is necessary to examine whether there are any particular phonetic contexts that make perception and production of a geminate consonant more difficult for learners. Based on the outcome of such research, we could find ways to focus training more specifically and to develop effective materials for training.

Although the training used in previous studies has been shown to be effective, the methodologies have not been evaluated thoroughly enough. For example, traditionally the dominant method for examining learners’ development of the ability to perceive new, difficult nonnative contrasts was laboratory auditory training using two-alternative identification or discrimination tasks involving minimal pairs in isolation (e.g., Akahane-Yamada, Tohkura, Bradlow & Pisoni, 1996; Ingram & Park, 1998; Ziolkoski, Usami, Landahl & Tunnok, 1992), but few studies provide details of why the particular methods themselves were adopted. The present study aims to suggest a more effective training method by actually collecting and examining in detail the perception and production data of learners, determining more precisely where errors occur.

Another issue to consider is the modality of training. The above-mentioned training studies generally used auditory-modality cues; however, Hardison’s studies (e.g., 2003, 2004) reported that L2 learners benefit from visual cues as well as auditory cues in perception training, and further showed that bimodal training was especially effective on the segments that are more phonologically difficult from the point of view of the learners’ L1. Psychological evidence supports the claim that information from one modality helps to reinforce another’s sensory pathway, and that the combination of information from different modalities enhances the development of the learning process (de Sa & Ballard, 1997). The present study examined the effect of combining visual cues with auditory cues to train the learners to identify geminate consonants, and the results were compared with auditory-only training. Hardison (2005a) used visual displays of pitch contours of French as visual cues to train English speakers and reported their effectiveness as visual input. The present study used visual displays of waveforms of geminate consonants as aids for learners to identify the difference in mora weight between singleton and geminate consonants.

A growing number of language teaching programs have been utilizing computer-assisted instruction for perception and pronunciation teaching to enhance learners’ self-monitoring skills. Recent developments in technology allow researchers to display formant frequency graphs, waveforms, or spectrograms on computers to teach both suprasegmental features (stress, rhythm, and intonation) and segmental features (e.g., Anderson-Hsieh, 1994, 1996; Chun, 1989, 1998; de Bot, 1983; Hardison, 2004, 2005; Levis & Pickering, 2004; Molholt, 1988; Weltens & de Bot, 1984). Such visual displays have been used mainly in production training, to give learners feedback on their own production. As a potential alternative training method, the present study proposes computer-assisted perception training.
Visual information consisting of graphs of the waveforms of geminate consonants is expected to increase learners’ sensitivity to mora timing in Japanese and improve their perception and, subsequently, their production as well. As shown in several studies mentioned above, the effect of perception training can be carried over to production ability without additional explicit production training. The present study aims, therefore, to examine whether this transfer effect can be observed in the acquisition of geminate consonants. The training method of the present study was developed on the basis of an analysis of actual data, which made it possible to pinpoint the difficulties that learners of Japanese as a second language have.

There are also other advantages of computer-based instruction. For example, it appears that computer-delivered materials are helpful in reducing the nervousness that students may feel in the classroom, and easy access may encourage learners to use a computer program on a daily basis. The present study suggests that pronunciation training should be incorporated into everyday classroom teaching, in addition to intensive laboratory training, which has also been shown to be highly effective in previous studies.

In sum, the present study was motivated by the following research questions:
1) How do the learners perceive and produce geminate consonants? Is there any particular phonetic context of geminate consonants that makes perception and/or production more difficult for learners?
2) Are audio-visual instruction and training using visual displays of waveforms of geminate consonants more beneficial than auditory-only information?
3) Does perceptual training improve production ability without any additional explicit production training?

To examine the first research question, Experiments I and III described below were conducted to collect data on the perception and production, respectively, of geminate consonants by American English speakers.
As reviewed in the following chapter, previous studies have shown that learners perceive and produce geminate consonants in different ways from native speakers of Japanese. However, the results of these studies vary according to many factors, including the data collection methods and the focus of the analysis. The present study focused on the phonetic form and context of geminate consonants, that is, the types of consonants and the preceding and following segments; few previous studies have examined these factors.

Research questions 2 and 3 were addressed by Experiments II and IV. Experiment II was conducted to test the effect of electronic visual input on the perception of geminate consonants, and Experiment IV was an experimental training study with a pretest and a posttest, using visual input in perception training, to examine whether the training is effective and whether the effects of such training transfer to improvement in production.

The organization of the remainder of this dissertation is as follows. Chapter 2 reviews the literature relevant to the above research questions, mainly regarding 1) perception and production of geminate consonants by native and nonnative speakers of Japanese; 2) the relation between perception and production and the effects of training to improve ability in these two domains; and 3) the effects of electronic visual feedback on acquisition of L2 phonology. Chapters 3 through 6 discuss the methodology and results for Experiments I through IV, respectively. Chapter 7 provides a general discussion of the results of the experiments and the pedagogical implications.

CHAPTER 2 BACKGROUND

2.1 Previous studies of geminate consonants

2.1.1 Mora in Japanese

The duration of vowels and consonants is a contrastive feature in Japanese, while it is not in English. A mora is a unit of timing, and each mora is supposed to take about the same length of time to pronounce (Ladefoged, 1993).
As the number of morae increases, the total duration of the syllable increases proportionately. Syllable duration, and thus total word length, is attributable to the number of morae. There are three kinds of special morae in Japanese: geminate consonants, moraic nasals, and those resulting from long vowels. They are called ‘tokushu-haku’ (special timing morae). To perceive and produce these special morae, sensitivity to timing is necessary, and it is difficult for learners of Japanese whose native language does not have durational contrasts to acquire native-like perception and production. Below are examples of minimal pairs of both tokushu-haku and non-tokushu-haku:

Table 2.1 Japanese special morae “tokushu-haku”

Long vowel                Geminate consonant        Moraic nasal
/kiite/ “Listen.”         /kitte/ “stamps”          /kiNka/ “gold coins”
(3 morae; 2 syllables)    (3 morae; 2 syllables)    (3 morae; 2 syllables)
/kite/ “Come.”            /kite/ “Come.”            /kika/ “vaporization”
(2 morae; 2 syllables)    (2 morae; 2 syllables)    (2 morae; 2 syllables)

Although basic Japanese syllables are open, consisting of /CV/, syllables with geminate consonants and moraic nasals are exceptionally closed. According to Shibatani (1990), geminate consonants consist of a non-nasal consonant coda followed by a homorganic consonant onset in the following syllable. This homorganic geminate consonant adds one mora. For example, a word containing a geminate consonant, like kitte “stamps,” is considered a three-mora word, while its single consonant counterpart, e.g., kite “come,” is counted as a two-mora word, and again this difference in duration is contrastive, as shown in Table 2.1.

By actually measuring the duration of words produced by native speakers, research has shown that morae are roughly equal in duration (e.g., Port, Dalby & O’Dell, 1987; Sugito, 1999). For example, Port et al.
(1987) measured the duration of a number of words which contain different numbers of morae, including words with geminate stops and long vowels, and found that the duration of words with an increasing number of morae increased by nearly consistent increments. It would appear that native Japanese speakers discriminate between short and long vowels, as well as single and geminate consonants, by the relative duration of the target vowel or consonant. With morae of relatively equal length, Japanese therefore has isochronous timing of morae, while English has syllables and/or morae of different lengths, most notably as a result of stress; i.e., stressed syllables are longer. Duration of units, however, is not systematically contrastive in English. It has been pointed out that English-speaking learners of Japanese often show timing problems in the production and perception of long vowels and geminate consonants, since durational contrast is novel for them. Thus, not only learners, but also instructors need a clear understanding of how native Japanese speakers make durational contrasts.

2.1.2 Research on native speakers’ production of geminate consonants

A number of studies have conducted acoustic analyses to see how native speakers of Japanese actually produce a geminate as opposed to a single consonant. It has been found that one of the most important acoustic cues for producing the distinct duration of a geminate stop consonant is the closure duration of the first part of the geminate. The results of these previous acoustic studies are mostly consistent; the total duration of a syllable with a geminate consonant is approximately 50% longer than that of its single consonant counterpart, though there is a discrepancy in actual measurements as to whether the single/geminate duration ratio is exactly 2:3 or not.
Homma (1981) measured word duration of two- and three-mora words with geminate stops (/pp/, /tt/, and /kk/) and their singleton consonant counterparts produced by native Japanese speakers. She found that the duration ratio between words with single stops and those with geminated stops was about 2:3, confirming that the morae of geminates are isochronous timing units. Such duration was not affected by the phonological context of types of preceding and following consonants and vowels. Sugito (1999) measured the duration of one- to five-mora words with geminate stops and found that words with an increasing number of morae increase in duration by nearly constant increments. She also conducted the same experiment with English native speakers, having them read English words, and pointed out that English syllables are, unlike morae in Japanese, inconsistent in their duration, as shown in Figure 2.1.

[Figure 2.1. Duration of words in Japanese and English (Sugito, 1999; reproduced from p. 69). Panels: “Duration of words in Japanese spoken by Y. T.” and “Duration of words in English spoken by R. E.”; axes: word duration in sec. against # of morae in a word and # of syllables in a word.]

2.1.3 Research on native speakers’ perception of geminate consonants

Researchers have also been intrigued by the question of what acoustic cues native Japanese speakers use to discriminate durational contrasts. It is generally agreed that native Japanese speakers use closure duration of the first part of a geminate consonant as a primary cue in discriminating between two- and three-mora words. For example, Min (1987, 1993) used digitally edited stimuli made from two-mora words and their three-mora geminated counterparts.
Using a stimulus such as /ita/, the stop closure duration between /i/ and /ta/ was gradually lengthened in 10 ms steps from 110 ms to 250 ms (15 different lengths altogether). The participants were then asked to tell whether they heard /ita/ or /itta/. The results showed an apparent perceptual categorical boundary at 160-180 ms among native speakers. Fujisaki and Sugito (1977) also used synthesized stimuli manipulating the closure duration of the first part of a stop geminate consonant and had native Japanese speakers discriminate between geminate and single consonants. They reported that the closure duration played the most important part in discrimination, and the perceptual boundaries for the native Japanese speakers to distinguish a single from a geminate consonant were categorical. Many other studies indicate similar findings (e.g., Fukui, 1978; Hirato & Watanabe, 1987; Toda, 1998). Besides closure duration, there have been some studies which show that pitch accent also influences native speakers’ perception of durational contrasts. Ofuka (2003) also used synthesized stimuli to examine the relationship between pitch accent location and the perception of durational contrasts. Pitch accent is contrastive in Japanese, e.g., minimally contrastive /ame/ (High-Low, HL) and /ame/ (Low-High, LH) are different words which mean “rain” and “candy,” respectively. She examined the perceptual boundary between a singleton consonant word /kata/ “shoulder” (HL) and its geminated counterpart /katta/ “won” (HLL) and another pair /kata/ “form” (LH) and /katta/ “bought” (LHH), by manipulating the closure duration of stimuli by 10 ms in ten steps between /kata/ and /katta/ in both pitch accent patterns.
She found that native Japanese speakers were affected by the location of pitch change so that there is a significant difference between the two pitch accent patterns in the placement of the perceptual boundary; the LHH pattern in /katta/ “bought” required a longer closure duration than the HLL “won” pattern to be perceived as containing a geminate. Furthermore, Hirata (1990a) distinguished word-level from sentence-level perception. In her experiment, native speakers seemed to use different acoustic cues at different levels. In word-level perception, preceding vowel length as well as stop closure duration were utilized by native speakers as acoustic cues to discriminate single and geminate consonants. She concluded that the ratio of the closure duration of the consonant to the duration of the preceding vowel was a crucial acoustic cue. If this ratio is short, the consonant is perceived as single, but if it is long it is perceived as a geminate. In sentence-level perception, the distinction is made based on the speed of the units following geminate consonants. A single consonant can be perceived as a geminate consonant if the following parts of an utterance are read fast, and geminates can be heard as single consonants if the following parts of an utterance are read slowly. In sum, for native speakers, the acoustic cues for durational discrimination are not limited to a single factor; the cues may include the duration of the preceding vowel, the closure duration of the stop, and the speed of the following elements, depending on whether perception occurs at word level or sentence level. Despite such interacting conditions, however, native speakers have clear and consistent perception of durational contrasts.
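The closure-duration continuum described earlier in this section for Min (1987, 1993), and the idea of a perceptual boundary along it, can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the 50% criterion and linear interpolation are assumptions about how a boundary might be located, and the response proportions are invented rather than taken from any of the studies reviewed.

```python
def closure_continuum(start_ms=110, stop_ms=250, step_ms=10):
    """Closure durations (ms) for the stimulus continuum, endpoints included."""
    return list(range(start_ms, stop_ms + 1, step_ms))


def boundary_ms(durations, prop_geminate, criterion=0.5):
    """Linearly interpolate the duration at which the proportion of
    'geminate' responses first crosses the criterion; None if it never does."""
    points = list(zip(durations, prop_geminate))
    for (d0, p0), (d1, p1) in zip(points, points[1:]):
        if p0 < criterion <= p1:
            return d0 + (criterion - p0) * (d1 - d0) / (p1 - p0)
    return None


steps = closure_continuum()
# Invented identification proportions for illustration only (not study data).
toy = [0.0] * 5 + [0.1, 0.4, 0.9] + [1.0] * 7
print(len(steps), boundary_ms(steps, toy))
```

A steep rise in the proportions, as in the invented list, corresponds to the categorical pattern reported for native speakers; a shallow, gradual rise would correspond to the blurred boundary reported for learners.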
2.1.4 Research on nonnative speakers’ production of geminate consonants

It has been reported by many researchers that the timing control of geminate and single stop closures differs significantly between native speakers (NS) and nonnative speakers (NNS), which contributes to the characterization of an accent as “foreign.” Han (1992) reported that her American subjects’ closure duration of stop geminate consonants was consistently shorter than that of the NS subjects. On the other hand, Toda (1994) claimed that it was not only the shorter duration of geminate consonants and long vowels, but also the longer duration of single consonants and short vowels, which made learners sound like they were producing geminate consonants and long vowels. As a result, her Australian subjects tried to produce geminate consonants which were even longer than their already lengthened single counterparts, and this adjustment resulted in a noticeable foreign accent.

2.1.5 Research on nonnative speakers’ perception of geminate consonants

As discussed so far, previous studies have generally agreed that the absolute closure duration of a stop consonant is one primary cue for native speakers of Japanese for discriminating between durations of consonants. Researchers are also interested in seeing whether learners of Japanese use the same acoustic cue. Most of the previous studies of the perception of geminate consonants by nonnative speakers have been carried out to determine the categorical boundaries of perception of contrasts using synthetic stimuli. Inaccurate perceptual boundaries of closure duration for the single vs. geminate discrimination will cause faulty perception by nonnative speakers. The results agreed that nonnative speakers would perceive the stimuli as geminate consonants when they had shorter closure duration than was required by native speakers, but in general, such a perceptual boundary was not categorical, but rather blurred and continuous (e.
g., Hirata, 1990b; Min, 1987; Nishibata, 1993). In Min (1987), Korean speakers’ results were compared with those of native participants. The results showed an apparently categorical boundary among native speakers, while nonnatives did not have such a clear boundary. Min further found that, while native speakers used closure duration as an acoustic cue for perception, some Korean learners, though too small a number to generalize from, tended to depend on additional phonetic characteristics such as tenseness and aspiration of the consonant as acoustic cues. Korean and Chinese speakers in Minagawa and Kiritani’s (1996) study, and Thai speakers in Minagawa’s (1996) study, were found to be affected by pitch accent types when discriminating single and geminate consonants. In a High-Low (HL) accent context, the error pattern of C→CC (mishearing a single as a geminate consonant) was significantly higher than that of CC→C (mishearing a geminate as a single consonant), but in the Low-High (LH) accent context there was no difference in error types. Since the acoustic measurement of the stimuli revealed durational differences of postconsonantal vowel duration between the HL and LH contexts, that is, the average postconsonantal vowel duration in an HL accent context is shorter than in an LH accent context, closure duration to postconsonantal duration ratio was suggested as a possible acoustic cue for Korean and Chinese speakers in judging the single vs. geminate contrast. In contrast, according to Hirato and Watanabe (1987), the perception of the single vs. geminate stop contrast by native Japanese speakers is not affected by postconsonantal vowel length. In addition, Toda (1998, 2003) reported that, while NS were affected by preconsonantal vowel duration, NNS did not show such an influence. Another interesting finding is in Yamagata and Preston’s (1999) study.
Their study showed that the learners (L1 was English) often perceived long vowels in Japanese loanwords, which native Japanese speakers spelled with geminates instead. Although the learners failed to geminate, they were successful in giving the target words the correct number of morae. Enomoto (1989) and Toda (2003) further reported a learning effect through formal instruction, in which advanced level learners came to acquire a clearer perceptual boundary for geminate consonants, compared to the beginning learners. In summary, there may be different acoustic cues which learners of Japanese might depend on. Although the findings on perceptual cues used by native speakers are consistent among researchers, there has been little agreement and no clear generalization on what acoustic cues nonnative speakers use to distinguish geminate/single consonants. This is because each research program is different, for example, in the subject’s L1, the phonetic contexts of the stimuli, the data collection method, or the levels of proficiency of the subjects. The amount of research which focuses on the correlation between learners’ perceptual ability and the types of phonetic contexts of the stimuli is especially limited. Most studies have examined stop consonants (e.g., Minagawa, 1996; Nishibata, 1993), except Toda (1998) and Hayes (2002), which also included fricatives (/ss/). Hayes (2002) conducted an experiment to examine the relative perceptibility of the contrasts based on durational differences among particular types of single/geminate contrasts. To analyze the duration of single/geminate consonants acoustically, a fricative /s/ and two stops /t/ and /k/ were chosen for test items. She hypothesized that differences in durational contrasts between single and geminate consonants belonging to different natural phonological classes would affect the learners’ perception of the singleton-geminate contrast.
The subjects of the study were to listen to minimal pairs, 60 of the same word and 60 with different words, and then to tell whether the pair that they had just heard was the “same” or “different.” As seen in Table 2.2, since the difference in duration between a single /t/ and a geminate /tt/ is larger than the difference between both the /s/ and /ss/ pair and the /k/ and /kk/ pair, she hypothesized that the discrimination of /t/ and /tt/ should be the easiest. On the other hand, there should be no difference in difficulty between /s/ contrasts and /k/ contrasts, since they have little durational difference.

Table 2.2 Stop closure and frication duration of single/geminate consonant pairs (in msec) (Hayes, 2002, p.32)

                                      t/tt     k/kk     s/ss
Single duration                       95.7     81.7     136.1
Geminate duration                     276.1    223.6    270.1
Difference (geminate duration         180.3    141.9    134.0
minus single duration)

Her hypothesis was supported by the results of the experiment. However, learners do not usually encounter such situations, in which they can compare single and geminate counterparts for discrimination. Therefore, it is hard to say that this result reflects the reality of learners’ perception in other contexts. Further, the geminate consonants used as stimuli in previous studies differed in phonetic contexts, i.e., they were of various consonant types and had various preceding and following elements. One of the objectives of this present study is to examine more comprehensively how learners perceive and produce geminate consonants and whether their perception as well as production is affected by such phonetic conditions. In addition, we have seen that nonnative speakers’ perception and production of geminate consonants are different from native speakers’; however, there have been few studies of how these two abilities are related, except Akahane-Yamada (1999), whose study was limited to stops.
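Returning briefly to Table 2.2, the comparison behind Hayes’s hypothesis is simple arithmetic and can be rechecked with a short Python sketch. The durations are the means printed in the table; the function name is hypothetical, and the geminate-to-single ratio is a derived quantity that the table itself does not report. Recomputing from the printed means gives 180.4 ms for t/tt rather than the printed 180.3, a gap that presumably reflects rounding of the underlying means, although that is an assumption.

```python
# Mean durations in ms as printed in Table 2.2 (Hayes, 2002, p. 32).
durations = {
    "t/tt": (95.7, 276.1),
    "k/kk": (81.7, 223.6),
    "s/ss": (136.1, 270.1),
}


def contrast_stats(single, geminate):
    """Return (difference in ms, geminate-to-single ratio), both rounded."""
    return round(geminate - single, 1), round(geminate / single, 2)


stats = {pair: contrast_stats(*d) for pair, d in durations.items()}
for pair, (diff, ratio) in stats.items():
    print(f"{pair}: difference {diff} ms, ratio {ratio}")
```

The differences reproduce the ordering used in the hypothesis: the /t/-/tt/ difference is the largest, while the /k/-/kk/ and /s/-/ss/ differences lie within a few milliseconds of each other.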
It is important to explore this issue further in order to clarify the details of the fundamental problems which learners might have in acquiring geminate consonants. The next section will review general views on the development process of perception and production by adult L2 learners.

2.2 Development of speech perception and production

The view that perceptual development comes before production development is consistent with the results of a number of experiments which have been concerned with the relationship between L2 perception and production in the course of L2 acquisition. Many such studies suggest that perception plays an important role in production, and production problems result not only from motoric difficulties but also from perception problems. Flege, Munro, and MacKay (1995) suggested that production inaccuracy of the Italian learners in their study might have been due to a perceptual problem; they argued that an L2 phone must be perceived in a fully native-like fashion if it is to be produced in a fully native-like fashion. Thus, they argue, perception should come before production; although correct perception does not guarantee correct production, it is a prerequisite for it. Rochet (1995) also observed the role of perception in foreign-accented pronunciations of L2 sounds. His study examined perception and production of the French high front rounded vowel [y] by untrained Portuguese and English speakers, whose native languages contain only two high vowels (/i/ and /u/). In the perception test to identify vowels along a synthetic high vowel continuum, native French speakers identified a stimulus with the F2 values between 1300 and 1900 Hz as /y/, but Portuguese speakers identified it as /i/ and English speakers as /u/.
Based on this result, Rochet hypothesized that an imitation task would indicate a similar tendency; when /y/ was produced incorrectly, Portuguese speakers tended to produce it more /i/-like, whereas English learners produced more /u/-like vowels. The results supported his hypothesis; therefore, he claims that foreign-accented pronunciation by untrained speakers may be perceptually motivated. This perception precedence idea can also be observed in several other studies (e.g., Aslin, Pisoni, Hennesy & Perey, 1981; Barry, 1989; Bohn & Flege, 1990). However, there have been reports which showed opposite tendencies. Sheldon and Strange (1982), replicating Goto (1971), collected data from Japanese learners of English regarding the English liquids /r/ and /l/, which are not contrasted in Japanese. The data showed that the subjects performed better and more accurately on the production of /r/-/l/ contrasts than in perception. The data included perception test materials involving minimal pairs with /r/ and /l/ and the subjects’ judgments regarding their own productions of the pairs. According to Sheldon and Strange, “perceptual mastery of a foreign contrast does not necessarily precede adult learners’ ability to produce acceptable tokens of the contrasting phonemes” and “may lag behind production mastery” (p. 254). Flege and Eefting’s (1987) experiment with Dutch speakers of English showed that their subjects were able to produce a substantial voice onset time (VOT) difference between the /t/ phonemes in Dutch and English, but they did not show such good discrimination in perception. Further, Mack (1989) also conducted studies which showed that production can be more accurate than perception. Gass (1984) examined the perception of L2 learners of English of the VOT of /b/ and /p/ in initial positions by using a forced-choice task with synthesized stimuli, and the learners’ production data were also collected.
Perception data showed an unclear, continuous distinction between the segments, compared with the native speakers’ clear categorical boundaries. As opposed to this nonnative-like perception, the learners could produce /b/ and /p/ in native-like fashion. Thus, in this study, nonnative speaker production was in advance of perception. However, as Flege (1991) and Mack (1989) pointed out, these results have to be interpreted carefully. For example, the data from the Japanese learners of English in Sheldon and Strange’s (1982) study may have been influenced by the formal English training in production which Japanese school students had received, i.e., instruction to use articulatory strategies such as “to say /l/, combine the features of the Japanese X and Y sounds” (Flege, 1991, p. 265). Thus, the types of input which the subjects have been given should also be considered cautiously to determine precisely how the data collected could point to a specific process of L2 development. One of the findings in Sheldon (1985), which reanalyzed the Korean learners’ data in the U.S. reported by Borden, Gerber and Milsark (1983), was that the precedence of production by perception decreased as the Korean learners’ time residing in the U.S. increased. This could be interpreted as an effect of instruction as Sheldon and Strange (1982) argued above. Sheldon hypothesized that a functional perceptual level in an L2 learner might be enough for communication purposes, while heavily accented productions are socially less accepted, with the consequence that L2 speakers would feel more pressure to improve production than perception. Bohn and Flege (1990) also agreed that speech production was more subject to social control than perception, and as a result, the perception of a new contrast showed more resistance to L2 experience than the production of the contrast did.
Although no conclusive determination has been made, we can assume that speech perception and production capacities of individuals have great overlap. It is important to consider both of the areas simultaneously. To explain the relation between L1 and L2 in perception and production, and predict difficulties that learners tend to have, Flege (1995) and Best (1995) proposed the following L2 developmental models. Best (1995) proposed the “Perceptual Assimilation Model (PAM),” which hypothesizes that L2 speakers perceive nonnative sounds based on similarities to or discrepancies with the L1 phones which are closest to them in terms of the manner of articulation. The model predicts an L2 discrimination ability that depends on the degree to which an L2 contrast can be associated with L1 categories. Thus, for L2 learners, certain contrasts in L2 are easier to discriminate than others, while some are more difficult. For example, /r/ and /l/ present the most difficult contrast for Japanese learners of English to master since these two phones are identified as the same Japanese phoneme, a situation which is called ‘single category contrast.’ Similarly, Flege (1995) developed the ‘Speech Learning Model (SLM),’ which hypothesizes that L2 sounds that are perceptually similar to sounds in L1 are more difficult to acquire accurately than sounds that are dissimilar to any sounds in L1, and L2 speakers try to assimilate a new L2 phone to a close L1 phone although the two phones are acoustically different. This indicates that such L2 learners have not detected the phonetic differences between an L2 sound and the most similar L1 sound, which results in foreign accents. The greater the phonetic distance between an L1 phone and the closest L2 phone is, the more easily the L2 learner can detect the difference. Greater phonetic distance facilitates the eventual establishment of a phonetic category.
Both models assume that perceptual learning occurs first but that perception and production skills then develop in parallel, although the prediction that perception development is followed by production development does not always hold. Nevertheless, as we have seen, production depends on perception in certain ways, even though its development may not always follow perception development. We can therefore assume that production difficulties may be associated with perception difficulties. As the SLM and PAM suggest, since learners are language-specific perceivers of speech sounds and tend to adjust their perception to the phonetic characteristics of speech segments found in their L1s, nonnative-like perception often occurs during the course of L2 acquisition. The previous linguistic experience with L1 might influence the way L2 sounds are perceived, at least in the early stages of L2 perceptual category development. Jusczyk (1993, 2000) proposed a model of the development process of infant speech perception, which may be applicable to the L2 acquisition process, too. Infants have an innate auditory analyzer which can process any potential L1 at the initial stage of processing of speech signals. A set of auditory analyzers provides a preliminary description of the spectral and temporal features present in the acoustic signal. Once language is acquired, the output of the auditory analyzers is weighted to give prominence to those features that are the most critical to distinctive phonological features. This “weighting scheme” is a way of directing attention to features critical for recognizing and distinguishing words in a particular native language. For example, information from auditory analyzers concerning aspiration in syllable-initial voiceless and voiced stops would receive heavy weighting in the acquisition of English, but not of French. Therefore, to acquire a new language, a listener must learn a new weighting scheme in order to be attuned to the target language.
Many studies of first language acquisition reported that children’s linguistic ability to learn to discriminate between new contrastive features decreases after a certain age. This is not a loss in auditory capability, but rather a reorganization of the perceptual space optimal for L1 (Guion & Pederson, 2002). Since L2 learners tend to fall back on the weighting scheme used for the native language, a new weighting scheme must be developed. They must learn to alter the focus of attention, which affects the way in which speech sounds are perceived (Jusczyk 1993). Based on the idea that perceptual space is modified by experience (Nosofsky, 1986), a number of training studies have been conducted in order to examine how adult learners can alter such focus of attention; they are reviewed in the following section.

2.3 Training studies

Earlier researchers have postulated that the poor performance observed in adult learners’ perception and production was due to a permanent change in the perceptual or sensory mechanisms as a result of selective early experience (Pisoni, Aslin, Perey & Hennesy, 1982). On the other hand, training experiments have been conducted based on the assumption that it is possible to train adult learners to perceive and/or produce novel L2 phonemes. This implies that adult learners’ perceptual and/or productive systems can be modified. Such training studies generally aim to 1) find the cause of difficulty in acquiring new L2 phones; 2) discover how capabilities of the adult perceptual system are modified; 3) show that linguistic experience has a substantial effect on speech perception; 4) find an effective way for L2 learners to acquire difficult sounds; and/or 5) examine further the relationship between perception and production. In the early studies which showed the effectiveness of training, researchers were interested in the perception of voicing contrasts in stops. Pisoni et al.
(1982) trained monolingual English speakers to identify and discriminate VOT contrasts that are not phonemically distinctive in their native language. For the experiment, synthetic VOT stimuli based on measurements of natural speech were used to train the subjects to identify -70, 0, and 70 ms VOT synthetic stimuli (voiced, voiceless unaspirated and voiceless aspirated stops, respectively). Whereas English has only a two-way contrast of voiced and voiceless, and the features aspirated and unaspirated are not contrastive, the results showed that adult learners could perceive an additional perceptual contrast easily in the laboratory after a short training period (1 hour a day for 4 days). Thus the adult subjects were successful at modifying their perception of VOT. Pisoni et al. also argued that the key to this successful training was to provide immediate feedback during training tasks. Further, McClaskey, Pisoni and Carrell (1983) also showed that knowledge about VOT perception gained from laboratory training was genuinely acquired in that the result of discrimination training on one place of articulation (e.g., labial) was transferred to another place of articulation (e.g., alveolar) without any additional training. Another speech contrast that has been investigated in great detail by a number of studies is the /r/-/l/ contrast in English. This contrast is harder for learners to acquire than the VOT distinction. In order to distinguish between /r/ and /l/ in various phonetic environments, processing of complex temporal and spectral changes is required, whereas stop voicing involves only a temporal difference. Voicing may be more discriminable to listeners than the acoustic cues that underlie other speech contrasts, since it is psychophysically more distinctive or robust (Pisoni, Lively & Logan, 1994).
A series of studies was conducted by Pisoni and his colleagues to address the problems experienced by L1 Japanese learners of English as a second or foreign language. Japanese does not have the /r/-/l/ contrast (Bradlow, Akahane-Yamada, Pisoni & Tohkura, 1999; Bradlow, Pisoni, Akahane-Yamada & Tohkura, 1997; Lively, Logan & Pisoni, 1993; Logan, Lively & Pisoni, 1991; Pisoni, Lively & Logan, 1994). The training procedure and stimuli used in their experiments were designed to avoid some of the problems found in Strange and Dittmann’s (1984) study. In Strange and Dittmann, although discrimination performance improved gradually over the training sessions, the effects of discrimination training did not generalize to naturally produced stimuli. One of the causes of their failure to train learners’ linguistic ability was that the variability of the stimuli was too limited to generalize, since the training stimuli consisted of only one /r/-/l/ minimal pair produced by one synthetic voice. Based on this observation, Pisoni and colleagues used a wider variety of training stimuli, which consisted of natural speech tokens instead of synthesized speech, and minimal pairs in different phonetic environments produced by five different talkers. In doing so, they considered the important role of stimulus variability in perceptual learning. In addition, a two-alternative forced-choice identification task was used instead of a discrimination task. An identification task encourages classification of stimuli into categories, while a discrimination task focuses perception only on fine within-category acoustic differences. This high-variability training approach for perceptual learning contributed to generalization to novel stimuli and talkers’ voices. Lively et al. (1993) showed that increasing the stimulus variability during learning was effective in the development of robust phonetic categories.
The training was also effective in promoting long-term retention of learning in both perception and production; the Japanese subjects in Japan maintained their improved levels of performance three months after the perception training (Bradlow et al., 1999). Another training technique was described in Jamieson and Morosan (1986). Their study examined the ability to identify the American English fricatives /θ/ and /ð/. Training was given to Canadian francophone speakers by using a perceptual fading technique, in which stimuli were presented sequentially from the most acoustically distinct stimuli to the least distinct stimuli. In a more recent study, McCandliss, Fiez, Protopapas, Conway, and McClelland (2002) used a similar technique called adaptive training to train Japanese learners to acquire the English /r/-/l/ contrast through synthetic stimuli, which maximally exaggerated the acoustical difference between the contrasts, and gradually minimized the difference to approximate that found in natural exemplars. They also investigated the effect of feedback, comparing the presence and absence of feedback in combination with the different types of training. It was found that the combination of adaptive training and feedback facilitated learning the most by calling a subject’s attention to the critical cues that distinguish the training stimuli. In addition, the result of the training experiment indicated that the exaggeration effect would increase the likelihood that the subject would be able to generate consistent labels of contrasts even in the absence of feedback. However, they also suggested that, as similarly implied by Jamieson and Morosan’s result, what the subjects have learned is a very general phonological discrimination, and it could not apply to all instances of /r/ and /l/ spoken in all possible contexts by all speakers.
They assumed that the fixed training with a large number of various stimuli in combination with feedback such as was used by Pisoni and colleagues in the study mentioned above would lead to more robust generalization and contribute to mitigating the difficulty learners have in acquiring the target contrast from natural experience, although their adaptive technique would provide more rapid learning. While many training studies predominantly used auditory presentation methods, Hardison (2003) also used and extended the high-variability training approach to include training in combined auditory and visual modalities. Her study was the first published study that investigated auditory-visual vs. auditory-only training for L2 learners. Using a talker’s face, including articulatory gestures, as a visual cue, and locating the sound in various phonetic contexts and positions within the word, Hardison examined the effect of speaker and context variability on the perception of the English /r/-/l/ contrast by Japanese and Korean learners of English. The result demonstrated significant interactions of these variables and indicated both generalization to novel stimuli and production improvement. The effectiveness of multimodal training in addition to high variability of stimuli was also observed in the earlier identification of words beginning with /r/, /l/, /p/ and /f/ by Hardison (2005b). As another effective way of providing audio-visual training, a real-time computerized pitch display was used to provide prosody training for English-speaking learners of French (Hardison, 2004) and Chinese-speaking learners of English (Hardison, 2005a). The learners could visualize their own pitch contours in utterances in the target language and compare them with native speakers’. This is another example of effective training utilizing visual input along with auditory input to facilitate learning.
According to Hardison (2000, 2003), these results, that a multi-modal, high-variability perceptual training approach facilitates learning and generalization, indicate that language learners store detailed individual instances as memory traces, rather than creating abstract prototype categories, and use these stored detailed episodes for memory encoding. The traditional view of the learning process, known as the abstractionist view, assumes that any representations of the sound patterns of words are stored as abstract prototypes and are normalized with respect to variables affecting the sounds, such as the talker’s voice, speaking rate, and so on. It is assumed that these variables, which are not necessary for processing the meanings of any given utterances, are discarded as noise somewhere during speech processing. The alternative view, the episodic view, does not assume such normalization or prototype formation, but assumes that listeners store specific instances or tokens in memory. During processing, they evoke specific instances, rather than abstract representations of the sound patterns of words, and try to match new instances to these. This view is supported by empirical studies of adult learners (e.g., Goldinger, 1997; Johnson, 1997) and the investigations regarding adults’ and infants’ retention of specific details of particular instances of perceptual experience, e.g., recognition of particular voices (Jusczyk 2000). Multiple-trace theory incorporates the above-mentioned prototype and episodic views of perceptual processing, such as in the Minerva 2 model by Hintzman (1986), and explains how repetition affects episodic memory. The model assumes that each experience event has its own memory trace as an episodic trace and stores specific events in primary or short-term memory (PM) as collections of primitive properties that include perceptual details, context, affect, semantic connotation, and so on.
When retrieving a memory trace, a retrieval cue or “probe,” which is an active representation of experience, is simultaneously sent to communicate with all stored dormant traces in secondary or long-term memory (SM). When the probe is sent from PM to all traces in SM, PM receives a single reply or “echo.” Repetition of the same experience produces multiple traces of an item but does not cause strengthening of a single memory trace. Each trace reacts more or less intensely depending on its similarity to the probe, and the contribution of traces which are the most similar to the probe is greater because they produce a more intense response. If the information in the representation is more detailed, the probe becomes more specific, which produces a smaller set of highly activated traces. Thus the responses or echoes to the probes vary in their intensity and content. Whenever several traces are very strongly activated, the echo is very intense and its content reflects their high level of common properties; therefore, if a new instance is very similar to previously stored traces, the echo reflects more common properties. A strong echo reflects a greater degree of similarity among the activated traces and greater familiarity with the experience. However, if the probe resembles only a few of the previously stored traces, the returned echo should reflect more idiosyncratic properties of those activated traces. Thus, the specificity of the probe and the number of strongly activated traces will determine whether the echo content is ambiguous or clear. Jusczyk’s (1993) development model reviewed earlier is also based on the episodic view. During the course of perceptual development, the output of the innate auditory analyzers at an earlier stage of development is weighted to give prominence to the critical distinctive features in the target language to enable the learner to recognize words.
Through this attention weighting scheme, sound pattern extraction is made, and then the matching process occurs. The representation obtained through linguistic experience and by the weighting scheme serves as a probe that is matched against existing representations, or traces previously analyzed and stored in SM. If a close match is obtained between the probe and the stored items, the input is recognized; if not, the input is stored as a new item. It is also assumed that representations of the sound structure of a word are not stored in the form of abstract descriptions such as abstract prototypes; rather, the sound properties of items actually encountered in different contexts, in other words, multiple traces of individual instances of the item, are stored. The above episodic views on the learning process are consistent with the previous L2 training studies, whose results indicate that repetition of high-variability stimuli and immediate feedback are indispensable factors in effective training. Hardison (2000) proposed a scenario of bimodal L2 speech processing and the role of training, based on multiple-trace theory, Jusczyk’s model of child L1 development, and results from her auditory-visual research reported in Hardison (1998). The following is her proposal: at the first stage of L2 acquisition, auditory and visual inputs are preliminarily analyzed through different pathways. In the next stage, a new weighting scheme for L2 must be developed, so that learners can shift their attention from the optimal setting for perceiving distinctive features in L1 to the optimal setting for L2 by learning to attend to new sources of information obtained through the auditory analyzers. For example, to hear the distinction between /r/ and /l/, attention has to be shifted auditorily to the F3 transition and visually to the articulatory gestures in order to distinguish between the sounds.
Learning occurs through copying the features of an experience into a trace. Probes, or signals processed in PM, activate dormant stored traces in SM, and the weighting process occurs according to the trace’s similarity to the features of the probe, which ultimately will return an echo to PM. Attention to auditory and visual attributes of the stimulus will determine the features of the probe. Training with multiple exemplars and immediate feedback enhances learning; repetition and feedback can direct attention to within-category similarities and between-category distinctions in L2, adding traces to memory and modifying the memory system. Old traces are not altered, but new traces are added. As the result of learning, new L2 memory traces become less ambiguous and less confusable. Thus, the objective of training is “to create a situation in which the echo from an aggregate of L2 traces acting in concert overshadows the echo from L1 traces” (Hardison, 2000, p. 321). The advantage of a prototype is its long retention, whereas exemplars may be forgotten over time; however, when multiple redundant traces of each exemplar are stored, the effect of decay can also be reduced. Through many new exemplars in perceptual learning, learners store multiple traces which match the probe; these multiple traces share common features, thus functioning like a prototype. Another important feature of Hardison’s (2000) model of L2 development is that a weighting scheme is required to direct learners’ attention to critical distinctive features in L2. Multiple-trace theory is based on the assumption that all items that are attended to are stored in memory; learners must be able to attend to critical features of input for categorization and identification of new L2 contrasts. Multiple exemplars, immediate feedback, and repetition add traces and increase the salience and information value of important features to focus on, and consequently enhance learning.
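The probe-and-echo retrieval described above for multiple-trace theory can be made concrete with a short sketch. The Python code below is an illustration only. It assumes the commonly cited formulation of Minerva 2, in which each feature takes the value -1, 0, or +1, similarity is a normalized dot product between probe and trace, and a trace’s activation is the cube of its similarity. The feature vectors, the “L1-like” and “L2-like” labels, and the function names are hypothetical and are not taken from the present study.

```python
def similarity(probe, trace):
    """Normalized dot product over features that are nonzero in probe or trace."""
    relevant = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    if relevant == 0:
        return 0.0
    return sum(p * t for p, t in zip(probe, trace)) / relevant


def echo(probe, traces):
    """Return (intensity, content) of the echo that the probe evokes.

    Every stored trace responds in parallel. Cubing the similarity keeps its
    sign and lets the most similar traces dominate the echo.
    """
    activations = [similarity(probe, trace) ** 3 for trace in traces]
    intensity = sum(activations)
    content = [
        sum(a * trace[j] for a, trace in zip(activations, traces))
        for j in range(len(probe))
    ]
    return intensity, content


# Hypothetical four-feature tokens (illustrative values only).
l2_like = [1, 1, -1, -1]
l1_like = [-1, 1, 1, -1]

# Training adds new traces; old traces are left unaltered.
before_training = [l1_like, l1_like, l2_like]
after_training = before_training + [l2_like] * 5

print(echo(l2_like, before_training))
print(echo(l2_like, after_training))
```

In this toy setting, adding further L2-like traces raises the echo intensity for an L2-like probe without modifying any stored trace, which is the sense in which an aggregate of L2 traces can come to dominate the echo.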
Not all tokens in the target language are equal candidates for incorporating into the phonetic category, and only those tokens that are perceived during a “signal-oriented” mode can be collected for incorporation and subsequent modification of a phonetic category (Lindblom, Guion, Hura, Moon, & Willerman, 1995). Signal orientation, which is the cognitive mechanism of attention, helps to create novel categories in addition to the modification of existing categories. Nosofsky (1986) argues that in categorization and identification of newly encountered stimuli, a selective attention process is assumed to operate, which leads to systematic changes in the structure of the perceptual space and changes inter-stimulus similarity relations. Attention weights act to shrink or expand the perceptual space; the psychological space is stretched along the dimension that is selectively attended to, maximizing within-category similarity, and is shrunk along the other dimensions, minimizing between-category similarity, so that learners are optimizing similarity relations for the given categorization problem. If selective attention properly modifies similarity relations across the identification and categorization paradigms of stimuli, the probe to memory will provide good matches to stored L2 traces, returning less ambiguous echoes to PM, and categorization will be enhanced. Therefore, it is necessary to direct learners’ attention to focus on the critical properties. Based on Nosofsky’s proposal, Pisoni, Lively & Logan (1994) examined adult phonetic processing and concluded that the cognitive structures created by attentive processes are adjusted from prior linguistic experience and can be modified through training for better discrimination of non-native phonetic contrasts.
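The effect of attention weights on similarity relations can be sketched numerically. The following Python code is a minimal illustration, assuming an attention-weighted distance and an exponential-decay similarity function of the kind associated with Nosofsky’s (1986) account. The two stimulus dimensions (a normalized consonant duration and a normalized pitch value), the token values, the sensitivity parameter `c`, and the function names are all hypothetical.

```python
import math


def weighted_distance(x, y, weights, r=1):
    """Attention-weighted distance between two stimuli (city-block when r=1)."""
    return sum(w * abs(a - b) ** r for w, a, b in zip(weights, x, y)) ** (1 / r)


def similarity(x, y, weights, c=1.0):
    """Similarity decays exponentially with attention-weighted distance."""
    return math.exp(-c * weighted_distance(x, y, weights))


# Hypothetical tokens as (normalized duration, normalized pitch).
singleton = (0.2, 0.5)
geminate_a = (0.8, 0.4)
geminate_b = (0.8, 0.9)  # same duration as geminate_a, different pitch

equal_attention = (0.5, 0.5)
duration_attention = (0.9, 0.1)  # attention shifted toward duration

for label, w in [("equal", equal_attention), ("duration", duration_attention)]:
    between = similarity(singleton, geminate_a, w)
    within = similarity(geminate_a, geminate_b, w)
    print(label, round(between, 3), round(within, 3))
```

With more weight on duration, the two geminate tokens become more similar to each other and the singleton becomes less similar to the geminate, which is the kind of change in similarity relations that training is meant to bring about.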
As we have reviewed, training programs with high variability and multimodality have been shown to shift learners’ focus, which leads to generalization to new tokens they encounter in the real world. Empirical studies have reported that input in one sensory modality affects classification learning in the other modalities. Bimodal speech recognition reported by McGurk and MacDonald (1976) showed that a pair of auditory and visual stimuli (the visual stimulus being a speaker’s lip movement) can affect each other and produce a sensory effect different from either the actual auditory input or the visual input. de Sa and Ballard (1997) argued that cortical cells in a primary sensory area would respond to features from other sensory modalities. They then proposed a computational model using the information in one modality to modulate learning in another, instead of merging the outputs from different pathways. In perceptual learning in SLA, not only auditory input but also visual input in AV-training, such as described in Hardison (2003), facilitates such processes. Based on these observations and the exemplar-based theory of learning, the current study also aims to give beginning learners of Japanese effective training to accurately perceive geminate consonants through multimodal (not only auditory, but also visual) training, with a variety of stimuli and immediate feedback, expecting better improvement than that resulting from auditory-only training, as well as generalization to novel stimuli. Some studies have shown that there is a close link between perception and production through demonstrating transfer of training, in which the effect of training on one domain was transferred to another. In Bradlow, Pisoni, Akahane-Yamada, and Tohkura (1997), 11 Japanese learners of English received 45 sessions (30 minutes each) of perceptual identification with feedback over 15 days.
The stimuli consisted of minimal pairs for /r/ and /l/. Although the training was designed only for perception, the pretest and the posttest included assessment of production ability. The result showed that the subjects improved not only in perception but also in production. In Rochet (1995), native speakers of Mandarin Chinese received perception training for French voiceless stops, and the result also showed that improvement in perception performance could carry over to improvement in production. In addition, a similar transfer effect was found in the studies on phonologically delayed children conducted by Jamieson and Rvachew (1992). Their studies also showed that speech production treatment for the children benefited from perception training. A very early training experiment in production showed a similar transfer effect; i.e., the effect of production training carried over into perception performance. Catford and Pisoni (1970) compared the performance of subjects who received production and articulation training involving unfamiliar or “exotic” sounds and that of those who were trained only in perceptual discrimination. The results of production and perception tests showed that those who received articulation training in addition to perceptual training performed better. This finding implied, as they suggested, “some kind of carry-over from productive competence to auditory discriminatory competence” (p. 481); thus, improved production abilities may contribute to better discrimination of L2 sounds. Leather (1990) conducted two experiments in parallel with two different groups of Dutch learners of Mandarin Chinese; perception tests were given to the subjects who had been trained only in the production of Chinese tones, and production tests were given to those who had received only perception training for the same tones. The progress that the two groups made was compared, and the results showed no difference in their progress.
Both groups improved at the same rate. He argued that his subjects “did not need to be trained in production to be able to produce, or in perception to be able to perceive, the sound patterns of the target system” and “training in one modality tended to be sufficient to enable a learner to perform in the other” (p. 95). The bimodal (audio and visual) training of Hardison (2003) also showed improvement in subjects’ production ability. Hardison suggests that L2 learners may “attempt to coordinate information about perception and production in category development” (p. 516), a claim similar to that made by Jusczyk’s model (1993) of L1 development. Interactions between the developing perception and production systems may affect the way learners acquire knowledge of L1 sound patterns. Learners are under pressure to coordinate the way that these systems function and to relate the perceptual representations of words to the articulatory representations for production, so they may reach an abstract representation to capture generalizations that apply to both systems, which is phonology. It is the coordination of perceptual and productive representations that may lead the language learner from a more global representation of sound patterns of words to one that is structured with respect to phonetic segments. According to Jusczyk, when infants start babbling, they are very attentive to the distinctive features of the language. Production development lags behind perception since infants have to wait until they gain control and coordination over their jaw movements; it also takes time to coordinate information from both modalities. Adult learners do not have to wait for the development of their articulatory system, but it is observable that they also need some time for the coordination of both modalities. At the same time, it should also be noted that production is more easily altered through formal instruction, as has already been mentioned.
On the other hand, differences in the rate of development in perception and production were found in Bradlow et al. (1997), who reported little correlation between degrees of learning in perception and production after perception training. The learners who improved the most in perception did not necessarily improve the most in production. There was variation in learning; degrees of learning in perception are different from the transferred learning in production. They noted that “learning in the perceptual domain is not a necessary or sufficient condition for learning in the production domain; the processes of learning in the two domains appear to be distinct within individual subjects” (p. 2393). This claim is compatible with the results of Akahane-Yamada (1999). As Bradlow et al. (1997) indicated, their study did not support Flege’s (1995) SLM. The SLM assumes that improvement in speech production as a consequence of perceptual learning is due to a reorganization of the underlying system used for both speech perception and production and hence predicts that changes in perception will transfer to changes in production, and that these changes will proceed in parallel. However, the SLM does not account for the results of Bradlow et al. (1997), which indicated the presence of individual variations in learning and the lack of correlation between degrees of learning in the two domains. The specific relationship between production and perception is not clear; it might differ according to sound types, phonetic contexts, methods of data collection and training, and so on. However, most of these studies agree on the following: 1) perception ability and production ability are closely related: though the degree of correlation is not clear, the abilities do not appear to develop independently; 2) training experiments bring apparent improvement to adult learners, either in perception or production, or both.
Therefore, it is possible to train adult learners to perceive and/or produce novel phonemes in the L2, though training methods and data collection processes in the above-cited studies varied. The above-reviewed training studies demonstrate the adaptability of the adult perceptual system through training, and there is a certain relationship between perception and production. There have been a number of studies involving various L1s and L2s, as well as various kinds of segments (vowels and consonants) and suprasegmentals (e.g., Chinese tones); however, very few studies have been conducted in this context to examine geminate consonants in Japanese. The present study took as one of its principal objectives the investigation of the relationship between the acquisition of the perception and the acquisition of the production of geminate consonants, in particular, the contribution of perceptual training to productive ability.

2.4 Making visual information available to L2 learners

2.4.1 Electronic Visual Feedback (EVF)

The previous section reviewed some previous laboratory training studies. In this section, alternative ways of improving learners’ perception of geminate consonants will be considered. A growing number of language programs have been utilizing recent developments in the technology available as computer-assisted instruction for perception and pronunciation training to enhance self-monitoring skills by learners. For example, electronic visual feedback (EVF) is a type of computerized training which utilizes software (e.g., Cool Edit by Syntrillium Software, Wavesurfer, and Praat) or hardware (e.g., Computerized Speech Lab (CSL) and Visi-Pitch by Kay Pentax and IBM Speech Viewer) to perform an acoustic analysis of a target sound. Chun (2002) used Speech Tools, downloadable web-based software provided by SIL, for the images of intonation in her book.
Speech Analyzer is a component of Speech Tools which offers visual analyses such as waveform, pitch plot, spectrogram, spectrum and various F1 vs. F2 displays. All these programs and devices involve the digitization of speech and its subsequent visual representation on a video screen. Such technology allows learners to measure and visualize intensity, duration, frequency range, etc. of the target sounds. Researchers have reported the effectiveness of such training in improving learners’ perception and production.

2.4.2 Effects of instruction on production of segments and suprasegmental features

Molholt (1988, 1990) reported effective use of EVF when teaching difficult consonants and vowels to Chinese ESL students in laboratory sessions using Kay Pentax’s Visi-Pitch and Speech Spectrographic Display (SSD). With Visi-Pitch, students can see simultaneously both an instructor’s and their own spectrograph and waveform of a target sentence to practice. In general, the energy concentration of Chinese consonants has a higher frequency than that of American consonants. The differences in the duration of American /v/ and /b/ are new to Chinese speakers. Molholt (1988) introduced EVF as an effective way to teach segments through the visual representation of frequency (including voicing), aspiration, and duration of such difficult consonants. For example, as for frequency, since Chinese has no voiced stops and only one voiced fricative, the language in general has a higher frequency range than English. Therefore, it is important at the beginning of pronunciation lessons for Chinese students to start building more sensitivity to sounds in the low-frequency range. EVF enables teachers to provide students with visual instruction on how to control frequency, such as in a minimal pair for /s/ and /z/. The visual display provides an objective measure that helps students focus their attention on the exact features of their pronunciation that need to be changed.
This technique is also used in teaching vowels. Many researchers have reported that EVF has been used with ESL learners for teaching various aspects of suprasegmental features, such as stress, rhythm, and intonation (e.g., Anderson-Hsieh, 1994, 1996; Chun, 1989, 1998, 2002; de Bot, 1983; Hardison, 2004, 2005b; Levis & Pickering, 2004; Molholt, 1988, 1990; Weltens & de Bot, 1984). Anderson-Hsieh (1994, 1996) also reported advantages of EVF in teaching suprasegmental features. On listening to spoken discourse, her ESL learners only focused on individual lexical items, and they tended to ignore the accompanying rhythm of utterances. In addition, a more serious problem was that they did not notice the importance of perceiving these suprasegmental features, so that they tended to have difficulty producing them. By providing visual information about suprasegmental features in real time, it becomes possible to raise learners’ awareness of such speech characteristics, as well as to provide an effective training procedure. For example, one of the typical problems that Japanese ESL learners have is transfer of their L1 rhythm, which is a “mora-timed rhythm,” and their failure to highlight stressed syllables sufficiently because they use pitch accent instead of stress. Anderson-Hsieh (1996) used EVF in her classroom instruction. While EVF provided visualizations of the difference between the native speaker model’s and students’ own speech, the students were encouraged to repeat the words, make greater differentiation in length between stressed and unstressed syllables, and use higher pitch on the stressed syllables. She also reported that EVF was effective not only for word-level stress, but also for sentence-level stress and intonation. Levis and Pickering (2004) also reported the use of speech visualization technology in teaching intonation at the discourse level.
They claimed that providing practice with discourse-level intonation features is the next step in using technology for the teaching of intonation, so that learners can learn to use intonation for real communicative needs. For teaching prosody, Hardison (2004) also used a computer-assisted speech training program, Real-Time Pitch (RTP), along with Kay Pentax Computerized Speech Lab (CSL), which displays simultaneously both an instructor’s and a learner’s pitch contours for comparison, to teach French prosody to English speakers. In addition to RTP, in Hardison (2005a), Anvil, a web-based annotation tool integrating the video of a speech event with its pitch contour display, was used to teach English prosody to Chinese speakers. For learners of Japanese, Landahl, Ziolkowski, Usami and Tunnock (1992) and Hirata (1999, 2004) reported effectiveness in teaching Japanese pitch contours using Visi-Pitch with CSL.

2.4.3 Experimental studies of EVF

Although their number is limited, several reports on studies of EVF have provided relevant experimental information concerning the number of subjects, statistical analysis of data, etc. de Bot (1983) conducted an experiment to assess the influence of auditory-visual feedback vs. solely auditory feedback on the learning of English intonation. The subjects in the experimental group were presented with a sentence through headphones. The F0 contour, i.e., pitch, of this sentence was plotted on a display, and then they had to imitate the sentence as their own F0 contours appeared on the display for comparison. In the experiment, practice time was another factor: one group received only one training session of 45 minutes while two sessions were provided for the other group. The control group followed the same procedure, but without visual feedback. The result of the experiment showed that visual feedback produced a significant effect on the learning of L2 intonation, whereas practice time was not a critical factor.
In other words, optimum imitation of a sentence was reached sooner with auditory-visual feedback than with auditory feedback only. One of the advantages that de Bot pointed out was that the use of this kind of equipment tends to increase the subjects’ motivation to try harder to achieve their learning target. In Hardison (2004), 16 American learners of French received three weeks of training in French prosody using computerized displays of pitch contours as visual feedback. The results revealed significant effects of training in the acquisition of prosody. In addition, generalization to segmental accuracy and novel sentences was also found. Thus, the effect of training is apparent not only in the immediate focus of the visual feedback but also in novel tokens. Hardison’s observation of the learners during sessions suggested that there appeared to be a hierarchy of the learners’ awareness, from more global elements such as the pitch contour, which was the focus of visual feedback, to more local elements such as individual sounds. Further, Hardison (2005a) conducted prosody training with Chinese learners of English using a web-based annotation tool integrating the video of a speech event with visual displays showing the pitch contours and examined the effects of discourse-level input versus sentence-level input. The presence of video was more helpful with discourse-level input than with individual sentences. Here again, high variability of the stimulus was effective in combination with auditory and visual input sources. However, as Anderson-Hsieh (1996) pointed out, EVF has some drawbacks, too. The major disadvantage of EVF is that the commercial hardware (e.g., Kay Pentax products) may be too costly to purchase for language laboratory settings and for individuals. It is also not convenient for use in large classes except for demonstration.
However, there are a number of free or low-cost programs available for use as “e-learning” tools (e.g., Praat, SIL Speech Analysis software, WaveSurfer). Another point that should be considered is that instructors need to acquire technical knowledge to read some types of visual displays, and their careful control of the information in guiding students is indispensable. As we have seen, there are many studies reporting on the use of EVF to improve learners’ production of segments (vowels and consonants) and of suprasegmental features (e.g., tone, stress, and intonation). The present study examined the possibility of using EVF for enhancing the learning of durational contrasts, mainly related to geminate consonants in Japanese, through the display of waveforms, which make duration visible, as discussed in the following chapter (Experiment II). Further, EVF so far has been used mainly to train learners’ production ability, and few reports have addressed perception improvement. As seen in the previous section, a number of studies concluded that gains from perception training in L2 contrasts can transfer to productive ability. In light of the previous studies of the effects of EVF, the present study explored the potential of visual input for both perception training and production learning of Japanese geminate consonants by American learners of Japanese.

2.5 Research questions and hypotheses

The present study was motivated by the following research questions and hypotheses:

1) How do L2 learners perceive and produce geminate consonants? Is there any particular phonetic context of geminate consonants which makes perception and/or production more difficult for learners? Many previous studies of geminate consonants have been conducted on native and nonnative speakers, but few studies have focused on the effect of the types of consonants and of phonetic contexts.
I hypothesized that the learners’ perception and production would be affected by phonetic environments, and this might be a cause of difficulties in acquiring the contrasts. The present study aimed to find if there are any particularly difficult contexts for learners. While previous studies which examined learners’ perception and production used only words in isolation as stimuli, the present study also examined whether there was any difference between word-level and sentence-level performance, as either of these levels might constitute a difficult context for the accurate perception and/or production of geminate consonants.

2) Are audio-visual instruction and training using visual displays of waveforms of geminate consonants more beneficial than auditory-only information? Coupled with the results of the above research question, this study aimed to find an effective method of perceptual training. The effectiveness of visual input in addition to audio input in perceptual training has been reported by previous studies (e.g., Hardison 2003), which suggests that it may also be effective in training learners in the perception of geminate consonants. Based on the results of the previous studies, and as a possible application of the theory of episodic memory (Hintzman 1986), I hypothesized that the perceptual training with visual information would also be successful in guiding learners’ attention to critical durational contrasts; thus, the training would be more effective.

3) Does perceptual training improve production ability without any explicit production training? Previous studies have suggested a relationship between perception and production and reported production improvement through perceptual training of /r/ and /l/ contrasts (e.g., Akahane-Yamada et al., 1996).
This research hypothesized that the perceptual training with visual input would also lead to development in the ability to produce geminate consonants; thus, a close link between perception and production would be demonstrated. The participants in this study were given only perceptual training, but they were given production tests to examine whether their ability to produce geminate consonants improved at the same time.

CHAPTER 3
Experiment I

3.1 Overview of the experiments

In the present study, a total of four experiments were conducted. The subjects were all native speakers of American English who were studying Japanese at the university level. Experiments I through III concerned the difficulties that the learners encountered with regard to geminate consonants. Experiment I and Experiment III were conducted to obtain perception and production data, respectively. Considering the learners’ difficulties found in Experiment I, Experiment II was conducted as a pilot study to test electronic visual input as a method to improve their perception. Based on the findings of these three experiments, a training method was explored, and Experiment IV was conducted to test the effect of the training.

3.2 Objectives of Experiment I

Many of the previous studies of geminate consonants limited the test items to stops and did not refer to the types of geminates tested and their phonetic contexts. This study aimed to examine if there are any particular phonetic contexts for and identities of geminate consonants that make perception more difficult for learners. Through detailed examination of such conditions, the research question of Experiment I is thus to find what causes the learners’ difficulty in perceiving a particular type of geminate consonant. As discussed above, the closure duration of stop geminate consonants produced by native speakers varies depending on the identity of the consonant itself, and this may affect non-native speakers’ perception.
Furthermore, consonant types other than stops should be considered to see if there are any particular consonant types and phonetic conditions in which learners find it difficult to distinguish a singleton from a geminate consonant.

First, I hypothesized that one of the causes of difficulty in acquiring accurate perception of geminates was related to the sonority of the target segments. Previous studies reported that the following vowel had no effect on native speakers’ perception of Japanese geminates (Hirato & Watanabe, 1987). However, there may be some effect on learners’ perception resulting from the identity of the vowel (/a/, /i/, /u/, /e/, or /o/) that follows a geminate consonant (Minagawa & Kiritani, 1996). In order to see if there was any effect of the following vowel on the perception of a geminate consonant, /a/ and /u/ were selected. These two vowels have different levels of sonority according to a scale which is considered universally applicable. Sonority is a ranking on a scale that reflects the degree of openness of the vocal apparatus during production, or the relative amount of energy produced during the sound (Goldsmith, 1990). The sonority hierarchy is generally described as having the organization shown in Figure 3.1. Japanese has a five-vowel system, which consists of /a/, /i/, /u/, /e/, and /o/. Of the two vowels selected for this experiment, /a/ has the highest sonority and /u/ has the lowest sonority.

Most sonorous
    Vowels: low vowels (e.g., /a/)
            mid vowels
            high vowels (e.g., /u/)
    flaps
    laterals
    nasals
    fricatives
    affricates
    stops
Least sonorous
Figure 3.1. Sonority hierarchy (from Goldsmith, 1990; p.110)

Thus, it was also examined how the sonority of the following vowels would affect the learners’ perception of geminate consonants. Since hierarchies do not indicate an actual degree of distance, Selkirk (1984) proposed the quantification of sonority in a Sonority Index as shown in Figure 3.2.
The higher the number is, the greater its sonority.

(most sonorous) a > e, o > i, u > r > l > m, n > s > v, z, ð > f, θ > b, d, g > p, t, k (least sonorous)
Figure 3.2. Sonority index (Selkirk, 1984, p. 112)

In addition, according to this index, in bisyllabic words, the sonority distance between a geminate consonant and the following vowel is smaller when the geminate is a fricative (e.g., in sassa) than when it is a stop (e.g., in sakka). The larger distance might help to perceptually highlight the boundary between the geminate consonant and the following vowel, aiding speech perception (Kenstowicz, 1994), while the smaller distance might obscure the boundary between the consonant and the vowel. Highlighting the boundary between the geminate consonant and the following vowel would make the precise duration of the geminate easier to perceive. Thus, it could be predicted that the learners would have more trouble perceiving words containing the fricative geminate /ss/ than those containing the stop geminate /kk/.

Another hypothesis is that English, the learners’ L1, may play a role to some extent in determining their ability to perceive a geminate consonant in Japanese. English has a constraint in syllabification called the Maximum Onset Principle; it says that intervocalic consonants should be syllabified into the onset of the second syllable rather than the coda of the first syllable. Thus, consonants are preferred in onset position, and coda consonants are dispreferred except in word-final position (Goldsmith, 1990). According to this principle, the preferred syllabification of VCCV is V.CCV rather than VC.CV or VCC.V. Since the L1 of all the participants of this study is English, they might determine a syllable boundary by following this principle.
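The sonority-distance argument above can be made concrete with a small calculation. The following is a minimal sketch, not part of the original study: the numeric index values are an assumption, taken from the values commonly cited for Selkirk's (1984) index, and the names `SONORITY_INDEX` and `sonority_distance` are hypothetical.

```python
# Sonority index values as commonly cited for Selkirk (1984).
# ASSUMPTION: these numbers are supplied for illustration only and are
# restricted to the segments relevant to the stimuli discussed here.
SONORITY_INDEX = {
    "a": 10.0,
    "e": 9.0, "o": 9.0,
    "i": 8.0, "u": 8.0,
    "s": 4.0,
    "t": 0.5, "k": 0.5,
}


def sonority_distance(consonant: str, vowel: str) -> float:
    """Return the sonority gap between a consonant and the vowel after it."""
    return SONORITY_INDEX[vowel] - SONORITY_INDEX[consonant]


# Compare the three target consonants before the two target vowels.
for consonant in ("s", "t", "k"):
    for vowel in ("a", "u"):
        gap = sonority_distance(consonant, vowel)
        print(f"/{consonant}/ + /{vowel}/: distance {gap}")
```

Under these assumed values, the fricative /s/ is closer in sonority to either vowel than the stops /t/ and /k/ are, and the distance is smallest for /s/ followed by /u/, which is the combination the hypothesis singles out as potentially hardest to segment.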
It could be predicted that if a learner failed to perceive the mora weight (two morae for a vowel plus a geminate consonant) correctly, s/he might have a bias toward assigning the consonant to the onset, so that s/he might perceive a geminate consonant as a singleton, i.e., as CV.CV. If the entire geminate is syllabified as part of the onset, then it cannot have moraic weight (Hayes, 1989); that is, it cannot be a geminate.

For ease of exposition, following Kenstowicz’ (1994) and Hayes’ (1989) description of moraic syllable structure, geminate and nongeminate consonants are represented as follows (syllable = σ; mora = μ). For example, in a CVC monosyllabic word in English, the vowel in the nucleus is assigned one mora, and the consonants in onset and coda positions are nonmoraic, as shown in Figure 3.3.

Figure 3.3. CVC word (e.g., “pet” in English)

On the other hand, the first part of a geminate consonant is moraic. For example, the Japanese word /sakka/ containing a geminate consonant is syllabified as a trimoraic bisyllabic word, as shown in Figure 3.4.

Figure 3.4. CVC.CV word (e.g., “sakka” in Japanese)

It has been suggested that the syllable plays an important role in the processing of speech sound segments (e.g., Derwing, 1992; Ishikawa, 2002; Mehler et al., 1981; Schiller, Meyer & Levelt, 1997). In Japanese, morae as timing units have to be processed in addition to syllables, which may cause difficulties for L2 learners.

As another possible source of difficulty for learners, the question of whether there is any difference in identification accuracy for geminates in words in isolation as opposed to those embedded in sentences was examined. Traditionally, the dominant method for examining learners’ development of the ability to perceive new, difficult nonnative contrasts has been to use a two-alternative identification or discrimination task with minimal pairs in isolation.
For example, many studies use a two-alternative identification task; the learners are presented with stimuli consisting of minimal pairs for /l/ and /r/ in isolation (Bradlow et al., 1997; Bradlow et al., 1999; Hardison, 2003; Lively, Logan, & Pisoni, 1993; Pisoni, Lively, & Logan, 1994). With regard to the present study, Minagawa (1996), Hayes (2002), and Min (1993) all used a two-alternative identification task, having the learners identify minimal pairs for single/geminate consonants in isolation. A question raised here is whether a two-alternative forced choice task using minimal pairs in isolation is sufficient. Would the results reflect the overall perception ability of learners in a variety of phonological contexts? In actual conversations, learners have to identify phones or sounds and syllables in a flow of sounds, a longer and more complicated context than that of words in isolation. Is there any difference in learners’ perception at the sentence level?

3.3 Method

3.3.1 Participants

All participants were undergraduate students at a large university in the US, and all were native speakers of American English whose ages ranged from 19 through 22. There were no heritage learners. They were divided into three groups on the basis of the level of the Japanese courses in which they were enrolled at the time of the data collection. The 101-level group was made up of students in first-year Japanese language courses at the university (n=28; 7 females and 21 males). The students enrolled in the 101 level had almost no previous knowledge of Japanese, and it had been about three months since they had begun to study the language. The 201 group, students in the second-year Japanese language courses (n=42; 17 females and 25 males), was composed of those students who had passed the first-year class in Japanese. It was the third semester for these students.
The 401 group, students in fourth-year Japanese (n=15; 5 females and 10 males), was in its seventh semester of studying Japanese. All the 401-level learners had studied in Japan for one or two semesters. Generally, it can be said that the 401-level students had had more interactions with native Japanese speakers than had the students in the lower levels, although this did not guarantee that they had become proportionally more proficient, since the learning opportunities, motivations, and L2 uses of Japanese varied among the students. In the regular introductory Japanese classes, the first- and second-year courses, the students met for 50 minutes, 5 times per week. There were substantial oral drills and communicative activities in class, and the instructor sometimes corrected the students’ inaccurate pronunciations. However, there was no special training for discriminating particular phonemic contrasts in class.

3.3.2 Materials

Stimuli consisted of 30 bisyllabic Japanese real words and non-words, which included 12 singletons, with the segmental form /(C)V.CV/, and 12 geminate counterparts, with the segmental form /(C)V.C.CV/, where the first CV and the final CV were identical, and 6 fillers, which were bisyllabic words consisting of three morae but including no geminate consonant. As defined in 2.1.1, a mora is a unit of timing, and each mora has approximately the same duration in production. Long vowels (e.g., /kiite/ ‘listening’) and geminate consonants (e.g., /kitte/ ‘a stamp’) take twice as long to produce as a short vowel or singleton counterpart (e.g., /kite/ ‘coming’). Two tests were conducted; in Test 1, the words were heard in isolation, but in Test 2, the following carrier sentences were used.
    watashi  wa            ____  to    iimashita
    I        topic marker  ____  that  said
    ‘I said ____.’

    kore  wa            ____  desu
    this  topic marker  ____  is
    ‘This is ____.’

Note that, as the word-for-word gloss shows, in Japanese word order the stimuli come in the middle of the sentence, instead of the sentence-final position seen in the English translations. The same set of 30 words was used for both tests, but the order of presentation was randomized.

As for the target consonants, the stops /t/ and /k/ and the fricative /s/ were the same sounds as were used in Hayes’ (2002) study (cf. Ch. 2.1.4). In Hayes’ study, learners were presented with a set of two words in each trial, in a geminate-geminate, geminate-singleton, or singleton-singleton combination, and the learners were given a same-different discrimination task, i.e., they were asked to determine whether the two words were the same or different. Such a discrimination task might not directly reflect learners’ linguistic perception ability, since it is rare to encounter a comparison of two sounds in a natural setting. In the present experiment, the learners were presented with only one token in each trial, and they were given an identification task in which they identified whether the token was a geminate or a singleton.

Previous studies revealed that the duration of the vowel preceding a geminate consonant plays a role as an acoustic cue for native speakers, but not for non-native speakers (Min, 1987), and that variable was therefore excluded from consideration in the present study. To test for the effect of the difference between /CV/ and /V/ as preceding segments, /sa/ and /a/ were chosen as preceding contexts. The vowels following a geminate consonant were a high vowel /u/ and a low vowel /a/. Since Experiment I did not consider the effect of pitch accent, the accent patterns of all the stimuli were kept consistent; they were H(igh)-L(ow) for singletons (i.e., two-mora stimuli) and H-L-L for geminate consonants and long vowels (i.e., three-mora stimuli).
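The two-mora versus three-mora distinction among the stimuli, and the accent pattern tied to it, can be illustrated with a short sketch. This is a simplified illustration under stated assumptions, not a procedure used in the study: it assumes romanized input in which a doubled vowel letter marks a long vowel and a doubled consonant letter marks a geminate, and it ignores the moraic nasal because none of the stimuli contains one. The function names are hypothetical.

```python
VOWELS = set("aiueo")


def count_morae(romaji: str) -> int:
    """Count morae in a romanized stimulus.

    ASSUMPTION (simplification): every vowel letter counts as one mora, so a
    doubled vowel (long vowel) counts as two, and the first half of a doubled
    consonant letter (geminate) adds one mora.
    """
    morae = 0
    for i, ch in enumerate(romaji):
        if ch in VOWELS:
            morae += 1
        elif i + 1 < len(romaji) and romaji[i + 1] == ch:
            morae += 1  # first half of a geminate consonant is moraic
    return morae


def accent_pattern(romaji: str) -> str:
    """Return the H-L or H-L-L pattern used for the stimuli (initial High)."""
    return "-".join(["H"] + ["L"] * (count_morae(romaji) - 1))


for word in ("kite", "kitte", "kiite", "saka", "sakka"):
    print(word, count_morae(word), accent_pattern(word))
```

The sketch treats the long-vowel and geminate forms alike as three-mora items, which mirrors the way the fillers and geminates share the H-L-L pattern while singletons take H-L.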
The stimuli consisted of 14 non-words and 16 real words. In Table 3.1 below, translations are given for real words; non-words are indicated with ‘*’.

Table 3.1 Examples of test items by phonological structure

Consonant        Structure of items   Example, V2 = /u/    Example, V2 = /a/
Singleton /t/    (C)V1.tV2            satu ‘volume’        sata ‘trouble’
Geminate /tt/    (C)V1t.tV2           sattu *              satta ‘went’
Singleton /s/    (C)V1.sV2            sasu ‘stab’          sasa *
Geminate /ss/    (C)V1s.sV2           sassu ‘guess’        sassa ‘quickly’
Singleton /k/    (C)V1.kV2            saku ‘tear’          saka ‘refreshments’
Geminate /kk/    (C)V1k.kV2           sakku ‘sack’         sakka ‘composer’

3.3.3 Procedure

The above stimuli were recorded by a female native speaker of Japanese (Tokyo dialect) using a SONY MD MZ-RH10-S with a SONY ECM-CS10 microphone. The stimuli were presented on the same mini-disk player in a classroom setting. Each recorded token was played only once. A three-alternative forced choice identification procedure was used for the word-level and the sentence-level tests. The learners were asked to choose one of three choices on their response sheets, which consisted of minimal triplets including a singleton /(C)V.CV/, a geminate consonant /(C)V.C.CV/, and a long vowel /(C)V.V.CV/, where the first CV and the final CV were identical in each item in the triplet (in the case of vowel-initial words, the first vowel was the same in each item).

Below are examples of the test items:

The participants heard: /sassu/
Choices given (instructed to circle one): a. sasu   b. saasu   c. sassu

The participants heard: /aka/
Choices given (instructed to circle one): a. aka   b. aaka   c. akka

The answer sheets were collected, and correct answers were tabulated for each participant; one point was given for a correct answer, and no point was given for an incorrect answer. Two types of test were given to each participant, words in isolation and words in frame sentences, as previously described. There was a total of 24 points in each test.
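The response triplets and the one-point-per-item scoring described above can be sketched as follows. This is an illustrative sketch only: the function names and the item identifiers are hypothetical, and the exclusion of filler items from the score is an assumption inferred from the 24-point total for 30 items.

```python
VOWELS = "aiueo"


def make_triplet(singleton: str) -> list[str]:
    """Build the singleton / long-vowel / geminate choices for one item."""
    # Index of the first vowel, e.g. 1 in "sasu" and 0 in "aka".
    i = next(k for k, ch in enumerate(singleton) if ch in VOWELS)
    head, tail = singleton[: i + 1], singleton[i + 1:]
    long_vowel = head + singleton[i] + tail  # double the first vowel
    geminate = head + tail[0] + tail         # double the medial consonant
    return [singleton, long_vowel, geminate]


def score_test(responses: dict, answer_key: dict, fillers=()) -> int:
    """Give one point per correct answer and none for an incorrect one.

    ASSUMPTION: filler items are not scored.
    """
    return sum(
        1
        for item, correct in answer_key.items()
        if item not in fillers and responses.get(item) == correct
    )


print(make_triplet("sasu"))  # choices a, b, c for the heard word /sassu/
print(make_triplet("aka"))   # choices a, b, c for the heard word /aka/

# Hypothetical two-item answer sheet keyed by the word that was heard.
answer_key = {"sassu": "c", "aka": "a"}
responses = {"sassu": "c", "aka": "b"}
print(score_test(responses, answer_key))
```

Because each item offers three choices, a participant answering at random would be expected to score about one third of the points, which is a useful baseline when reading the accuracy figures in the results.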
3.4 Results

The first set of data was scored by totaling the number of correct responses, as in Figure 3.5. A mixed-design 2 x 3 ANOVA was used, with test (word level, sentence level) as the within-group variable and level (101, 201, 401) as the between-group variable. Test and level had significant main effects, F (1, 102) = 181.15, p = .000 for test and F (2, 102) = 4.085, p = .020 for level. The words in carrier sentences were identified less accurately (56% correct) than the words in isolation (75% correct). The interaction of Test x Level was not significant (F (2, 102) = 1.035, p = .359), which indicated that the difficulty of the sentence-level test relative to the word-level test was comparable across the levels. Comparison among the levels in a post-hoc test (Tukey’s HSD) revealed that there was no significant difference between the 101 level (61%) and the 201 level (65%). However, the 401 level (72%) performed significantly better than both the 101 level (p = .015) and the 201 level (p = .045). It is assumed that the 401-level students’ superior performance could also be due to much more exposure to the Japanese language through actually studying in Japan. Over the course of Japanese language study and exposure, native English speakers learning Japanese develop increased sensitivity to consonant duration. This result supports the findings of Hayes (2002).

Mean % correct identification
                  101    201    401
Word level        71%    75%    80%
Sentence level    50%    55%    64%
Figure 3.5. Mean percent correct identification by level of proficiency

Next, word-level perception and sentence-level perception of geminate consonants were examined separately in detail. A detailed analysis was made of the data from the 201-level students, which was the largest group of the three. Figure 3.6 shows the data of word-level perception of geminate consonants followed by /a/ (N=6, M=4.57, SD=1.70) and /u/ (N=6, M=3.95, SD=1.63).

Mean % correct identification
          /ss/   /tt/   /kk/
/+a/      73%    77%    83%
/+u/      51%    64%    73%
Figure 3.6. Mean percent correct identification at word-level by item condition (Japanese 201 students)

A two-factor repeated measures ANOVA was conducted. Variables were consonant (/s/, /t/, /k/) and vowel (/a/, /u/). Both had significant main effects, Fc (2, 198) = 38.500, p = .000; Fv (1, 99) = 46.718, p = .000. The interaction of Consonant x Vowel was also significant (F (2, 198) = 7.579, p = .001). /s/ consonants were the most difficult to perceive as geminate (73%), while /k/ (83%) was the easiest, and /t/ was in the middle (77%). As for the vowel following a geminate consonant, the geminates preceding /u/ in the final position (63%) were more difficult to perceive than those preceding /a/ (78%). This result may indicate that, as predicted, the sonority of a vowel following a geminate consonant played an important role in the learners’ perception. Thus, of all the geminate consonant types, /ss + u/ was the most difficult for the learners to perceive (51%), while /kk + a/ was the easiest (83%), as shown. In addition, the difference between the effects of /a/ and /u/ was largest following /ss/, while the perception of /tt/ and /kk/ showed almost parallel effects. This result is not consistent with Hayes’ (2002) result, which showed that there was no difference between /kk/ and /ss/, and that /tt/ was the easiest to perceive. This discrepancy may be the result of a difference between a discrimination task, as in Hayes’ study, and the identification task used in the present study.

In addition, there was no significant difference between the scores with /sa/ and /a/ as the preceding segments (t (299) = -.218, p = .828). That is, the difficulty in perception was not affected by whether the preceding segment was consonant + vowel or vowel only.

Figure 3.7, which is also a detailed analysis of the 201 students’ data, shows sentence-level perception of geminate consonants followed by /a/ (N=6, M=3.16, SD=1.37) and /u/ (N=6, M=2.36, SD=1.47).
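The vowel percentages and the Consonant x Vowel pattern reported for the word-level data can be checked against the cell values of Figure 3.6 with a few lines of arithmetic. This is a minimal sketch: the cell percentages are copied from Figure 3.6, while the function names and the use of an unweighted mean across the three consonants are assumptions made for illustration.

```python
# Word-level percent correct for the 201 group, from Figure 3.6.
WORD_LEVEL = {
    "ss": {"a": 73, "u": 51},
    "tt": {"a": 77, "u": 64},
    "kk": {"a": 83, "u": 73},
}


def vowel_mean(cells: dict, vowel: str) -> float:
    """Unweighted mean percent correct across consonants for one vowel."""
    values = [row[vowel] for row in cells.values()]
    return sum(values) / len(values)


def vowel_gap(cells: dict) -> dict:
    """Percentage-point advantage of /a/ over /u/ for each consonant."""
    return {cons: row["a"] - row["u"] for cons, row in cells.items()}


print(round(vowel_mean(WORD_LEVEL, "a")))  # mean for geminates before /a/
print(round(vowel_mean(WORD_LEVEL, "u")))  # mean for geminates before /u/
print(vowel_gap(WORD_LEVEL))
```

Rounded, the two means come out at 78 and 63, matching the percentages given in the text, and the /a/ minus /u/ gap is 22 points for /ss/ against 13 and 10 points for /tt/ and /kk/, which is the asymmetry behind the significant interaction.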
Mean % correct identification
          /ss/   /tt/   /kk/
/+a/      59%    62%    75%
/+u/      39%    50%    54%
Figure 3.7. Mean percent correct identification at sentence-level by item condition (Japanese 201 students)

Again, a two-factor repeated measures ANOVA was conducted, and variables were consonant (/s/, /t/, /k/) and vowel (/a/, /u/). Both had significant main effects, Fc (2, 198) = 32.575, p = .000; Fv (1, 99) = 43.190, p = .000. As in the word-level perception, the geminates preceding /u/ in the final position (48%) were more difficult to perceive than those preceding /a/ (65%). As for the consonant, /s/ consonants were the most difficult to perceive (49%), while /k/ (65%) was the easiest, and /t/ was in the middle (56%), as in the word-level data. However, the interaction of Consonant x Vowel was not significant (F (2, 198) = 1.740, p = .178). Similar difficulty of /u/ compared to /a/ was observed across the three consonants at the sentence level; /s/+/u/ was the most difficult combination. At the sentence level, too, there was no significant difference between the scores with /sa/ and /a/ as the preceding segments (t (299) = -1.738, p = .083). Again, the difficulty in perception was not affected by the preceding segments at the sentence level.

The data in Figures 3.8 and 3.9 show how the learners perceived a geminate consonant when they did not perceive it correctly, at word-level and sentence-level, respectively. In order to enable a closer examination of the most difficult item, i.e., a geminate + /u/, the data are broken down by error pattern. Some of the learners who could not perceive a geminate consonant correctly tended to think that the word contained a long vowel. This tendency was especially strong in the /ss + u/ geminate sequences.

[Figure 3.8: Answer Ratio by response type (Long vowel, Singleton) for /ss/, /tt/, /kk/]
Figure 3.8. 201 students’ perception of /CC+u/ at word-level

[Figure 3.9: Answer Ratio by response type (Long vowel, Singleton) for /ss/, /tt/, /kk/]

7. taatsu     8. akku      9. sasu
10. assa      11. sakka    12. kakisu
13. akka      14. aku      15. sassu
16. asu       17. kazeki   18. sasa
19. takata    20. sassa    21. atta
22. attsu     23. saku     24. satsu
25. atta      26. atsu     27. sattsu
28. asa       29. kakute   30. sata

Part II
Please read the following sentences only once and record on the provided cassette tape.

1. watashi wa sata to iimashita
2. kore wa kakute desu
3. watashi wa asa to iimashita
4. kore wa atta desu
5. watashi wa atsu to iimashita
6. kore wa kakisu desu
7. watashi wa satsu to iimashita
8. kore wa saku desu
9. watashi wa attsu to iimashita
10. kore wa sasa desu
11. watashi wa sassa to iimashita
12. kore wa takata desu
13. watashi wa atta to iimashita
14. kore wa aku desu