RE-EXAMINING FUNCTIONAL LOAD IN LIGHT OF RATERS’ PERCEPTION OF ERROR GRAVITY IN SECOND LANGUAGE SPEECH

By

Adam Pfau

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Teaching English to Speakers of Other Languages – Master of Arts

2020

ABSTRACT

The current study examines the role that functional load (FL) values play in listeners’ perception of the intelligibility and comprehensibility of unclear phonemes in second language speech. A listener’s familiarity with accented speech is also considered. Native English listeners from two separate populations (a student population exposed to a variety of second language speech and a community member population with little exposure to accented speech) were presented with recorded speech samples from L1 Japanese speakers of English. The speech samples contained unclear, or non-native-like, examples of two separate phoneme pairs: the /r-l/ consonant contrast and the /s-θ/ contrast. The first carries a very high functional load value, while the second carries a very low value. Listeners completed a comprehensibility task and an intelligibility task that contained examples of both target contrasts. The results indicated that the student population found the speech samples more intelligible and easier to comprehend. The /r-l/ contrast, with a higher FL value, was more difficult for them to transcribe and comprehend than the /s-θ/ contrast, but these differences were less pronounced for the community member raters. This suggests that, while teachers may be wise to use FL values as a basis for a pronunciation syllabus or instruction, they should be aware that FL values do not paint the whole picture of how listeners respond to errors made in second language speech.
TABLE OF CONTENTS

LIST OF TABLES
1. Introduction
2. Literature Review
    2.1 Intelligibility
    2.2 Comprehensibility
    2.3 The Current Study
3. Methodology
    3.1 Participants
        3.1.1 L1 Japanese L2 English Speakers
        3.1.2 Native English-Speaking Listeners
            3.1.2.1 Listener Population #1: Student Listeners
            3.1.2.2 Listener Population #2: Community Member Listeners
    3.2 Materials
        3.2.1 Recorded Speech Samples
        3.2.2 FL Calculations
    3.3 Procedure
        3.3.1 Task #1: Intelligibility
        3.3.2 Task #2: Comprehensibility
4. Results
    4.1 Data Analysis
        4.1.1 Comprehensibility Task
        4.1.3 Intelligibility Task
        4.1.3 Correlation Between the Two Tasks
5. Discussion
    5.1 Research Question #1
    5.2 Research Question #2
    5.3 Research Question #3
    5.4 Pedagogical Implications
APPENDIX
BIBLIOGRAPHY

LIST OF TABLES

Table 1: Intelligibility Task. This table shows data collected from raters on their ability to transcribe target sounds upon hearing them.
Table 2: Comprehensibility Task, Section One. This table displays data collected from raters on a comprehensibility task using a nine-point scale.
Table 3: Comprehensibility Task, Section Two. This table displays data collected from raters on their perception of difficulty in comprehending target sounds.
Table 4: Comprehensibility and Intelligibility Correlation. This table displays correlation data between raters’ intelligibility and comprehensibility results.
Table 5: Functional Load Values For Consonant Contrasts. This table shows the functional load values for all possible consonant contrasts, as calculated using Phonological CorpusTools.

1. Introduction

Though Adam Brown wryly remarks that “no two linguists will agree on what functional load should measure, or how” (Brown, 1988, p. 594), the concept of functional load (hereafter FL) customarily refers to the degree of contrast that exists between units of language, such as phonemes (King, 1967). A phoneme is an abstract unit representing a sound that can serve a contrastive function in a language; for example, in the minimal pair rock – lock, /r/ and /l/ are contrastive phonemes. In this regard, FL refers to the degree of importance that various phonemes carry in creating meaningful distinctions within a language, traditionally calculated by counting minimal pairs. If FL has implications for speech processing, a high FL should make it difficult for a listener to identify a phoneme that is ambiguous because of noise, omission, or non-native-like production.
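The traditional minimal-pair count mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy lexicon, the simplified transcriptions, and the function name `count_minimal_pairs` are hypothetical, and the sketch is not the procedure used to obtain the FL values reported in this thesis.

```python
# Toy lexicon: each word is a tuple of phoneme symbols (simplified, hypothetical).
TOY_LEXICON = [
    ("r", "ɑ", "k"),       # rock
    ("l", "ɑ", "k"),       # lock
    ("r", "aɪ", "t"),      # right
    ("l", "aɪ", "t"),      # light
    ("r", "i", "d"),       # read
    ("l", "i", "d"),       # lead
    ("s", "ɪ", "ŋ", "k"),  # sink
    ("θ", "ɪ", "ŋ", "k"),  # think
    ("s", "ɑ", "k"),       # sock (no /θ/ counterpart in this toy lexicon)
    ("k", "æ", "t"),       # cat
]


def count_minimal_pairs(lexicon, phone_a, phone_b):
    """Count word pairs that differ only by phone_a vs. phone_b in one position."""
    words = set(lexicon)
    count = 0
    for word in words:
        for i, segment in enumerate(word):
            if segment == phone_a:
                # Swap this one segment and check whether the result is also a word.
                candidate = word[:i] + (phone_b,) + word[i + 1:]
                if candidate in words:
                    count += 1
    return count


rl_pairs = count_minimal_pairs(TOY_LEXICON, "r", "l")
sth_pairs = count_minimal_pairs(TOY_LEXICON, "s", "θ")
print(f"/r-l/ minimal pairs: {rl_pairs}")
print(f"/s-θ/ minimal pairs: {sth_pairs}")
```

In this toy lexicon the /r-l/ contrast separates more word pairs than the /s-θ/ contrast, which is the intuition behind assigning it a higher FL; a real calculation would require a large corpus and a principled normalization.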
Though FL needs to be considered alongside other context, such as a listener’s familiarity with different kinds of speech, some research on FL values and the assessment of comprehensibility underscores FL’s role as an influencing factor. Catford (1987) and Brown (1991) demonstrated that phoneme substitutions have more of an impact on a listener’s ability to comprehend speech if the substitutions concern phonemes with higher FL values. Munro and Derwing (1995) similarly examined the role of FL in listeners’ comprehensibility judgements. Their study analyzed how incorrect substitutions of phonemes with high or low FL values affected listeners’ perception of how easily they were able to understand the speech. They found that incorrect substitutions involving phonemes of higher FL values resulted in lower comprehensibility scores as more and more of the substitutions were integrated into speech samples.

The current study takes that concept and applies it to native English speakers’ perceptions of ambiguous English phonemes in second language speech. If FL values truly do determine the difficulty a listener has in guessing an ambiguous phoneme, that would have clear implications for a second language speaker who may struggle to learn certain phonemes not present in their L1. That said, there is a paucity of empirical research on FL demonstrating that the concept truly matters in a practical way, that is, in a way which impacts the everyday communication that second language speakers engage in. Why these implications are important will be discussed below, but first it is worth commenting on this scarcity of research.

The reasons for this scarcity are two-fold. Firstly, Surendran and Niyogi’s relatively recent 2003 study established how “researchers who want to measure FL often cannot” (p. 7) because of the lack of a concrete definition of the term as well as the methodological complexity involved in measuring it. Mi Oh (2015) touched on this issue, explaining that the shortage of large corpora and the difficulty of processing them contributed to the scarcity of FL research. She also succinctly explains the second reason for the lack of empirical research when she writes that the role of FL in research “has often been considered in an impressionistic way” (p. 154) that fails to provide either a clear practical definition of what FL means or any analysis of it that carries real-world implications. Put simply, the operationalization of FL involves calculations that were previously too complicated to carry out easily (e.g. Hockett, 1955), and the theory itself was rarely put under the lens of empirical study. No one would dispute, for instance, the importance of analyzing phonemes from the perspective of phonological processes or of grouping phonemes into natural classes based on those processes; how such analysis relates to pronunciation, and its pedagogical implications, are very clear. Analyzing how phonemes operate within the framework of FL, however, does not supply implications that are as clear and obvious. What implications are there for our practical understanding of English, for instance, in the fact that the phonemic contrast between /p/ and /b/ carries a relatively high FL whereas the contrast between /s/ and /z/ carries a FL that is relatively low?

The current study is designed to address this question by situating the concept of FL in an empirical framework in which it can be observed. Multiple phonetic contrasts with widely differing FL values will be analyzed, which allows an evaluation of FL as an objective assessment of the importance of different contrasts by examining how listeners perceive those contrasts.
To assess a listener’s perception, the current study will examine the relationship between a phonetic contrast’s FL value and the intelligibility and comprehensibility of second language speech. The role that these two concepts play in relation to FL will be discussed below, but first the two terms must be defined clearly. The current study assumes definitions that have been illustrated and used in research by Munro and Derwing (1995, 1997, 2001) and Smith (1992), among others. These studies have defined intelligibility as word or utterance comprehension, and comprehensibility as the effort involved in that comprehension. For the purpose of the current study, intelligibility is defined as the extent to which listeners can correctly decipher the sounds they hear. The current study will assess the intelligibility of second language speech through a transcription task, as such a task clearly demonstrates whether or not a listener can correctly decipher the sounds they have heard. Comprehensibility is defined as the listeners’ perception of how much effort went into deciphering those sounds. Unlike intelligibility, which can be operationalized on a binary scale in which a listener is either correct or incorrect in their deciphering, comprehensibility concerns effort and so allows listeners to report different degrees of it. The current study will thus assess comprehensibility using a 9-point scale.

A study that examines intelligibility and comprehensibility through two separate tasks will allow an analysis of how the two concepts weave into each other, and of the role that both play when analyzing FL as an objective assessment of the importance of contrasts.
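The difference between the two operationalizations can be sketched in a few lines of code. This is only an illustration under stated assumptions: the response format, the example data, and the function names `intelligibility_score` and `comprehensibility_score` are hypothetical and are not taken from the study’s materials, and the sketch makes no assumption about which end of the 9-point scale represents greater effort.

```python
from statistics import mean


def intelligibility_score(responses):
    """Proportion of items whose transcribed sound matches the intended target.

    Each response is scored on a binary scale: correct (1) or incorrect (0).
    """
    scored = [1 if r["transcribed"] == r["target"] else 0 for r in responses]
    return sum(scored) / len(scored)


def comprehensibility_score(ratings, scale_min=1, scale_max=9):
    """Mean rating on a bounded scale (9-point by default)."""
    for rating in ratings:
        if not scale_min <= rating <= scale_max:
            raise ValueError(f"rating {rating} is outside {scale_min}-{scale_max}")
    return mean(ratings)


# Hypothetical responses from one listener (not data from the study).
example_responses = [
    {"target": "r", "transcribed": "r"},
    {"target": "l", "transcribed": "r"},
    {"target": "s", "transcribed": "s"},
    {"target": "θ", "transcribed": "θ"},
]
example_ratings = [2, 5, 7, 6]

print(intelligibility_score(example_responses))
print(comprehensibility_score(example_ratings))
```

The binary score can only register whether the target sound was recovered, whereas the scale rating can separate two items that were both transcribed correctly but required different amounts of effort.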
As mentioned above, the notion of FL should theoretically have clear implications for the comprehensibility and intelligibility of second language speech to a listener in situations where a learner of English faces difficulty in learning certain phonemes not present in their L1. If a native English listener were presented with any error or ambiguity in a phoneme that carried a high FL value, whether the phoneme was not heard or was pronounced wrongly, the implications of FL for perception should indicate a greater chance of misidentification than for a sound with a low FL. This is how one would expect a listener to react if the FL values attached to phonemes dictated the error gravity that a listener attached to those phonemes, with no additional factors affecting how the listener perceived the second language speech, and if FL actually can be used as an objective assessment of the importance of different contrasts for pedagogical purposes.

The current study will adopt the above-mentioned value-based methodological approach in order to assess how native English listeners react to second language speech when pronunciation is unclear. In this context, the present study focuses on two phonetic contrasts with widely different FL values, namely the American English /r-l/ contrast and the /s-θ/ contrast, as produced by L1 Japanese speakers of English, whose L1 lacks /r/, /l/, and /θ/. Looking back at the role of FL in intelligibility and comprehensibility, one would assume that a native English listener faced with an ambiguous /r/ or /l/ would experience more difficulty deciphering (or transcribing, in our methodology) the correct target sound because the high FL values for those phonemes reflect a large number of confusable words. One would also expect that ambiguous examples of either the /s/ or /θ/ phonemes would not impede intelligibility as much, and that a listener would have less difficulty in deciphering the correct sound if no other factors (such as contextual clues supplied by a sentence) had an influence.

FL should also have implications for listeners’ perception of comprehensibility. For the /r/ or /l/ phonemes, not only should the high FL value for that contrast strongly suggest a problem in terms of the intelligibility of speech, but it should also require the listener to put more effort into deciphering the target sound in that contrast. That is, deciphering an ambiguous /r/ or /l/ should be less easy than deciphering an ambiguous /s/ or /θ/.

By focusing on the intersection of these three elements (FL, intelligibility, and comprehensibility), the current study, in sum, will assess the role of FL in how native English listeners perceive those four sounds in words produced by L2 speakers of English (L1 Japanese). Though this study does not explicitly focus on other possible factors, addressed below, that may affect how a listener perceives different phonemes when ambiguity is present (e.g. familiarity with accented speech, attitudes towards accented speech), it will allow an examination of whether or not FL, as an objective assessment of the importance of contrasts, truly represents the complete picture of how a listener reacts to those sounds. The following literature review will consider these three concepts through the lens of previous research in order to examine factors important to how listeners perceive second language speech and how those factors, in addition to FL, may also influence the comprehensibility and intelligibility of second language speech.

2. Literature Review

The current study focuses on examining the real-world implications that FL values have for how listeners respond to unclear pronunciations of L2 sounds of varying FL values; that is, how much influence FL has on real-time speech processing and, consequently, how much influence it should have on pedagogical decisions. In doing so, the scarce research connecting FL to pedagogic or empirical evidence must also be examined, as well as research that analyzes other factors important to how listeners process speech, specifically the familiarity a listener has with accented speech, how meaningful that exposure to second language speech may be, and their attitudes towards accented speech. The aim is to demonstrate FL’s place as one of many factors that influence how listeners respond to errors, or unclear phonemes, in second language speech, and also to examine how factors related to familiarity need to be considered as influences on how listeners perceive sounds from accented speech. As the current study examines listener responses to speech through their perception of the comprehensibility and intelligibility of specific sounds, the above-mentioned factors need to be examined in the framework of those two terms.

2.1 Intelligibility

Intelligibility and comprehensibility both concern how well a listener can understand the speech they are hearing, although the two terms refer to different things. Intelligibility refers to whether or not a listener can understand what a speaker is saying. Unlike comprehensibility, which refers more to how difficult it is for a listener to understand speech, intelligibility can be measured on a binary scale based on whether a listener understood the speech or not. When looking at the intelligibility of speech, one must consider both the speaker and the listener as important influencing factors.
Research has looked at the variety of factors that influence a speaker’s ability to produce intelligible speech (e.g. the speaker’s L1, the age at which they began to acquire their L2) and has shown that these factors can influence how intelligible their speech is to listeners (Flege, 2003; Flege, Munro, & MacKay, 1995). However, the intelligibility of speech depends very much on a listener’s perception of that speech, and listeners also bring many factors that influence how intelligible they perceive different sounds to be.

One of these factors concerns a listener’s familiarity with accented speech. Gass and Varonis (1984) examined how various familiarity factors (a listener’s familiarity with a speaker, familiarity with an accent) influenced listeners’ perception of speech intelligibility. Their findings illustrated that a listener’s familiarity with a specific accent improved their ability to understand speech from speakers of that accent even if the listener was unfamiliar with the specific speakers themselves. This finding is in line with other studies (Wingstedt & Schulman, 1984) which produced similar results while examining different accents from speakers of different L1s. Bradlow and Bent (2008) and Sidaras et al. (2009) examined listeners’ familiarity with non-standard accented speech, finding that listeners’ transcriptions of speech samples can improve with repeated exposure to the kind of accented speech found within the samples, even when that task is extended to novel accents that are not realistically found in any language. This demonstrates how a listener’s perception of the intelligibility of second language speech can be greatly influenced by allowing them to familiarize themselves with the accented speech. Even brief prior exposure to a novel accent can improve how intelligible a listener will find speech samples containing that accent.
Another study that demonstrates this relationship, Kennedy and Trofimovich (2008), illustrates how embedding target sounds in connected speech, thus giving them some semantic context, can improve a listener’s intelligibility ratings for accented examples of those sounds. In their study, participants found target sounds more intelligible when those sounds were given brief semantic context or put into sentence frames. This shows how even the slightest context or added room for familiarity can influence a listener’s perception of how intelligible speech is, and how allowing a listener to familiarize themselves with an accent even briefly can influence how intelligible they find speech samples in that accent.

If these studies illustrate how listeners, as well as speakers, bring factors with them that can influence their perception of speech intelligibility, one must also consider that a listener’s attitude towards accented speech will influence how intelligible they perceive samples of accented input to be. Sheppard, Elliott, and Berk (2017) performed a study in which different listener populations listened to samples of speech from L2 English learners. In their study, one population both had more familiarity with accented speech and indicated much more positive attitudes towards accented speech than the other population did. The population with the more positive view of accented speech was not only able to understand the speech samples more easily, but was also far more accurate in transcribing the speech (that is, the speech was more intelligible to them). Other, similar research has demonstrated a positive correlation between a listener’s positive attitudes towards accented speech and their ability to transcribe it accurately (Kang & Rubin, 2009; Lindemann, 2002).
These studies demonstrate that it is not only a listener’s familiarity with a specific accent that can affect their ability to understand speech samples, but also their familiarity with second language speech overall and their attitudes regarding accented speech.

The factors detailed above concerning familiarity are important to the current study because, when considering whether the specific FL values of different phonemes play a central role in how listeners perceive the intelligibility of unclear examples of those phonemes, one must also consider the familiarity with accented speech and the attitudes towards accented speech that different listeners bring with them. This previous research makes it clear that, when comparing the responses of listeners who are assessing the intelligibility of second language speech, one cannot compare the ratings of a listener with little meaningful exposure to accented speech with those of someone who has had considerable exposure. The same is true when comparing two listeners with widely different attitudes towards second language speech. The current study builds on this research by examining how two different listener populations perceive the intelligibility of target sounds in second language speech: listeners with a high degree of meaningful exposure to a variety of accents, and listeners with far less meaningful exposure.

2.2 Comprehensibility

Just as research has analyzed the factors that listeners bring with them that can influence their perception of speech intelligibility, it has also looked at how those factors can influence a listener’s perception of the comprehensibility of the speech, or how much effort they put into that understanding.
Grammatical accuracy, lexical richness, and rate of speech have all been found to correlate significantly with comprehensibility scores: Derwing and Munro (2001) and Kang (2010) found that extreme speaking rates (either too fast or too slow) corresponded to reduced comprehensibility, while other studies focused on reduced accuracy in word stress placement (Hahn, 2004). Those studies all demonstrate how multiple linguistic factors that speakers may struggle with can impede the ease of comprehension during a speech exchange; the familiarity factors that listeners bring with them, however, have also been examined. Just as with the intelligibility scores, Sheppard, Elliott, and Berk (2017) found that raters of speech samples indicated that they expended less effort in perceiving second language speech when they had either more exposure to accented speech or more positive attitudes towards second language speech.

Gass and Varonis (1982) examined the roles of pronunciation and grammar in native speakers’ listening comprehension of second language speech. The findings from that study indicated that native speaker listeners were unable to separate pronunciation from grammar when assessing the speech samples, and that the listeners had difficulty separating aspects of non-native discourse from one another when trying to assess why one speech sample was more easily comprehended than others. These findings are important to compare to those of Gass and Varonis’ 1984 study, in which a listener’s familiarity with accented speech, with the topic that a speaker was discussing, or with a specific accent or speaker all correlated positively with how easy the listener felt it was to comprehend second language speech.
The findings of these two studies, taken together, demonstrate that the comprehensibility of second language speech, from a listener’s perspective, does not depend on a single linguistic feature and instead relies on a listener’s wider understanding of a speech sample. That is, not all listeners may perceive the comprehensibility of speech samples in the same way, because no specific linguistic feature determines how difficult listeners find it to understand second language speech. Instead, listeners’ judgement of difficulty relies on a variety of factors, chief among them the listener’s experience with accented speech or with second language speakers. For the current study, which analyzes FL as a potential objective assessment of the difficulty of comprehending different phonemes when presented with unclear examples, these studies demonstrate that FL may be just one of multiple factors that affect listeners’ perception.

The findings from the studies mentioned earlier that suggest FL may be a crucial factor in comprehending L2 speech (Catford, 1987; Brown, 1991; Munro & Derwing, 1995) create a compelling case for operationalizing FL values as a tool for instructing second language learners so that they can master the specific phonemes found to most hinder native-speaker comprehension of their speech. Derwing and Munro (2005) explain this succinctly when making the case that pronunciation instructors should not “spend time on something that doesn’t affect the intelligibility or comprehensibility” of second language speech in a listener, and that “the evidence is accumulating…[on] segmentals with a high functional load” (p. 483).
This would directly support using FL values as an objective assessment of the importance of different phonemes and contrasts in teaching second language speakers, and their studies provide compelling evidence for taking FL seriously as a factor in how listeners perceive errors in second language speech. This idea is also supported by other studies, such as Pye, Ingram, and List (1987), which proposes using FL values and basing pronunciation instruction on the importance of contrastive oppositions instead of focusing only on the frequency with which isolated consonants appear across lexical types. In this view, FL values could potentially assist with focusing phonological development on the contrast.

These ideas are supported by A. Brown (1988) and G. Brown (1974), who looked at the relationship between FL and English language teaching to argue that teachers should turn to phonetic oppositions as an important element of pronunciation teaching. From their findings it seems helpful for teachers, instead of focusing on every pronunciation misstep, to make sure that the sounds their students substitute for a difficult phoneme do not result in a highly contrastive opposition, and thus that the two sounds “are acoustically similar…and bear a low FL” (A. Brown, 1988, p. 72). A teacher’s attention, and the time saved, could be better spent elsewhere, they argue. Their view is that doing so will give the student an effective enough platform of pronunciation until the low-contrast substitutions that the student is making can be addressed in more advanced stages of teaching. This view goes hand in hand with the idea that not all phonetic contrasts are equally important, and that particular phonetic contrasts are more crucial when it comes to building a developing phonological system for speech production.
FL values theoretically place concrete numerical values on how important those different contrasts are and on how disruptive errors involving those contrasts should be to a listener’s comprehension. That said, the previous research detailed above also makes it clear that FL may be only one factor out of many when it comes to how listeners perceive the difficulty of comprehending speech samples and how easily they are able to overcome disruptions caused by errors with differing FL values. The current study seeks to look at this intersection: the role that FL has in predicting comprehensibility will be examined, as will the familiarity with accented speech that the present study’s two listener populations possess. It will then be possible to examine whether the FL values attached to the analyzed phonemes affect the comprehensibility of speech samples as they should; that is, whether a listener perceives spending less effort in comprehending speech involving a consonant contrast with a low FL value than speech involving a consonant contrast with a high FL value. It will also be possible to examine whether familiarity with accented speech plays a role in how listeners in the current study perceive the speech samples, as the two listener populations differ in their likelihood of holding frequent meaningful exchanges with second language speakers or speakers who have accents. Looking at how these two factors affect comprehensibility will allow for an analysis of whether the impact of FL values on listener comprehension extends both to listeners with little familiarity and to those with high familiarity.

2.3 The Current Study

It is within this theoretical framework, looking at the roles that FL and familiarity with accented speech play in a listener’s perception of the intelligibility and comprehensibility of second language speech, that the current study operates.
For the current study, the English /r-l/ contrast and the /s-θ/ contrast will be examined for two different reasons. First, the FL values for both of their contrasts differ greatly. With a FL value of 0.015 for the /r-l/ contrast, and a 0.002 FL value for the /s-θ/ contrast (the process for obtaining and interpreting these figures will be explained in Section 3.2.2), the difference between the two could not be starker, with the /r-l/ FL value ranking among the highest of all possible consonant contrasts and the /s-θ/ FL value among the lowest. If one looks at those FL values, and assumes no additional factors, then previous research would suggest that listeners should not only have more difficulty in correctly transcribing unclear examples of target sounds featuring a /r/ or /l/ phoneme but should also spend more effort (or have a greater difficulty) in their perception of those target sounds. Choosing these two contrasts will allow an examination of FL as an objective assessment for the importance of different consonant contrasts when it comes to listener comprehensibility and intelligibility scores. Secondly, the current study aims to examine the implications that these calculated FL values has for second language speakers to see to what extent it may be difficult for some listeners to identify the speakers’ speech if they fail to master a contrast between two phonemes with a particularly high FL value. The study also is faced with the need to collect speech samples from participants who will pronounce the target sounds in a non-native like fashion. Previous 13 research has demonstrated the difficulty that Japanese speakers of English have when learning these contrasts (Bannister, Hazan & Iverson, 2005; Yamada, 1992). Research by Lado (1957) and Ritchie (1968) also notes that Japanese speakers struggle with fricatives, especially the /s-θ/ contrast. Lambacher et al. 
(1988) performed a study on this and found that 25% of their Japanese participants misidentified /s/ as /θ/, and vice versa, upon hearing them. Choosing the /r-l/ and /s-θ/ contrasts thus makes it possible to compare two contrasts with widely different FL values and to see whether native listeners perceive them differently. It also allows us to see whether the differences in those FL values have any implications for second language speakers who may struggle to learn certain contrasts not present in their L1 (in this case, native Japanese speakers). This will give some insight into whether FL theory has implications for second language speakers, and whether FL is an effective measure for second language speakers to use in determining which phonetic contrasts are most important for them to pronounce comprehensibly. The two contrasts differ considerably in the role they play in separating utterances in English (as represented by their FL values) and, in theory, in how they should affect a listener’s perception of comprehensibility and intelligibility.

To determine whether there is a real-world difference in the way native speakers respond to non-native-like examples of the /r-l/ and /s-θ/ contrasts, the current study takes an empirical approach to FL, analyzing the role it plays in the perceived comprehensibility and intelligibility of second language speech samples for two listener populations with varying degrees of familiarity with accented speech. The study was motivated by the following research questions:

1. To what extent do functional load statistics realistically reflect native speakers’ perception of error gravity, as represented by an assessment of their comprehensibility of learner productions of English /r/, /l/, /s/, /θ/?
2. How much is the intelligibility of the English produced by Japanese speakers reduced if they do not learn to produce /r/ and /l/ clearly, given the FL value of this contrast in English?
3. Is there a noticeable difference in the ratings between an on-campus student population exposed to a wide range of accents and a population from the general (or off-campus) community not as extensively exposed to accented English?

3. Methodology

A Qualtrics survey with two tasks was developed: an intelligibility transcription task and a two-part comprehensibility task. Creating this survey involved recruiting L1 Japanese speakers of English to supply speech samples, and two different population pools of native English speakers to assess those speech samples. These populations are described below.

3.1 Participants

3.1.1 L1 Japanese L2 English Speakers

Fifteen native Japanese speakers (8 males and 7 females), all undergraduate university students, were recruited to provide the recorded speech samples for rating. Speakers with lower English proficiency were targeted by seeking Japanese speakers from majors outside of TESOL, languages, and linguistics. All were between the ages of 18 and 30, had learned English in Japan during their secondary schooling, and lived on campus. Of these fifteen recorded participants, five were selected for use in the rating sessions. Three were chosen for their lower proficiency and the presence of target errors in their speech involving the /r/, /l/, /s/, and /θ/ phonemes. One was chosen for a slightly higher proficiency, with target word errors that occurred but were less frequent. A final participant was selected for his native-like proficiency and lack of any such errors. This allowed the study to observe how listeners would react to a range of oral proficiencies while keeping the number of samples to be rated at a reasonable level.
3.1.2 Native English-Speaking Raters

Native English raters were recruited for two separate population pools: those who lived on a very culturally diverse university campus, and those who lived in a nearby community and were not associated with the university. The goal was to see whether exposure to accented English produced any noticeable trend in native English speakers’ perception of second language speech. This design allows a comparison between raters who were surrounded by second language speakers and accented speech and raters who had far less exposure to them. All raters were screened on several criteria for inclusion in this study. They identified themselves as speakers of “standard American English”, which this study defined for them as “the English commonly spoken by news anchors in the US”. No raters had taken courses in formal linguistics or phonetics.

The rating survey was distributed in person, through networking, and using some online resources. Most raters (in both populations) were recruited by me through face-to-face interaction. For listeners in the student population, this included approaching them in hallways, posting flyers in various campus buildings, and “walking in” to classes to ask if anyone would like to participate. Many of them completed their rating sessions in a small room in my presence. Listeners were paid US $10 for their participation. Most community member raters were recruited through personal networking. Because I have had multiple part-time retail and food service jobs outside of the campus community, it was easier for me to find raters in those industries who had no on-campus experience.

3.1.2.1 Listener Population #1: Student Raters

In the student population, all 28 raters were undergraduates who lived on campus in either dorm rooms or on-campus apartments, and all were aged 18–30.
Most of them majored in business, science, or athletic health (17 of the 28), and none of them had majors related to languages or linguistics. Fourteen of them were male, and sixteen were female. All 28 raters spoke the upper Midwest variety of English, with no participants seeming to possess any differing accent. We did not ask whether they worked on campus or off campus, something that could have affected their exposure to second language speech and that could usefully have been recorded.

3.1.2.2 Listener Population #2: Community Member Raters

In the population of adults who lived and worked in the outside community rather than on campus, all 26 raters were aged 18–60 and had lived in the United States their entire lives. Of those 26 selected for participation, 19 were aged 30–60. Many of them, known to me personally, worked in the retail sector, the food service sector, or in warehouses (21 of the 26 worked in one of those industries). The level of meaningful exposure that this population had to accented or second language speech in their day-to-day lives was significantly lower than that of the university student listener population.

3.2 Materials

3.2.1 Recorded Speech Samples

The speech samples included on the survey for assessment were recorded English sentences spoken by native Japanese speakers. This is because, as mentioned before, the target phonetic contrasts being analyzed (/r-l/ and /s-θ/) are not present in Japanese speech and, thus, Japanese speakers would likely have greater difficulty producing these sounds spontaneously. Each Japanese speaker was recorded speaking nine sentences. The sentences featured two minimal pairs for each target contrast (/r-l/ and /s-θ/).
The sentences were designed so that either member of a given minimal pair could reasonably and logically be placed within the sentence frame. This placed the target words within connected speech while ensuring that the listener had to rely purely on the sound and word heard in the recording. The listener would not be able to predict the target word from context or logic. The minimal pairs selected for the /r-l/ contrast were rock/lock and writer/lighter. The sentences used to frame these words were as follows: Adam walked past the writer on his way to the bathroom/Paul saw the lighter when he sat at the table; Amy looked to see if the rock was where she left it/Ben sat the lock down on his chair when he got up. The minimal pairs selected for the /s-θ/ contrast, sick/thick and theme/seam, were used in the sentences: Paul wanted to know how sick the puppy was/Frank could tell that the plant was very thick; Greg pointed out the seam of Lucy’s costume/Paul pointed out the theme of Suzy’s costume. Again, each member of a minimal pair could fit logically within the two sentences provided, and it is not possible to guess from context alone which word belongs in the sentence. A ninth sentence was used that did not feature any of the minimal pairs or target contrasts: Bill ate some cake after dinner. This sentence was used to familiarize the raters with each new speaker’s voice.

The speech samples were recorded by presenting the native Japanese speakers with a delayed repetition task, a method used in previous research with similar tasks (Flege, Munro, & MacKay, 1995) because it elicits speech from participants without the participants simply imitating what had been said. I would play an audio recording of a woman’s clear, crisp voice reading each sentence naturally. After a slight pause, a male voice would prompt the speaker to recite what they had heard.
This was done so that the speakers would follow the sentences, while the sudden male vocal prompt and the slight pause before it would keep the Japanese speakers from rehearsing and imitating. The participants’ productions were acceptable as long as the target word was recalled; other errors such as omissions were not relevant.

3.2.2 FL Calculations

The present study uses Phonological CorpusTools 1.4.1 (hereafter, PCT), freely available and intuitive software developed by the Linguistics Department at the University of British Columbia, which gives users a simple graphical user interface for performing phonological analysis on corpora of transcribed English. Requiring little programming experience, it is designed for phonologists interested in how frequency and usage play a role in phonology, and it allows a multitude of related calculations to be run on a selected corpus. The current study uses the software’s ability to calculate the FL of individual pairs of sounds and to supply a numerical value for the contrast that exists between those two phonemes.

The program calculates FL using a change-in-system-entropy equation, a method of measurement widely used today in FL research (Surendran, 2003; Wedel, 2013), in which a hypothetical merger of a pair of sounds takes place so that the amount of entropy (information) lost through that merger can be calculated. Under this method, the FL of any two sounds in the corpus is calculated by looking first at the entropy of the system of the corpus, which includes all possible sounds. Next, the two target sounds in question go through a hypothetical merger in which both sounds (say /p/ and /b/) are merged into a hypothetical /x/. The entropy of the system is then recalculated, showing the amount of entropy that was lost in the system because of that hypothetical merger.
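The change-in-entropy procedure described above can be illustrated with a minimal sketch. This is not PCT’s implementation: the toy lexicon, the function names, the use of word frequencies as the probability distribution, and the optional normalization by the original entropy are all assumptions made only for illustration.

```python
import math
from collections import Counter


def entropy(freqs):
    """Shannon entropy (in bits) of a frequency table of word forms."""
    total = sum(freqs.values())
    return -sum((n / total) * math.log2(n / total)
                for n in freqs.values() if n > 0)


def merge_segments(word, a, b, merged="X"):
    """Replace every occurrence of segment a or b with one merged symbol."""
    return tuple(merged if seg in (a, b) else seg for seg in word)


def functional_load(lexicon, a, b, relative=False):
    """Entropy lost when segments a and b are hypothetically merged.

    lexicon maps a transcribed word (tuple of segments) to its frequency.
    relative=True divides the loss by the original entropy (an assumed option).
    """
    before = entropy(lexicon)
    merged = Counter()
    for word, n in lexicon.items():
        merged[merge_segments(word, a, b)] += n  # homophones collapse here
    loss = before - entropy(merged)
    return loss / before if relative else loss


# Hypothetical five-word lexicon with equal frequencies (illustration only).
toy_lexicon = {
    ("r", "ɑ", "k"): 1,   # rock
    ("l", "ɑ", "k"): 1,   # lock
    ("s", "ɪ", "k"): 1,   # sick
    ("θ", "ɪ", "k"): 1,   # thick
    ("k", "eɪ", "k"): 1,  # cake
}

fl_rl = functional_load(toy_lexicon, "r", "l")  # rock and lock collapse
fl_kg = functional_load(toy_lexicon, "k", "g")  # no word pair collapses
```

In this toy lexicon, merging /r/ and /l/ collapses rock and lock into one form, so the entropy drops by about 0.4 bits, whereas merging /k/ and /g/ collapses nothing and the loss is zero.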
If there is no change at all, there will be a zero value for the FL of that specific pairing.

In using this software for the current study’s FL calculations of phonetic contrasts, the Irvine Phonotactic Online Dictionary (IPhOD) was applied as the corpus for analysis. This corpus, developed by the University of California, is a large collection of over 54,000 English words (with an emphasis on words pulled from spoken language), all written out in phonetic transcription. The entire corpus is transcribed per the conventions of standard American English, is free to use and to download, and is fully functional alongside the PCT software for all of its calculations. Once loaded into the software, the corpus allows users to select any two phonemes at a time (as a pair) and then calculate the FL value that exists between them.

The current study reports the FL calculations for every possible phonetic contrast between consonants (vowels were excluded from the scope of this study), so as to get a wide and complete look at how the FL values for the two target contrasts in focus (/r-l/ and /s-θ/) rank compared to the average FL value between pairs as well as to all possible FL contrasts. These values can be seen in Table 1, located in the Appendix. In the table, the FL is identified for every possible consonant contrast in English. As these results show, the /r-l/ contrast, with an FL value of .015, ranks much higher than the /s-θ/ contrast at .002. Not only that, but the /r-l/ contrast ranks among the very highest of all possible FL values for consonant contrasts, whereas the /s-θ/ contrast ranks much lower. If, then, FL does play a crucial role in determining how listeners perceive the intelligibility and comprehensibility of second language speech, these FL values should predict different responses from the listeners for the two phonetic contrasts.
An unclear, or non-native-like, /r/ or /l/, with its relatively high FL, should cause more disruption to a listener’s ability to understand what is being said, which should impede the sound’s intelligibility and cause the listener to use more effort in comprehending the sound. An unclear /s/ or /θ/, however, should not be as disruptive and should require less effort.

3.3 Procedure

3.3.1 Task #1 – Intelligibility

The first task for listeners was a transcription task which aimed to determine the intelligibility of specific sounds. In this task, listeners were presented with all nine sentences spoken by all five Japanese speakers. Recordings were blocked by speaker. After each recording, the sentence frame appeared on the screen without the target word. Listeners were told, in instructions before this task, to view the sentence frames only as connected speech and not as meaningful to the task. The sentence frame shown was not the intended sentence but whatever the speaker actually said in the audio. Thus, ellipses ([…]) or notes such as “jumbled speech” were sometimes included within the transcription to represent mumbling. Raters then typed the missing target word that they heard. For instance, given the sentence Paul saw the lighter when he sat at the table, the audio recording of the speaker saying it was supplied in its entirety, but the transcription below it omitted the word lighter and included just a blank space. Because of this study’s focus on the two phonetic contrasts, we were not concerned with whether the entire word was correct, only with whether the listener was able to identify the target sound. The target sound was the sole focus.

3.3.2 Task #2 – Comprehensibility

The second task asked two things of the listeners. The listener would re-listen to each of the speech samples, only this time would have two sliding bars to respond to.
The first sliding bar asked listeners, upon hearing the target sound in context again, which sound (an /r/ or /l/, or an /s/ or /θ/) they believed they heard. Because a nine-point scale was used, listeners could represent the degree to which they felt the sound resembled one of those phonemes. For instance, if listeners were presented with the audio recording of speech such as Ben sat the lock down on his chair when he got up and were given the sentence frame visually, they would be prompted to identify what best represented the initial sound in the missing word. This directed the attention of the listeners to the target sound in question. On the scale, where an /l/ sound was a one and an /r/ sound was a nine, a one would count either as a perfectly target-like /l/ or as a completely incorrect /r/. A five would indicate that the listener could not choose at all between the two sounds. For the /r/, /l/, and /s/ phonemes, the consonant letters r, l, and s were used to represent the sounds for the raters. For the /θ/ phoneme, th was used to represent it for the raters in a more understandable way. After that task, listeners would then, for every audio sample rating, indicate how difficult it was for them to select between the two sounds. This was done using a sliding scale that ranged from Very Easy (a one) to Very Difficult (a nine).

The end of the survey included a space for listeners to write any comments, questions, or thoughts on the survey. Unfortunately, none of the participants, perhaps because most of them completed the task in my presence, included anything meaningful apart from contact information, gratitude for letting them participate, or wishes of luck.

4. Results

4.1 Data Analysis

Visual inspection of the intelligibility and comprehensibility data involved the following. First, data were removed for three raters because they completed the task in an unusually short amount of time.
Most raters completed it in about 20 minutes. This left data from 61 raters for further analysis. Second, data from four raters were removed because they failed to use the entire rating scale. Third, data were removed for three raters as they exhibited numerous outliers. The final number of raters was 54: 28 in the student population and 26 community members.

4.1.1 Intelligibility Task

The binary rating system for the intelligibility task (whether listeners were able to correctly identify a target sound through their transcription) allowed us to code all listener responses as a one (for correct) or a zero (for incorrect) during analysis. Because we were looking specifically at the target /r/, /l/, /s/, and /θ/ sounds, a response was coded as correct (a one) so long as the listener correctly identified that target sound. The rest of the transcribed word could have been incorrect, so long as the initial target sound was correctly deciphered. Because of the dichotomous nature of the intelligibility data, the Kuder-Richardson test was used to assess reliability. Results indicated that all tests came back above a reasonable threshold of KR = .7 (.755 and .724 for the /s-θ/ and /r-l/ ratings of the community population, .770 and .834 for the /s-θ/ and /r-l/ ratings of the student population).

The number of correct answers was tabulated within each rater group. Table 1 displays the raw total of correct responses that each rater group provided during the transcription task, as well as the percentages that those correct responses represented for that task.
Table 1: Intelligibility Task, Raw Total (Percentage) of Correct Responses

Rater population        /r-l/ contrast    /s-θ/ contrast
Student population      250 (45%)         348 (62%)
Community population    204 (39%)         213 (41%)

The implications of these results will be discussed further below, but it is clear that the community members’ total correct answers fell below those of the student population for both contrasts. This falls in line with the results of the comprehensibility scales, in which the student population outperformed the community members in both respects, correctly comprehending the target sounds while also indicating that they did so with less effort. We find something similar here: for both target contrasts, the student population was more likely than the community members to correctly transcribe the target sound upon hearing it.

Another result is that the gap between the /r-l/ and /s-θ/ intelligibility results differs between the two listener populations. In the student population’s results, there is a large gap of seventeen percentage points between the two, indicating that it was much easier for the student listeners to correctly transcribe the words containing the /s-θ/ contrast. The community members have a smaller gap between their /r-l/ and /s-θ/ results, only two percentage points, though they similarly transcribed sentences featuring an /s/ or /θ/ sound correctly more often. In short, the results here demonstrate that all listeners were more likely to correctly transcribe a sound featuring the /s/ or /θ/ phoneme than a sound featuring an /r/ or an /l/. Differences between the two rating groups, however, exist.
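As an illustration of the reliability check on the dichotomous intelligibility data, the following is a minimal sketch of the Kuder-Richardson Formula 20. It assumes that KR-20 is the variant used and that the variance of total scores is the population variance; the response matrix and function name are hypothetical and are not the study’s data or SPSS output.

```python
def kr20(responses):
    """Kuder-Richardson Formula 20 for 0/1 item scores.

    responses: one row per rater, one 0/1 entry per item.
    Uses the population variance of total scores (an assumption;
    some software uses the sample variance instead).
    """
    n_raters = len(responses)
    n_items = len(responses[0])

    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_raters
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_raters

    # Sum of p * q over items, where p is the proportion correct on an item.
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_raters
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


# Hypothetical responses from five raters to four items (illustration only).
example = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(example)
```

For this small hypothetical matrix the coefficient comes out at about .8, which would sit above the .7 threshold used in this section.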
4.1.2 Comprehensibility Task

Because the survey data for Task #2 were collected using a nine-point scale on which a low number (such as a one) always indicated a target-like /l/ sound or a very non-target-like /r/ sound, and similarly a target-like /s/ or non-target-like /θ/, it was necessary to reverse code the data. Because a one on the nine-point scale reflected a target-like /l/ and /s/ sound, and also represented “very easy” on the nine-point scale designed to gauge comprehension difficulty, the results for sentences in which the intended sound was an /r/ or /θ/ were reverse coded. Under this logic, a one on the nine-point scale would, for every sound, represent the best, or most target-like, example of that sound. It would also represent the greatest ease participants had when comprehending those sounds. For instance, if listeners were presented with a sentence and had to fill in a missing word that included the /r/ sound, and they thought the sound heard represented a very target-like /r/, they would have selected an eight or nine on the nine-point scale to reflect this. Under the reverse coding, a nine would turn into a one, an eight into a two, and so on down the scale. The data would then show a one for the target-like /r/ sound that the participant indicated. This reverse coding was done for all sentences that featured an /r/ or /θ/ sound, as those sounds always occupied the right-hand side of the original nine-point scale, and was done for both rater populations.

Since each speaker had four sentences per target phonetic contrast (four sentences with either an /r/ or /l/ sound, and four with the /s/ and /θ/ sounds), all 54 of the listeners had four numerical values to be totaled for each of the five speakers and for each of the target contrasts.
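The reverse coding and totaling described above can be sketched as follows. The function names and the example ratings are hypothetical; only the rule itself (nine becomes one, eight becomes two, applied to items whose intended sound is /r/ or /θ/) comes from the description above.

```python
SCALE_MAX = 9  # nine-point scale, 1 through 9

# Intended sounds that sat on the right-hand (high) end of the original scale.
REVERSED_TARGETS = {"r", "θ"}


def reverse_code(rating):
    """Flip a 1-9 rating so that 9 becomes 1, 8 becomes 2, and so on."""
    return SCALE_MAX + 1 - rating


def speaker_total(ratings, targets):
    """Sum one listener's four ratings for one speaker and one contrast.

    ratings: raw 1-9 slider values; targets: the intended sound of each item.
    After recoding, a low total means the sounds were heard as target-like.
    """
    total = 0
    for rating, target in zip(ratings, targets):
        total += reverse_code(rating) if target in REVERSED_TARGETS else rating
    return total


# Hypothetical listener: hears both /r/ items and both /l/ items as target-like.
raw = [9, 1, 8, 2]
intended = ["r", "l", "r", "l"]
total = speaker_total(raw, intended)  # 1 + 1 + 2 + 2
```

Because each total is the sum of four recoded ratings, it can range from 4 (every item heard as fully target-like) to 36.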
This simplification resulted in four spreadsheets, one for each of the two consonant contrasts for each population group, with every listener in each spreadsheet having one summed comprehension rating per speaker. The next step in analyzing the data was to determine whether the inter-rater reliability for each spreadsheet justified averaging the results to a single number for easier analysis. SPSS was used to calculate the inter-rater reliability for each spreadsheet. Using the SPSS Reliability analysis tool with the intraclass correlation coefficient revealed .866 for students’ ratings of the /r-l/ contrast, .939 for students’ ratings of the /s-θ/ contrast, .816 for community members’ ratings of the /r-l/ contrast, and .936 for community members’ ratings of the /s-θ/ contrast. An interesting trend is that, in both populations, the coefficient for the /r-l/ contrast was lower than that for the /s-θ/ contrast, which indicates that both populations gave more varied responses for the /r-l/ contrast and could indicate that both populations struggled more to arrive at easy and uniform ratings for the /r/ or /l/ sounds.

Because each of the inter-rater reliability statistics justified averaging, the ratings in each spreadsheet were averaged into a single representative figure. This was done by averaging all the ratings for each target contrast, by each population, producing the results in Table 2 below, which displays the mean comprehensibility results for each rating group. The standard deviation (hereafter, SD) and the 95% confidence interval (hereafter, CI) are given as well. The table displays the means for both rating populations and for both target contrasts.
Table 2: Comprehensibility Task, Section One

Rater group and contrast                 Mean     SD      95% CI
Student ratings, /r-l/ contrast          20.91    2.44    (17.6, 22)
Student ratings, /s-θ/ contrast          16.06    2.11    (14.4, 18)
Community member ratings, /r-l/          23.30    2.77    (20.5, 25.3)
Community member ratings, /s-θ/          17.73    1.94    (14.9, 19.2)

As seen above, the student comprehensibility ratings for each target contrast were lower than the corresponding community member ratings. Since lower ratings on this scale corresponded to “correctness” in listeners’ ability to determine the target sound from the speech sample, this already demonstrates that the student population determined the correct target sound more often than the community members did. The community members were less likely, for both target contrasts, to correctly comprehend the target sounds, with a greater gap in performance between the two rating populations for the /r-l/ contrast. This result would indicate that, while the community members underperformed the student population at correctly comprehending target sounds in both consonant contrasts, they struggled most when comprehending examples of /r/ or /l/.

For further analysis, the same process used for the rating task above (averaging all of the ratings into a single analyzable figure after determining the inter-listener reliability) was applied to the second section, in which listeners used a sliding nine-point scale to indicate how much effort they put into identifying the target sounds they heard. Because this nine-point scale was uniform across all questions (very easy on the low end, very difficult on the high end), no reverse coding of this scale was required.
The inter-listener reliability was calculated for each spreadsheet (again, one per target contrast for each population group). Since all values fell above a .700 threshold, it was deemed appropriate to average these numbers. Averaging the ratings that every listener gave for each speaker on the second sliding nine-point scale in the comprehensibility task produced the averages seen in Table 3 below. The table displays the means for all rating groups’ responses regarding the amount of effort they perceived using in comprehending the different sound contrasts.

Table 3: Comprehensibility Task, Section Two

Rater group and contrast                 Mean     SD      95% CI
Student ratings, /r-l/ contrast          18.84    2.25    (15, 20.1)
Student ratings, /s-θ/ contrast          15.31    2.03    (12.2, 18.2)
Community members, /r-l/ contrast        19.87    3.21    (16.2, 21.1)
Community members, /s-θ/ contrast        16.22    2.11    (14.2, 19.1)

As can be seen from these results, all of the student rating averages are, again, lower than their counterparts in the community member ratings. This would indicate that the student listeners generally reported spending less effort in comprehending the target sounds they were given, whereas the community members had to put forth more effort in comprehending both target contrasts. That said, both rating groups are consistent in indicating that they put forward less effort in comprehending the target sounds featuring the /s-θ/ contrast than those featuring the /r-l/ contrast. Both rating populations, based on these results, perceived putting less effort into comprehending the /s-θ/ sounds and found those sounds less disruptive when faced with possible errors involving them in second language speech. Thus, the comprehensibility tasks show that, though differences of degree exist between the two listener groups, all raters put forth more effort in comprehending examples of /r/ or /l/ and struggled more to comprehend those examples than examples of /s/ and /θ/.
4.1.3 Correlation Between the Two Tasks

Further analysis was done to compare the intelligibility ratings with the comprehensibility ratings from above. Because this involved a binary variable (the intelligibility score, which operated on a correct or incorrect basis) and a continuous scale (the nine-point comprehensibility scale), a point-biserial correlation analysis was run in SPSS to determine the relationship that the listeners’ intelligibility ratings had with their comprehension of the two target contrasts. A positive correlation between the two factors can be seen below in Table 4. The table displays the correlation between the intelligibility and comprehensibility tasks, as seen in each rater group.

Table 4: Comprehensibility and Intelligibility Correlation

Rater group and contrast                 Correlation between intelligibility and comprehensibility
Student ratings, /r-l/ contrast          .808
Student ratings, /s-θ/ contrast          .630
Community ratings, /r-l/ contrast        .788
Community ratings, /s-θ/ contrast        .593

These results demonstrate that the correlation was stronger, for both populations, with the /r-l/ contrast. There is a clear trend across both listener groups in which the correlation between intelligibility and ease of comprehensibility is stronger for the /r-l/ contrast than for the /s-θ/ contrast. This indicates that, for both populations, the ability to correctly transcribe a target sound from the /r-l/ contrast correlated with a higher likelihood of more easily identifying a target-like example of that sound, whereas for the /s-θ/ contrast the ability to transcribe the sound correctly did not correlate as strongly with the perceived ease of comprehending it.
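For readers unfamiliar with the statistic, the point-biserial correlation between a 0/1 variable and a continuous variable can be sketched as below. This is a hand-written illustration under stated assumptions, not the SPSS procedure used in the study: the function name and the data are hypothetical, and the population standard deviation is assumed, which makes the result equal to the Pearson correlation.

```python
import math


def point_biserial(binary, continuous):
    """Point-biserial correlation between 0/1 scores and continuous scores.

    r_pb = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the group means,
    s is the population standard deviation of all continuous scores,
    p is the proportion of ones, and q = 1 - p.
    """
    n = len(binary)
    group1 = [y for x, y in zip(binary, continuous) if x == 1]
    group0 = [y for x, y in zip(binary, continuous) if x == 0]

    p = len(group1) / n
    q = 1 - p

    mean_all = sum(continuous) / n
    sd_all = math.sqrt(sum((y - mean_all) ** 2 for y in continuous) / n)

    mean1 = sum(group1) / len(group1)
    mean0 = sum(group0) / len(group0)
    return (mean1 - mean0) / sd_all * math.sqrt(p * q)


# Hypothetical data: two incorrect and two correct transcriptions,
# paired with continuous scores (illustration only).
correct = [0, 0, 1, 1]
scores = [1, 2, 3, 4]
r_pb = point_biserial(correct, scores)
```

With these hypothetical values the coefficient is about .894. The sign of the coefficient depends on the direction in which each variable is coded, so the coding direction should be stated alongside any reported value.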
The correlation between the two tasks in the student listener population, however, is stronger, for both target contrasts, than that of the community member listeners. This indicates that the student listeners’ ability to correctly transcribe one of the target sounds had a stronger correlation with their ability to comprehend it more easily. The community member listeners’ ability to correctly transcribe a target sound, however, did not correlate as strongly with being able to comprehend it with greater ease. The implications of these findings, as well as other possible factors that need to be analyzed when noting the differences between the two listener populations, will be discussed below.

5. Discussion

Limitations in this study’s design make it impossible to discuss the wide range of other factors that could play crucial roles in influencing how listeners respond to errors in second language speech. Among them are the attitudes that listeners have toward accented speech as well as the specific acoustic properties of each phoneme that may affect its intelligibility to listeners. A further limitation is that the listeners did not provide any detailed qualitative responses about what they found most difficult when taking this survey. Those responses could have given more insight into what the listeners perceived themselves as having spent the most effort on. Though the results from this study are methodologically interesting, future studies can look further at the role these other factors have in how listeners process errors in second language speech pertaining to different phonemes.

5.1 Research Question #1

RQ1: To what extent do functional load statistics realistically reflect native speakers’ perception of error gravity, as represented by an assessment of their comprehensibility of learner productions of English /r/, /l/, /s/, /θ/?
As to whether the FL values calculated from the phonological corpus realistically reflect listeners’ perception of second language speech errors, the answer appears to be nuanced. The comprehensibility ratings show that, for each population, the target sounds in sentences containing /s/ or /θ/ were easier for listeners to identify correctly and required less effort from the listeners to comprehend. Since these results were seen in both rating populations, the students and the community members, this could support the idea that unclear sounds featuring either /r/ or /l/, a consonant contrast with a comparatively high FL value, were more disruptive to the native English listeners and required more effort from them, as evidenced in their comprehensibility scores.

The nuance appears in the differences between the two rating populations. The results from the first section of the comprehensibility task demonstrate that the community member raters found comprehending words featuring sounds from both target contrasts more difficult than did the student raters. The community members also indicated, in the second section of the comprehensibility task, that they perceived themselves as having spent more effort on both target contrasts. These results indicate that the disruptive weight that the FL values attach to the phonemes under analysis seems to have some observable influence on how listeners perceive second language speech, but that this influence needs to be considered alongside other factors.
Because there was a consistent gap between the ratings of the two populations, with the students consistently outperforming the community members, it is clear that FL alone does not portray the full picture of what shapes listener perception. One possible explanation for these differences is the student population’s greater exposure to second language speech. As the studies detailed earlier (Gass & Varonis, 1982; Kennedy & Trofimovich, 2008; Sheppard et al., 2017) have found, familiarity with second language speech patterns increases the ease with which a listener can comprehend samples of accented speech. Seeing as the community member rating population had little meaningful exposure to this type of speech, this factor could explain why they faced greater difficulty in comprehending the speech samples and why they perceived themselves as having spent more effort in doing so. Conversely, the student population’s greater exposure to second language speech could have made the speech samples easier for them to comprehend.

Multiple studies (Bradlow & Bent, 2008; Sidaras et al., 2009) have demonstrated that comprehensibility scores can increase when listeners have greater familiarity with second language speech, even when they are assessing speech samples from multiple different accents or L1s. It is thus possible that the student rating population was more easily able to comprehend the speech samples of L1 Japanese speakers of English even if the majority of their exposure to second language speech came from speakers with different L1s. Further studies are needed to explore this possibility in greater detail, but a familiarity effect could explain why the FL values did not represent the full scope of influence on the listeners’ ratings.
5.2 Research Question #2

RQ2: How much does the intelligibility of the English produced by Japanese speakers suffer if they do not learn to produce /r/ and /l/ clearly, given the FL value of this contrast in English?

How much Japanese second language speakers of English would suffer if they did not learn a highly contrastive pair of sounds in English, like /r-l/, compared to a contrast with a much lower FL value, would seem to depend on the listener. The intelligibility scores show that both rating populations failed to transcribe words featuring an unclear /r/ or /l/ sound more often than words featuring either /s/ or /θ/. These results indicate that the intelligibility of a Japanese speaker of English would suffer more if they failed to master a highly contrastive phonetic opposition, such as the /r-l/ contrast, than if they failed to master a phonetic contrast with a lower FL value. A limitation of this study is that only two consonant contrasts were analyzed, albeit two with very different FL values. It thus cannot be stated whether these findings would hold for all possible consonant contrasts or whether there are differences among contrasts with high FL values. It is clear, however, at least for the /r-l/ and /s-θ/ contrasts analyzed in the current study, that the Japanese speakers’ intelligibility, as perceived by the native English speakers, suffered more when the speakers failed to supply target-like examples of the /r/ or /l/ sound. This finding supports FL’s role as an influencing factor in intelligibility, as it shows that the unclear consonants with higher FL values impeded intelligibility more often than those with low FL values. An important factor to consider alongside these results, though, is that the two rating populations had significantly different gaps in their intelligibility scores between the two target contrasts.
The results from the student rating population indicate that they were able to transcribe words containing /s/ or /θ/ much more easily than words with /r/ or /l/, a gap of twelve percentage points between the two contrasts. This demonstrates that their perception of the intelligibility of the two contrasts was clearly different, and that the /s-θ/ examples reduced the intelligibility of the speech far less for them than the /r-l/ examples did. With the community member ratings, however, there was only a small two-point gap between the intelligibility scores for the two contrasts. This demonstrates that, unlike the student raters, these raters did not perceive as large a difference in how much the intelligibility of the speech samples was reduced when an unclear /r/ or /l/ was used versus an unclear /s/ or /θ/. These results could indicate that increased familiarity with accented speech makes the influence of FL on listener perception more pronounced, whereas a lower degree of exposure may make all accented sounds so difficult for the listener that the influence of FL is less pronounced.

Further results from the point-biserial correlation demonstrate that both rating populations had stronger correlations between intelligibility and comprehensibility scores for the /r-l/ sounds than for the /s-θ/ sounds. That is, both rating groups’ ability to accurately transcribe a target /r/ or /l/ correlated more strongly with their ability to comprehend it easily later on and to use less effort in doing so. This could also indicate that a native English speaker’s perception of Japanese speakers’ second language speech would suffer more if the speakers failed to master /r/ or /l/ than if they failed to master /s/ or /θ/, since the stronger correlation suggests a greater likelihood that a decrease in intelligibility would be accompanied by a decrease in comprehensibility.
That is, if the intelligibility of a Japanese speaker of English suffered because of a non-target-like /r/ or /l/, there would be a greater likelihood that the listener’s ability to easily comprehend the word would decrease and that the listener would perceive themselves as having spent more effort in that comprehension. This likelihood is lower for words containing unclear /s/ or /θ/ sounds. This trend exists in both listening populations, though to a lesser extent for the student raters. Thus, the student listeners may, because of the potential benefits of their greater familiarity with second language speech, not perceive as much of a reduction in the intelligibility of a Japanese speaker’s speech when errors are made. That said, because the trend exists in both rating groups, the intelligibility of Japanese speakers’ second language speech would seem to suffer more if they failed to learn a highly contrastive pair like /r-l/, among listeners with both great and little exposure to second language speech.

5.3 Research Question #3

RQ3: Is there a noticeable difference in the ratings between an on-campus student population exposed to a wide range of accents and an older population from the general (or off-campus) community not as extensively exposed to accented English?

Looking at the differences between the two rating populations helps to expand the discussion proposed earlier, that FL values make up only a part of the complete picture concerning how listeners comprehend speech, and also allows a discussion of the role that familiarity may play.
The most glaring implication from the data is that the student listeners, in both comprehensibility rating tasks and in the intelligibility rating task, outperformed the community member listeners: they were more likely to transcribe the correct target sounds, they were more likely to identify the target sounds upon hearing them in context, and they indicated that they spent less effort in comprehending those sounds. Without exception, the student population thus seemed to have an easier time with all tasks when faced with the second language English speech they were presented.

One of the key factors that may explain how listeners responded to the speech samples is familiarity, as evidenced in previous studies such as Gass and Varonis (1984). Because the student population comes from an on-campus environment with a large number of international students and diverse accented speech, this population has much greater exposure to meaningful exchanges with accented speech, beyond sales encounters or other brief encounters, than does the community population. The community members, after all, live removed from any sort of campus environment in which international students converge and have far fewer meaningful exchanges with accented speech in their environment. It is true that most of the community members recruited were employed in service-type jobs (food service, retail service), and it is possible that they interact frequently with customers who have accented speech, but that is far different from the extended and more meaningful interaction that students on campus at a diverse university are likely to have with accented speech. If this factor did indeed play any role in influencing the ratings gathered, then this would indicate that, while FL certainly does seem to bear out in reality-based ratings, one also has to consider how familiar the listener is with accented speech.
It is possible that one’s familiarity with accented speech, or lack thereof, can influence how one perceives second language speech and the gravity of errors involving different consonants, in a way that changes the impact that the FL values carry. The most obvious evidence for this from this study is that the student population, in all tasks, found the speech samples more intelligible and easier to comprehend. Their ratings still indicated that consonants with higher FL values were more difficult for them to transcribe and comprehend, but that phenomenon was less pronounced than in the results from the community member ratings. This could indicate that the influence that the FL values have on intelligibility and comprehensibility can be more noticeable when there is a lack of exposure to accented or second language speech. That exposure may make the influence of the FL values less noticeable for listeners with greater familiarity with L2 speakers, though it does not erase that influence. This would explain why the FL values were shown to be related to the ratings of both listener groups, those with and those without that familiarity.

Of course, a large limitation of this study is that it presented native English listeners with only one sort of accented speech: that of Japanese second language speakers of English. Future studies on the same topic have much to explore regarding how much the type of accented speech matters; that is, whether exposure to any accented speech will help improve one’s ratings on an intelligibility and comprehensibility task (as evidenced in Bradlow & Bent, 2008), or whether it is important that one is familiar with the specific accent one is being asked to rate.
It is impossible for us to say whether the community members’ ratings were lower, across the board, because they lacked experience in meaningful exchanges with any sort of accented speech when compared to the student population, or whether their ratings would have improved for this specific study only if those meaningful exchanges had been with Japanese second language speakers of English. Looking at different examples of accented speech in future studies, and at how native English listeners respond to a multitude of different accents, could help answer some of these questions in a way that this study cannot.

5.4 Pedagogical Implications

The pedagogical implications of these findings are limited by the fact that this study looked only at the differences between two consonant contrasts, /r-l/ and /s-θ/, and that other factors that may influence comprehension, apart from familiarity with accented speech, were not considered. Those factors would be necessary for a full account of the pedagogical implications raised here, but the implications are still worth considering in the context of the limited results from this study. The main implication concerns FL as an objective assessment of the importance of consonant contrasts based solely on frequency, and whether that assessment alone is worth basing a syllabus on. The results from this study seem to indicate that building a pronunciation syllabus on this objective assessment of frequency may be useful for teachers as a way to determine which consonants are most important for L2 speakers to learn, but that teachers need to be aware of the limitations of that method.
The findings in this study do suggest that building a syllabus on the basis of FL values would aid Japanese speakers of English in improving the intelligibility of their speech: their intelligibility suffered more when they failed to use a target-like /r/ or /l/ than when they failed to use target-like sounds with lower FL values. Native English speakers’ ability to more easily comprehend words with an unclear /s/ or /θ/, and to spend less effort in doing so, would also be evidence that a teacher’s time would be better spent on the /r/ and /l/ sounds, given the higher FL values that those sounds carry. That said, the findings from this study also make it clear that other factors need to be considered as well.

There is also some evidence that creating a syllabus based solely on the objective assessment of the FL values may not adequately reflect how all native English speakers will comprehend a second language speaker’s speech, and that doing so may deprive some contrasts with very low FL values of attention in favor of contrasts with very high FL values, even when that trade-off may not always be helpful. When it came to the intelligibility scores, after all, the student rating population found that unclear /r/ or /l/ sounds impacted intelligibility much more than unclear /s/ or /θ/ sounds, whereas the community member raters found almost no difference between the two contrasts. This would seem to indicate that, while building a syllabus on FL values may benefit Japanese speakers of English when they interact with students or other listeners who have greater familiarity with second language speech, the benefit might not be as strongly felt when interacting with listeners who have less familiarity.
While teachers may, in some cases, spend their time more wisely by focusing on consonants with higher FL values, and thus consonants that may disrupt comprehensibility and intelligibility more starkly, they also need to consider that the benefits of doing so may be seen strongly only when their students interact with certain types of listeners. The fact that listeners with little exposure to second language speech may not be as forgiving when faced with errors involving consonants with low FL values, when compared to listeners with greater familiarity, suggests that teachers cannot rely exclusively on the FL values in all cases. Though it is impossible to say, from this study, which factors a teacher should consider in addition to the FL values when developing a syllabus, the findings still indicate that using FL as the sole basis for developing a syllabus or a textbook ignores the reality of how different types of listeners will respond to second language speech.

A second pedagogical implication is that teachers should be careful in selecting textbooks and should pay attention to how those textbooks choose which consonant contrasts to focus on. A limitation of this study is that it looked at only one language, English. Not all languages have the same available corpora, and teachers would be wise to find out how their textbooks determined what to base their content on. As mentioned above, any textbook that bases its consonant instruction on frequency alone, or on the objective assessment of that frequency through FL values, may not be developing its materials in a manner that reflects reality, especially the reality of speaking with an accent to populations who do not have as much meaningful exposure to accented speech.
This issue could matter more or less to different kinds of learners, depending on the type of native English-speaking populations that they foresee interacting with the most (a second language speaking college professor versus a service worker, for instance), but it is worth considering when developing instructional materials for second language learners of English. It is also worth considering, when teaching, that the materials you are using to teach your students consonants may be devised in a way that assumes other factors will not influence listeners’ comprehension of certain contrasts or their perception of the gravity of errors involving those contrasts. If any implication can be drawn from this study, in combination with studies such as Gass and Varonis (1984), it would seem to be that other factors, such as a listener’s familiarity with second language speech, can influence listener comprehension and that FL alone as a device for developing instructional materials does not paint a complete picture of how listeners will respond to that speech.

APPENDIX

Table 5: Functional Load Values For Consonant Contrasts

BIBLIOGRAPHY

Bradlow, A. R., & Bent, T. (2008). Perceptual adaptation to nonnative speech. Cognition, 106, 707-729.
Brown, A. (1988). Functional load and the teaching of pronunciation. TESOL Quarterly, 22(4), 593-606. doi:10.2307/3587258
Brown, A. (1991). Functional load and the teaching of pronunciation (pp. 379-397). Teaching English pronunciation: A book of readings. New York: Routledge.
Brown, G. (1974). Practical phonetics and phonology. In J. P. B. Allen & S. P. Corder (Eds.), The Edinburgh course in applied linguistics: Vol. 3. Techniques in applied linguistics (pp. 24-58). Oxford: Oxford University Press.
Catford, J. C. (1987). Phonetics and the teaching of pronunciation: A systemic description of English phonology. In J. Morley (Ed.), Current perspectives on pronunciation: Practices anchored in theory (pp. 87-100).
Alexandria, VA: TESOL.
Derwing, T. M., & Munro, M. J. (1997). Accent, intelligibility, and comprehensibility. Studies in Second Language Acquisition, 45, 73-97.
Derwing, T. M., & Munro, M. J. (2005). Second language accent and pronunciation teaching: A research-based approach. TESOL Quarterly, 39, 379-397. doi:10.2307/3588486
Flege, J. (2003). Assessing constraints on second-language segmental production and perception. In A. Meyer & N. Schiller (Eds.), Phonetics and phonology in language comprehension and production, differences and similarities. Berlin: Mouton de Gruyter.
Flege, J., Munro, M. J., & MacKay, I. R. A. (1995). Effects of age of second-language learning on the production of English consonants. Speech Communication, 40, 467-491.
Flege, J. E., MacKay, I. R. A., & Munro, M. J. (1995). Factors affecting strength of perceived foreign accent in a second language. The Journal of the Acoustical Society of America, 97, 3125-3134.
Gass, S. M., & Varonis, E. M. (1984). The effect of familiarity on the comprehensibility of nonnative speech. Language Learning, 34, 65-89.
Gass, S. M., & Varonis, E. M. (1994). Input, interaction, and second language production. Studies in Second Language Acquisition, 16, 283-302.
Hahn, L. D. (2004). Primary stress and intelligibility: Research to motivate the teaching of suprasegmentals. TESOL Quarterly, 38, 201-223. doi:10.2307/3588378
Hall, K. C., Allen, B., Fry, M., Johnson, K., Lo, R., Mackie, S., & McAuliffe, M. (2017). Phonological CorpusTools, Version 1.3. [Computer program]. Available from PCT GitHub page.
Hockett, C. F. (1955). A manual of phonology. Memoir of International Journal of American Linguistics, No. 11. Baltimore: Waverly Press.
Iverson, P., Hazan, V., & Bannister, K. (2005). Phonetic training with acoustic cue manipulation: A comparison of methods for teaching English /r/-/l/ to Japanese adults. Journal of the Acoustical Society of America, 118, 3267-3278. doi:10.1121/1.2062307
Kang, O., Rubin, D., & Pickering, L. (2010). Suprasegmental measures of accentedness and judgments of language learner proficiency in oral English. The Modern Language Journal, 94, 554-566. doi:10.1111/j.1540-4781.2010.01091.x
Kang, O., & Rubin, D. L. (2009). Reverse linguistic stereotyping: Measuring the effect of listener expectations on speech evaluation. Journal of Language and Social Psychology, 28(4), 441-456. https://doi.org/10.1177/0261927X09341950
Kennedy, S., & Trofimovich, P. (2008). Intelligibility, comprehensibility, and accentedness of L2 speech: The role of listener experience and semantic context. Canadian Modern Language Review, 64(3), 459-489.
King, R. D. (1967). A measure for functional load. Studia Linguistica, 21, 1-14.
Lado, R. (1957). Linguistics across cultures: Applied linguistics and language teachers. Ann Arbor: University of Michigan Press.
Lambacher, S., Martens, W., Kakehi, K., Marasinghe, C., & Molholt, G. (2005). The effects of identification training on the identification and production of English vowels by native speakers of Japanese. Applied Psycholinguistics, 26, 227-247.
Lindemann, S. (2002). Listening with an attitude: A model of native-speaker comprehension of non-native speakers in the United States. Language in Society, 31, 419-441.
Mi Oh, Y., Coupé, C., Marsico, E., & Pellegrino, F. (2015). Bridging phonological system and lexicon: Insights from a corpus study of functional load. Journal of Phonetics, 53, 153-176. doi:10.1016/j.wocn.2015.08.003
Munro, M. J., & Derwing, T. M. (1995). Foreign accent, comprehensibility and intelligibility in the speech of second language learners. Language Learning, 45, 73-97.
Munro, M. J., & Derwing, T. M. (2001). Modeling perceptions of the accentedness and comprehensibility of L2 speech: The role of speaking rate. Studies in Second Language Acquisition, 23(4), 451-468.
Pye, C., Ingram, D., & List, H. (1987). A comparison of initial consonant acquisition in English and Quiché. In K. Nelson & A. van Kleeck (Eds.), Children's language, Vol. 6, 175-190. Hillsdale, NJ: Erlbaum.
Ritchie, W. C. (1968). On the explanation of phonic interference. Language Learning, 18, 183-197. http://hdl.handle.net/2027.42/98385
Sheppard, B., Elliott, N., & Baese-Berk, M. (2017). Comprehensibility and intelligibility of international student speech: Comparing perceptions of university EAP instructors and content faculty. Journal of English for Academic Purposes, 26, 42-51. doi:10.1016/j.jeap.2017.01.006
Sidaras, S. K., Alexander, J. E., & Nygaard, L. C. (2009). Perceptual learning of systematic variation in Spanish-accented speech. The Journal of the Acoustical Society of America, 125(5), 3306-3316.
Smith, L. E. (1992). Spread of English and issues of intelligibility. In B. B. Kachru (Ed.), The other tongue: English across cultures (pp. 148-161). Urbana: University of Illinois Press.
Surendran, D., & Niyogi, P. (2003). Measuring the functional load of phonological contrasts. Tech. Rep. No. TR-2003-12. Chicago: University of Chicago.
Vaden, K. I., Halpin, H. R., & Hickok, G. S. (2009). Irvine Phonotactic Online Dictionary, Version 2.0. [Data file]. Available from http://www.iphod.com.
Varonis, E., & Gass, S. M. (1982). The comprehensibility of non-native speech. Studies in Second Language Acquisition, 4(2), 114-136. doi:10.1017/S027226310000437X
Wedel, A., Kaplan, A., & Jackson, S. (2013). High functional load inhibits phonological contrast loss: A corpus study. Cognition, 128, 179-186.
Wingstedt, M., & Schulman, R. (1984). Comprehension of foreign accents. Phonologica 1984: Proceedings of the 5th International Phonology Meeting. Cambridge: Cambridge University Press.
Yamada, R. A., & Tohkura, Y. (1992). The effects of experimental variables on the perception of American English /r/ and /l/ by Japanese listeners. Perception & Psychophysics, 52, 376-392. https://doi.org/10.3758/BF03206698