EXPLORING THE EFFECT OF RELATIVE TIMING OF TARGET AND BACKGROUND WORDS ON SPEECH UNDERSTANDING WITH AND WITHOUT A BACKGROUND RHYTHMIC CONTEXT By Toni Marie Smith A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Psychology—Master of Arts 2021 ABSTRACT EXPLORING THE EFFECT OF RELATIVE TIMING OF TARGET AND BACKGROUND WORDS ON SPEECH UNDERSTANDING WITH AND WITHOUT A BACKGROUND RHYTHMIC CONTEXT By Toni Marie Smith Using the Coordinate Response Measure (CRM) paradigm, recognition of target speech in the presence of competing speech has been shown to depend upon both the rhythmic context of target and background speech and fundamental frequency differences between the speakers (McAuley et al., 2021). In the present study, two experiments examined the effects of relative timing of target and background key words and the presence or absence of a background rhythmic context on target recognition using the same male talker for both target and background sentences. Exp. 1 varied the onset asynchrony between target and background key words when background rhythmic context was removed (i.e., the background consisted only of the competing key words) and Exp. 2 manipulated the rhythm of background speech leading up to key words, but left the key words intact with an onset asynchrony of ±50ms. Exp. 1 revealed an asymmetric U-shaped performance curve where (1) target recognition improved with increasing deviation of background key words from the expected onset timing of target keywords, and (2) target key words were better recognized when they began prior to the onset of background key words, compared to after. With the reintroduction of the background context in Exp. 
2, performance was reduced to chance both when the background rhythm was intact and when it was rhythmically irregular, suggesting that listeners were unable to distinguish target and background sentences and could not develop expectations for target keyword timing.

TABLE OF CONTENTS
LIST OF FIGURES
INTRODUCTION
GENERAL METHODS
EXPERIMENT 1
    Methods
        Participants and Design
        Stimuli
        Procedure
    Results
    Discussion
EXPERIMENT 2
    Methods
        Participants and Design
        Stimuli
        Procedure
    Results
    Discussion
GENERAL DISCUSSION
APPENDIX
REFERENCES

LIST OF FIGURES
Figure 1. Results for Experiment 1: Proportion correct target Color and Number recognition for each value of onset asynchrony (0, ±25ms, ±50ms, ±100ms, ±200ms).
Negative values indicate that the onset of the background Color appeared early relative to the onset of the target Color (background leading), while positive values indicate that the onset of the background Color appeared late relative to the target Color (background lagging). Error bars represent standard error.
Figure 2. Results for Experiment 1: Proportion of Color (Panel A) and Number (Panel B) intrusions for each value of onset asynchrony (0, ±25ms, ±50ms, ±100ms, ±200ms). Negative values indicate that the onset of the background Color appeared early relative to the onset of the target Color (background leading), while positive values indicate that the onset of the background Color appeared late relative to the target Color (background lagging). Error bars represent standard error.
Figure 3. Examples of rhythm unaltered and altered versions of a spoken CRM sentence of the form “Ready [call sign] go to [color] [number] now.” The top panel (Panel A) shows the sample sentence where the rhythm is unaltered (m = 0), as represented by the bars equally spaced in time. The middle and bottom panels show how the same time points in the speech signal are shifted by the rhythm transformation (m = 0.75, maximally altered condition) for two different phases (Panel B, ϕ = 5π/4; Panel C, ϕ = π/2).
Figure 4. Results for Experiment 2: Proportion correct target Color and Number recognition for each level of background rhythm alteration (m = 0.0, 0.25, 0.50, 0.75). Black squares with a solid line represent performance when the background Color/Number was leading (OA = -50ms) and open circles with a dashed line represent performance when the target Color/Number was leading (OA = +50ms).
Error bars represent standard error.
Figure 5. Results for Experiment 2: Proportion of Color (Panel A) and Number (Panel B) intrusions for each level of background rhythm alteration (m = 0.0, 0.25, 0.50, 0.75). The solid grey lines represent the chance of choosing the background Color or Number at random when both the background and target Colors and Numbers are heard. Black squares with a solid line represent performance when the background Color/Number was leading (OA = -50ms) and open circles with a dashed line represent performance when the target Color/Number was leading (OA = +50ms). Error bars represent standard error.

INTRODUCTION

Understanding speech-in-noise is crucial for effective communication within the hearing population, since many of our social interactions occur in noisy environments such as family dinner tables, restaurants, busy street sides, or (more recently) laggy video conference calls. Despite the importance of this ability in navigating the social world, the many factors that underlie speech-in-noise perception are not well understood. There are a number of stimulus factors that have been shown to influence the recognition of speech-in-noise, including the type of background sounds (e.g., speech vs. non-speech) (Desjardins & Doherty, 2013), the number of background talkers for speech backgrounds (Rosen et al., 2013) and general cues to perceptual segregation, such as fundamental frequency differences between target and background talkers (Brokx & Nooteboom, 1982; Assmann & Summerfield, 1989, 1990). A growing body of recent investigations has also implicated speech rhythm as having an influence on speech recognition in noise (e.g., Aubanel, Davis, & Kim, 2016; Wang et al., 2018; McAuley, Shen, Dec, & Kidd, 2020; McAuley, Shen, Smith, & Kidd, 2021).
The role of rhythm and timing in speech-in-noise perception is the focus of this thesis. One mechanism by which speech rhythm has been hypothesized to contribute to successful perception of speech amidst noise is through Selective Entrainment, whereby temporal regularities in to-be-attended target speech guide attention in a manner that facilitates selective attention to the target (McAuley et al., 2020). The Selective Entrainment Hypothesis is based in Dynamic Attending Theory (DAT), which posits that there exist internal attentional oscillations that are entrained by external rhythms (Jones, 1976; Jones & Boltz, 1989; Large & Jones, 1999; McAuley et al., 2006). This entrainment leads to changes in the phase and period of internal (attentional) rhythms that align peaks in attentional energy to time points where relevant stimulus events are expected to occur. In turn, this alignment of attentional peaks with stimulus events is hypothesized to facilitate perception. Accordingly, behavioral evidence has shown a perceptual advantage for stimuli arriving at expected (compared to unexpected) times (Barnes & Jones, 2000; Jones et al., 2002; Miller, Carlson, & McAuley, 2013). While the timing of natural speech is not strictly periodic, it is characterized by temporal patterns that give rise to the perception of regularity, which can be used to guide temporal expectations for the occurrence of upcoming speech sounds in the speech stream. Timing at the level of syllables in particular (about 3-9 Hz) has been shown to contribute to the quasi-rhythmic nature of speech across a number of languages (Dauer, 1983; Tilsen & Arvaniti, 2013), and is also likely to be important for early language acquisition (Goswami, 2019). Additionally, a body of behavioral evidence suggests not only that these regularities are present, but also that they facilitate the perception of speech. For instance, understanding speech in noise is adversely affected by the disruption of speech rhythm.
Speech that has been isochronously re-timed is more intelligible in noise than speech that has been anisochronously re-timed (Aubanel, Davis, & Kim, 2016). Regularity in speech also appears to build temporal expectations over time; within the same sentence, later-occurring words are better recognized than earlier-occurring words in multi-talker babble, but not when the speech has been altered to be artificially irregular (Wang et al., 2018). Earlier rhythmic context within a sentence can also influence the perception of ambiguous syllable organization occurring later in the sentence, suggesting that such temporal expectations can influence word segmentation (Dilley & McAuley, 2008; Morrill et al., 2014; Baese-Berk et al., 2019). Parallel to these behavioral investigations into the role of rhythm in speech perception, evidence has also accrued that speech rhythms serve to entrain neural activity. Specifically, cortical neural oscillations have been shown to phase-lock to the temporal envelope of speech, and it has been argued that this neural entrainment to the speech envelope is used as a mechanism for parsing connected speech into smaller units (Ghitza, 2011; Giraud & Poeppel, 2012; Ding et al., 2016; Riecke et al., 2018). Such neural synchronization is also modulated by attention; in a multi-talker context, selective attention to target speech enhances neural entrainment to the target speech envelope (Ding & Simon, 2012, 2014; Golumbic et al., 2013). Moreover, disruption of neural synchronization to target speech using transcranial alternating current stimulation modulates target speech recognition, suggesting that the entrainment of neural activity to the speech envelope plays a causal role in the understanding of target speech presented with a competing talker (Riecke et al., 2018).
A recent investigation by McAuley and colleagues (2020) found that alterations to the natural rhythm of both target and background speech influence target speech recognition, but in opposite ways. Their experiments used the Coordinate Response Measure (CRM) paradigm, where speech stimuli all have the same form: “Ready [Call Sign] go to [Color] [Number] now” (Bolia et al., 2000). Each sentence has one of eight Call Signs (e.g. “Baron,” “Charlie,” “Eagle”), one of four Colors (“Red,” “Blue,” “White,” or “Green”) and one of eight numbers (1-8, excluding “seven” in order to maintain a constant number of syllables from trial to trial). Participants are told to attend to the target sentence, which always contains the call sign “Baron,” and report the Color and Number that appear within that sentence. When the target sentence is presented amidst other sentences, the Call Signs, Colors, and Numbers that appear in the background sentences are always different from those of the target. The natural rhythm of target speech and the natural rhythm of background speech were independently altered to make the speech increasingly rhythmically irregular. The authors found that increasing alterations to the natural rhythm of target speech led to poorer recognition of target Color and Number (a target rhythm effect). Conversely, increasing alteration of background speech rhythm led to better target Color and Number recognition (a background rhythm effect) as well as a reduction in intrusion errors (responses coming from the background) (McAuley, Shen, Dec, & Kidd, 2020). These results are consistent with the Selective Entrainment Hypothesis, which predicts both the target and background rhythm effects. If selective entrainment to target speech plays a role in the perception of target speech in noise, disruption of the natural rhythm of target speech should impede target speech recognition.
Without a stable speech rhythm to guide temporal expectations, there is predicted to be a misalignment of attentional focus and information-carrying events in the target speech. In contrast, a disruption of the natural rhythm of background speech should enhance target recognition. This is hypothesized to occur because of a reduction of competing entrainment to the background rhythm, thereby strengthening entrainment by the target speech rhythm. In essence, without a regular background rhythm it is less likely that attention would be accidentally entrained by the background rhythm at the expense of entrainment to the target. It would also reduce the likelihood of intrusions from the background due to inadvertent entrainment to the background speech rhythm. In a set of follow-up studies using backgrounds that varied in their similarity to the target, the target rhythm effect was found to be robust to disparity-based segregation (McAuley, Shen, Smith, & Kidd, 2021). The background rhythm effect, however, was only observed with a background talker of the same sex as the target. When the background was of the opposite sex, or when it contained amplitude envelope information but was stripped of semantic content or temporal fine structure, the background rhythm effect was absent. This suggests that the background rhythm effect is not driven by amplitude envelope-based rhythm alone and may be reduced when there are strong cues for perceptual segregation or when the background is unintelligible (McAuley, Shen, Smith, & Kidd, 2021). Although McAuley and colleagues provide evidence for both target and background rhythm effects that are in line with the Selective Entrainment Hypothesis, the work raises some outstanding questions.
A key assumption in prior experiments is that the rhythmic context leading up to the Color and Number words is what guides expectations for the timing of those words; moreover, it is the disruption of that rhythmic context (and, in effect, disruption of temporal expectations) that drives changes in performance. However, McAuley and colleagues applied the rhythm alteration to the entirety of the sentences, not just the rhythmic context leading up to Color and Number alone. As a result, the rhythm alteration additionally alters the relative timing of the onsets of background and target Color and Number words. To account for this, McAuley and colleagues applied the rhythm alteration using a range of different phases, so the relative timing of background and target Color and Number words varied from trial to trial, thus averaging out any systematic effect of relative onset timing. Nonetheless, it is possible that separate from an effect of rhythmic context (entrainment), onset asynchrony between background and target Color and Number words differentially affects target recognition. Support for a role of onset asynchrony between to-be-attended and to-be-ignored speech material comes from a number of sources. First, asynchrony of onsets is a strong cue for segregation in both speech and non-speech sounds (Bregman, 1990). Second, performance on the CRM paradigm used by McAuley et al. (2020, 2021) has been shown to be better for sentences presented asynchronously than for sentences presented synchronously (Humes, Kidd, & Fogerty, 2017). The latter finding, however, relates to the asynchrony of sentence onsets rather than the onsets of the Color/Number words within the sentences that participants are specifically listening for. Other research on the effects of onset asynchrony has tended to focus on the fusion of tones or individual speech sounds, rather than whole words or phrases embedded within sentences.
For example, onset-based grouping effects on the number of sounds heard have been observed when vowel formants that have a common or uncommon onset time are presented (Darwin, 1981). Onset asynchronies can also prevent a mistuned frequency component from influencing the perceived pitch of a harmonic complex (Darwin & Ciocca, 1992), and can abolish the fusion of short noise bursts across frequency and location (Turgeon, Bregman, & Roberts, 2005). To directly address the issue of onset asynchrony in the present investigation, Experiment 1 used a modified version of the CRM paradigm to examine how differing amounts of onset asynchrony between target and background Color and Number words impact target recognition in the absence of background rhythmic context. Here, listeners were presented with a single target sentence and an isolated set of background Color/Number words spoken by the same talker. The onset of background Color and Number relative to the target Color and Number was varied. In this scenario, it is expected that the listener will form temporal expectations for the onset of the target Color/Number word pair based on the natural spoken rhythm of the target sentence. If this is the case, correct Color/Number recognition should improve with increasing asynchrony of Color/Number onsets, regardless of the direction of asynchrony. That is, according to a DAT-based Temporal Expectation Hypothesis, performance would be worst when target and background Color and Number are synchronous, because in this condition the timing of both the target and background Color/Number pairs will be consistent with the temporal expectations set by the target speech rhythm and so there will be less information to distinguish between the two. As onset asynchronies become larger, it is expected that target speech recognition will improve because the to-be-ignored background Color and Number will not align with the temporal expectations established by the target speech rhythm.
An alternative possibility is that there may simply be a bias to attend to whichever Color/Number pair appears first, regardless of temporal expectations (i.e., a Temporal Order Hypothesis). Distractors that are early or late with respect to a target stimulus have been shown to interfere differently with task performance. For example, when synchronizing with a moving target dot that has a sinusoidal trajectory, having additional moving dots on the screen only interferes with synchronization when the distractors lead in phase (Booth & Elliott, 2015). There is also evidence from memory research that distractor-response binding effects in retrieval-based probe responding appear when the distractor occurs before the target stimulus that must be responded to, but not when the distractor occurs after the target (Frings & Moeller, 2012). Generally, it seems that early distractors are in a sense more distracting than late distractors. In the absence of an influence of temporal expectations coming from the target speech, this would produce a linear effect where the more the background leads the target, the more it interferes and the worse performance becomes; conversely, the more the background lags the target, the less it interferes and the better performance becomes. It is also possible that the data support both the Temporal Expectation and Temporal Order hypotheses. If a temporal order bias interacts with temporal expectations, this would result in the beneficial effect of increasing onset asynchrony on performance being attenuated when the background Color/Number onset leads the target (compared to when it lags the target). A final possibility is that temporal expectations for Color/Number onset and the temporal order of target and background Color and Number play no role in correct target recognition, resulting in no difference in performance as a function of onset asynchrony.
A second factor of interest in the present investigation concerns the effects of the rhythm alteration on the intelligibility of the Color and Number words. McAuley et al. (2020) established that the rhythm alteration does not make individual target sentences any less intelligible when presented in isolation in quiet listening conditions (McAuley, Shen, Dec, & Kidd, 2020). However, it is possible that the rhythm alteration degrades the intelligibility of the Color and Number words in more difficult listening situations (i.e., in the presence of noise or other competing sounds). The purpose of Experiment 2 was thus to investigate the effects of rhythm alteration, while controlling for Color/Number intelligibility as well as Color/Number onset asynchrony. Toward this end, Experiment 2 focused on the background rhythm effect, namely the improvement in target color and number recognition found when applying the rhythm alteration to a to-be-ignored background talker. One motivation for focusing on the background rhythm effect, rather than the target rhythm effect, is that previous work has found that the background rhythm effect is notably absent when the background is unintelligible vocoded speech; this suggests that the background rhythm effect might depend in part on background speech intelligibility (McAuley, Shen, Smith, & Kidd, 2021). In Experiment 2, the rhythm alteration was applied only to the beginning of the background sentence in order to manipulate temporal expectations, while Color and Number words remained unaltered (i.e., intact). In addition, the onset of background relative to target Color and Number was controlled across rhythm alteration conditions. 
If the background rhythm effect is indeed due primarily to reduced inadvertent entrainment to the background speech (and thus a facilitation of selective entrainment to the target speech), the manipulation of the background rhythmic context alone should be sufficient to produce the effect when the intelligibility and timing of background Color and Number words are held constant.

GENERAL METHODS

Speech stimuli were taken from the Coordinate Response Measure (CRM) Corpus (Bolia et al., 2000). Sentences from this corpus all have the form “Ready [Call Sign] go to [Color] [Number] now.” Each sentence contains one of eight Call Signs (e.g. “Baron,” “Charlie,” “Eagle”), one of four Colors (“Red,” “Blue,” “White,” or “Green”) and one of eight numbers (1-8, excluding “seven” in order to maintain a constant number of syllables from trial to trial). The Call Signs, Colors, and Numbers that appear in the target and background were always different. Both target and background sentences came from the same male talker (talker #1). The target sentence always contained the Call Sign “Baron,” which acted as a cue for which sentence to attend to. The target sentence was always a complete sentence, while the background consisted of just the portion beginning with the Color and Number in Experiment 1 and the complete background sentence in Experiment 2. Both target and background sentences were presented binaurally at 65 dB SPL, using Sennheiser HD 280 Pro over-the-ear headphones at a sampling rate of 22050 Hz. On each trial, participants reported the Color and Number they heard in the target sentence by selecting a square on a computer screen with the corresponding combination of Color and Number, presented via a custom MATLAB program. The study took place over two sessions. Experiment 1 and Experiment 2 were administered in separate sessions on different days.
The order of the experiments was counterbalanced and randomly assigned for each participant, in order to control for carryover or practice effects from one experiment to the next. Each session lasted approximately 1.5 hours. At the beginning of Session 1, participants were given a brief familiarization task. In one block of 32 trials, participants were presented with intact CRM target sentences in quiet (with no background) and were instructed to report the Color and Number that appeared within each sentence. This familiarization task acted as a means to screen participants for obvious task-relevant hearing difficulties or a failure to understand instructions. Additionally, the task served to acclimate participants to the procedure prior to experiencing the more difficult experimental conditions. The exclusion criterion for use of a participant’s experimental data was performance below 90% on the familiarization task. No participant’s scores fell below this level and none were excluded on this basis. At the end of both sessions, participants completed surveys about the strategies that they used during the experiment as well as any factors that might have influenced their performance. Additionally, at the end of Session 1 participants completed a survey about their personal and musical background. At the end of Session 2 participants completed a short form of the Speech, Spatial and Qualities of Hearing Scale (SSQ) questionnaire (Noble et al., 2013) and the Noise Exposure Questionnaire (NEQ) (Johnson et al., 2017). The SSQ includes questions about one’s subjective ease of sound segregation, listening to speech in noise, and locating sounds (e.g. “You are talking with one other person and there is a TV on in the same room. Without turning the TV down, can you follow what the other person you’re talking to says?”).
Participants respond using a 0 to 10 scale where 0 means “Not at all” and 10 means “Perfectly” (with the exception of two questions which use different anchors) (Noble et al., 2013). The NEQ indexes the frequency and length of exposure of individuals to both occupational and non-occupational noise (e.g. use of power tools, attending loud events such as concerts, driving loud vehicles), which can be used to calculate a measure of annual noise exposure that is indicative of risk for noise-induced hearing loss.

EXPERIMENT 1

Methods

Participants and Design. Eighteen participants (15 female; age range: 19-26, M = 21.6, SD = 2.0) were recruited from the Michigan State University community and were compensated at a rate of $10/h in the form of digital Amazon gift cards. All participants were native speakers of American English and had self-reported normal hearing. Onset asynchrony of CRM key words (Color and Number) was manipulated within subjects (OA = 0ms, ±25ms, ±50ms, ±100ms, ±200ms).

Stimuli. In Experiment 1, full target sentences were presented with a single partial background sentence. The beginning of each background sentence was removed and replaced with silence so that only the phrase “[Color] [Number] now” was heard. The onset of the background key word pair relative to the target key word pair was manipulated. Specifically, the onset asynchrony was defined as the timing of the onset of the background Color word relative to the onset of the target Color word.

Procedure. The experiment was conducted in a single test session of 15 experimental blocks. Each block consisted of 36 trials. Each of the nine OA conditions (0ms, ±25ms, ±50ms, ±100ms, ±200ms) was presented 4 times per block for a total of 60 presentations over the 15 blocks. For each consecutive subset of 9 trials within a single block, each OA condition occurred once in randomized order.
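The blocked randomization described above can be sketched in a few lines of Python. This is only an illustration of the stated constraints (nine OA conditions, four shuffled runs of nine trials per block, 15 blocks); the function names, the seed, and the use of Python are assumptions made for this sketch, since the experiment itself was run from a custom MATLAB program.

```python
import random

# The nine onset asynchronies in ms, as listed in the text
# (negative = background leading, positive = background lagging).
OA_CONDITIONS_MS = [-200, -100, -50, -25, 0, 25, 50, 100, 200]


def make_block(rng, runs_per_block=4):
    """Build one block of trials (hypothetical helper).

    Each consecutive run of nine trials contains every OA condition
    exactly once, in an independently shuffled order.
    """
    block = []
    for _ in range(runs_per_block):
        run = list(OA_CONDITIONS_MS)
        rng.shuffle(run)
        block.extend(run)
    return block


def make_session(seed=0, n_blocks=15):
    """Build the full session as a list of blocks (seed is an assumption)."""
    rng = random.Random(seed)
    return [make_block(rng) for _ in range(n_blocks)]


session = make_session()
print(len(session), "blocks of", len(session[0]), "trials")
```

Shuffling within runs of nine, rather than across the whole block, guarantees that every condition appears once per run, which matches the description of each consecutive subset of 9 trials.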
A mandatory break was provided about halfway through the experiment (after 8 blocks), and participants were encouraged to take breaks as needed after each block.

Results

For each participant, the proportion of correct responses (trials where both the correct Color and correct Number were reported) was calculated separately at each level of onset asynchrony (Figure 1). Consistent with the target rhythm guiding temporal expectations about the onset timing of the target Color and Number, there was a significant quadratic trend as a function of OA, F(1, 17) = 158.09, p < 0.001, η2 = 0.90, where performance was worst for an onset asynchrony of -50ms, and improved with increasing deviations from this value in either direction. There was additionally a significant linear trend as a function of OA, F(1, 17) = 76.89, p < 0.001, η2 = 0.82, where performance was overall better when the background was lagging compared to when the background was leading. Next, we considered the types of errors that were made by participants. For each participant the proportion of Color intrusions (trials where the background Color was reported instead of the target Color) (Figure 2A) and Number intrusions (trials where the background Number was reported instead of the target Number) (Figure 2B) were calculated separately at each level of onset asynchrony. There was a significant quadratic trend for Color intrusions, F(1, 17) = 43.81, p < 0.001, η2 = 0.72, and Number intrusions, F(1, 17) = 90.38, p < 0.001, η2 = 0.84. In each case, the pattern of results is the opposite of that for proportion correct scores. That is to say, intrusion errors were most frequent for an onset asynchrony of -50ms, and were reduced with increasing deviations from this value in either direction. There was again a significant linear trend in the opposite direction of proportion correct scores for both Color intrusions, F(1, 17) = 78.18, p < 0.001, η2 = 0.82, and Number intrusions, F(1, 17) = 71.67, p < 0.001, η2 = 0.81.
More intrusions occurred when the background onset came first than when it came second. We additionally examined the relationship between several individual difference characteristics and performance. To get one overall measure of performance, proportion correct scores were averaged across all OA conditions individually for each participant. Pearson correlations were run between these overall scores and self-reported years of formal music training, average SSQ scores (measuring self-reported hearing abilities), and annual noise exposure (ANE). SSQ and ANE scores were unavailable for 2 participants because they did not complete the second session of the study where the corresponding surveys were administered, leaving n = 16 for those analyses. SSQ scores were calculated by averaging across responses to questions (M = 7.15, SD = 1.15). Two of the sixteen questions were excluded from the average because the response scale anchors were presented incorrectly for those two questions for the majority of participants. ANE was calculated based on the procedure outlined in Johnson et al. (2017) where the minimum possible value was 64 and higher values mean greater noise exposure (M = 70.94, SD = 3.19). Out of 18 participants, 12 reported having formal music training. Including those who did not receive any formal music training, the mean number of years of formal training for this group was 4.67 (SD = 4.85). No correlation was found between task performance and years of formal music training, r = -0.001, p = 0.99. There was also no relationship between performance and SSQ scores, r = 0.09, p = 0.73, or between performance and ANE, r = 0.19, p = 0.48.

Discussion

The purpose of Experiment 1 was to determine how the relative timing of background and target Color and Number (absent the preceding background rhythmic context) impacts target speech recognition.
Consistent with the Temporal Expectation Hypothesis, results show that increasing asynchrony of background Color/Number onset with respect to the expected temporal onset of the target Color/Number generally leads to improved performance and a reduction of intrusion errors. This U-shaped curve, however, was slightly left-shifted such that performance was worst (i.e., the background Color/Number pair was more intrusive) when the background Color/Number onset led the target by a small amount. Separately, there was a tendency to select whichever Color/Number begins first, thus also providing support for the Temporal Order Hypothesis. This interaction between temporal expectations and temporal order effects leads to a pattern of performance such that at larger asynchronies where the background leads, there is an improvement in target recognition attributable to a violation of temporal expectations by the background, but the improvement is attenuated by the detrimental effect of the background occurring first.

EXPERIMENT 2

Previous work has demonstrated a background rhythm effect such that increasing alteration of the natural rhythm of background speech enhances target speech recognition (McAuley et al., 2020). If selective entrainment is driving the background rhythm effect, it is expected that the rhythmic context leading up to the Color and Number is what builds temporal expectations, which are unaffected by the timing of the Color and Number words themselves. Toward this end, Experiment 2 applied the rhythm alteration only to the beginning of the background sentence leading up to the Color and Number (which will be referred to as the “precursor”). This manipulation should interfere with temporal expectations, without differentially interfering with the intelligibility or timing of background key words between rhythm conditions.
This will ensure that any effects of background rhythm alteration are not due to reductions in the intelligibility of the background Color and Number associated with the rhythm manipulation.

Methods

Participants and Design. Sixteen participants who took part in Experiment 1 (14 female; age range: 19-26, M = 21.50, SD = 1.90) also participated in Experiment 2 and were compensated for their participation at a rate of $10/h in the form of digital Amazon gift cards. Background speech rhythm alteration was manipulated within subjects (m = 0, 0.25, 0.50, 0.75), as was onset asynchrony (OA = +50ms, -50ms).

Stimuli. On each trial, target CRM sentences were presented with complete background CRM sentences spoken by the same talker (talker #1). In some conditions, the natural rhythm of background speech was disrupted. This disruption was achieved by temporally expanding and contracting the speech in a sinusoidal fashion. In order to preserve the intelligibility of the background Color and Number, only the precursor (“Ready [Call Sign] go to”) was altered while the key words (“[Color] [Number] now.”) remained intact. Alterations to the original CRM sentences were made using Praat’s Pitch Synchronous Overlap and Add (PSOLA) algorithm, according to a compression ratio (CR) given by $CR(t) = 1 + m \sin(2\pi f_m t + \phi)$ (Fig. 3). The rate of rhythm alteration, $f_m$, was set to 1 Hz, based on McAuley et al. (2020), who showed that this value preserved speech intelligibility in quiet while still providing a strong percept of timing variation. The degree of rhythm alteration was determined by the modulation depth, $m$, which took on values of 0.0, 0.25, 0.50, or 0.75. The initial phase of alteration, $\phi$, was randomly assigned for each trial within a block from a set of equally probable values (0, π/4, 2π/4, 3π/4, 4π/4, 5π/4, 6π/4, and 7π/4) so that different parts of each sentence were expanded or contracted.
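The compression ratio above can be illustrated with a short sketch. This is not the Praat/PSOLA procedure used to make the stimuli; it only evaluates CR(t) and, under the assumption (not stated in the text) that CR acts as a local duration scaling factor, integrates it to show how original time points might map onto altered time points. The function names `compression_ratio` and `warp_times` and the 1 ms time grid are hypothetical choices made for illustration.

```python
import math

def compression_ratio(t, m, fm=1.0, phi=0.0):
    """Evaluate CR(t) = 1 + m * sin(2*pi*fm*t + phi)."""
    return 1.0 + m * math.sin(2.0 * math.pi * fm * t + phi)

def warp_times(times, m, fm=1.0, phi=0.0):
    """Map original time points to altered time points.

    Assumption (illustrative only): CR(t) scales local duration, so the
    altered time is the running trapezoidal integral of CR over the
    original time axis.
    """
    warped = [times[0]]
    for t0, t1 in zip(times, times[1:]):
        cr0 = compression_ratio(t0, m, fm, phi)
        cr1 = compression_ratio(t1, m, fm, phi)
        warped.append(warped[-1] + 0.5 * (cr0 + cr1) * (t1 - t0))
    return warped

# The eight equally probable initial phases listed in the text: 0, pi/4, ..., 7*pi/4.
PHASES = [k * math.pi / 4 for k in range(8)]

# One second of signal sampled every 1 ms (hypothetical grid).
times = [i / 1000 for i in range(1001)]
unaltered = warp_times(times, m=0.0)                 # m = 0 leaves timing unchanged
altered = warp_times(times, m=0.75, phi=PHASES[5])   # maximally altered, phi = 5*pi/4
```

Because the largest modulation depth is 0.75, CR(t) stays between 0.25 and 1.75, so the mapping in this sketch never reverses the order of time points. With a 1 Hz rate, stretching and compression cancel over each full second.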
Onset asynchronies (background Color word onset relative to target Color word onset) were set to +50ms or -50ms with equal probability. Both target leading (+50ms) and background leading (-50ms) conditions were included and randomly varied from trial to trial so that participants could not simply distinguish between target and background Color and Number words based on which pair appeared first. This also provided a test of how the order of Color/Number onsets influences target recognition in the presence of rhythmic contexts in both the target and the background.

Procedure. The experiment was conducted in a single test session of 16 experimental blocks. Each block consisted of 32 trials with the same level of rhythm alteration. Each of the four levels of rhythm alteration occurred four times in total, once within each set of 4 blocks; the order of rhythm alteration levels was counterbalanced across sets. Additionally, the entire sequence of 16 blocks was presented in one of four orders, which were counterbalanced across participants. A mandatory break was provided after 8 blocks, and participants were encouraged to take breaks as needed.

Results

For each participant, the proportion of correct responses (trials where both the correct Color and correct Number were reported) was calculated separately at each level of background rhythm alteration (m = 0.0, 0.25, 0.50, 0.75) and onset asynchrony (OA = +50ms, -50ms) (Figure 4). Compared to the equivalent OA conditions from Experiment 1, where the background contained only the Color and Number words, performance was much worse overall in Experiment 2, where the full background sentence was present. This suggests that the presence of a background rhythm may have disrupted temporal expectations for the target rhythm, leaving participants at a disadvantage for identifying (based on timing) which Color/Number pair came from the target and which came from the background.
A 4 x 2 repeated measures ANOVA revealed no significant main effect of background rhythm alteration, F(3, 45) = 2.10, p = 0.11, η² = 0.12, or onset asynchrony, F(1, 15) = 1.53, p = 0.235, η² = 0.093, and no significant interaction, F(3, 45) = 0.80, p = 0.50, η² = 0.051.

Similar to Experiment 1, for each participant the proportions of Color intrusions (Fig. 5A) and Number intrusions (Fig. 5B) were calculated separately at each level of background rhythm alteration (m = 0.0, 0.25, 0.50, 0.75) and onset asynchrony (OA = +50ms, -50ms). There was a main effect of onset asynchrony for both Color, F(1, 15) = 4.56, p = 0.05, η² = 0.23, and Number, F(1, 15) = 8.35, p = 0.01, η² = 0.36, intrusions, but the effects ran in opposite directions: there were more Color intrusions when the target was leading (compared to when the background was leading) and more Number intrusions when the background was leading (compared to when the target was leading). There was no main effect of background rhythm alteration, nor was there an interaction between onset asynchrony and background rhythm alteration.

Intrusion errors accounted for nearly every incorrect trial. Averaged across background rhythm alteration and onset asynchrony, participants selected the target word in their response (Color: M = 0.50, SD = 0.13; Number: M = 0.48, SD = 0.14) as often as they selected the word coming from the background (Color: M = 0.49, SD = 0.13; Number: M = 0.52, SD = 0.14) for both Color, t(15) = 0.42, p = 0.68, 95% CI [-0.04, 0.05], and Number, t(15) = -1.43, p = 0.17, 95% CI [-0.09, 0.02]. If we assume that participants heard both Colors and both Numbers on each trial, chance performance would be 50% for either Color or Number. Notably, the proportion of correct Color responses and the proportion of correct Number responses were approximately 0.50 in all conditions, as were the proportions of Color intrusions and Number intrusions.
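The chance level described above can be made concrete with a small enumeration. The sketch assumes, for illustration only, that a guessing listener who heard both Colors and both Numbers picks between the target and background word with equal probability, and does so independently for Color and Number. The function name `chance_levels` is hypothetical, and the joint value it returns is a consequence of the independence assumption rather than a figure reported in the text.

```python
from fractions import Fraction
from itertools import product

def chance_levels():
    """Return guessing probabilities for (Color correct, Number correct, both correct).

    Each response picks the target ("T") or background ("B") word for
    Color and, independently, for Number; all four combinations are
    treated as equally likely (an illustrative assumption).
    """
    outcomes = list(product("TB", repeat=2))  # (color_choice, number_choice)
    n = len(outcomes)
    color = Fraction(sum(c == "T" for c, _ in outcomes), n)
    number = Fraction(sum(k == "T" for _, k in outcomes), n)
    both = Fraction(sum(c == "T" and k == "T" for c, k in outcomes), n)
    return color, number, both

color_chance, number_chance, both_chance = chance_levels()
```

Under these assumptions the per-word chance level is 1/2 for Color and for Number, matching the 50% figure in the text, while the probability of guessing both words correctly on the same trial would be 1/4.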
As in Experiment 1, we examined the relationship between overall performance and formal music training (M = 4.81, SD = 5.06), SSQ scores (M = 7.15, SD = 1.15), and ANE (M = 70.94, SD = 3.19). Proportion correct scores were averaged across both rhythm alteration and OA conditions for each participant. No correlation was found between performance and years of formal music training, r = 0.05, p = 0.86. There was also no relationship between performance and SSQ scores, r = -0.22, p = 0.41, or between performance and ANE, r = 0.13, p = 0.62.

Discussion

It was expected that with increasing alteration of the natural speech rhythm of the background precursor, recognition of target speech would improve. This would have replicated the previously observed background rhythm effect, while controlling the timing and intelligibility of Color and Number words. Instead, a background rhythm effect was not observed. Rhythm alteration of the background precursor had no effect on the proportion of correct responses or on the proportion of intrusion errors. One interpretation of this result is that the previous observations of the background rhythm effect were not attributable to background speech rhythm specifically, but instead were a result of incidental changes in the relative timing of target/background key words or in background key word intelligibility. However, another possibility that seems more likely in the present context is that, aside from leaving Color and Number words intact and controlling onset asynchrony across rhythm conditions, the stimuli in this experiment differed in another critical way from the prior work of McAuley et al. (2020, 2021). Namely, using the same CRM talker as both the target and the background talker eliminated fundamental frequency (F0) cues or other cues of speech quality that could have been used to initially segregate the target and background into separate auditory streams.
Without this initial segregation that could be used to differentiate between and selectively track the speech rhythm of one sentence over the other, the combined target and background sentences might have been perceived as one auditory object with a jumbled, irregular rhythm. Such a jumbling of rhythms would be reminiscent of how a familiar melody interleaved with a rhythmically irregular tone sequence is not recognized until the interleaving tones are in a sufficiently different pitch range to be perceived as a separate auditory stream (Dowling, 1973). This interpretation is suggested by the result that both proportion correct scores and the proportion of intrusions computed separately for Color and Number were close to chance level, potentially indicating that participants could not differentiate between the target and background and had to guess which Color and Number came from which sentence.

GENERAL DISCUSSION

Prior experiments have established both a target rhythm effect and a background rhythm effect using the CRM paradigm that are consistent with a selective entrainment hypothesis. With the target rhythm effect, increasing alteration of the natural speech rhythm of a to-be-attended target sentence worsens recognition of target speech. With the background rhythm effect, increasing alteration of the natural rhythm of a distracting background talker (or talkers) improves recognition of target speech (McAuley et al., 2020, 2021). The observation of these effects supported a DAT-based Selective Entrainment Hypothesis, which proposed that listeners’ attention is selectively entrained by the natural rhythm of to-be-attended target speech, which facilitates the tracking of that speech over time in difficult listening situations.
The Selective Entrainment Hypothesis would predict that background rhythm alteration improves target recognition because inadvertent entrainment to the background (at the expense of entrainment to the target speech rhythm) would be reduced. The background rhythm effect has proved fickle, however, and does not occur when the background is either unintelligible or easily segregated into a separate auditory stream based on strong fundamental frequency cues, suggesting that there might be more to the effect than a disrupted background rhythm alone (McAuley, Shen, Smith, & Kidd, 2021). The experiments reported here were designed to examine two stimulus characteristics that might have contributed to target speech recognition independent of the background rhythm itself: (1) the timing of background Color and Number words relative to the timing of target Color and Number words and (2) the intelligibility of background Color and Number words due to rhythm alteration. Either of these factors might have in part produced changes in performance with increasing alteration of the background rhythm. Experiment 1 additionally explored how the deviation or conformity of a distracting background Color and Number pair to the expected timing (based on the target rhythm) of the target Color and Number influences target recognition.

Experiment 1 demonstrates that when there are no temporal expectations coming from the background that could disrupt the buildup of temporal expectations for the target, the relative timing of background Color and Number words with respect to target Color and Number words alone is sufficient to influence performance.
The results support both a Temporal Expectation Hypothesis and a Temporal Order Hypothesis, such that a distracting background Color and Number pair becomes less intrusive as its onset increasingly violates temporal expectations for the target Color and Number words, and is more intrusive when the background onset leads the target (compared to when the background onset lags).

The improvement of performance with large deviations from synchrony is compatible with the idea that the natural rhythm of target speech sets up temporal expectations for the occurrence of future speech events, and that these expectations have an influence on speech perception. This is consistent with the broader literature on speech rhythm, which has shown that the temporal patterning of speech can influence how later speech events within the same stream are perceived in a way that is congruent with the expected continuation of the pattern (e.g., Dilley & McAuley, 2008; Baese-Berk et al., 2019). It is also consistent with the perspective of DAT, which would predict that a temporally predictable stimulus such as speech can entrain attentional rhythms in order to concentrate attentional energy near the expected time of future information-carrying stimulus events in order to better perceive those events and better ignore irrelevant ones (Jones, 1976; Jones & Boltz, 1989; Large & Jones, 1999).

The asymmetric effect of asynchrony on performance suggests that the distracting background Color and Number words are more intrusive when they appear prior to the expected onset of the target Color and Number words. In contrast, in Experiment 2, where the background precursor was present and the background Color and Number words appeared either 50ms before or after the target Color and Number words, there was no effect of onset order on performance. Additionally, Experiment 2 produced much worse performance overall than either the +50ms or -50ms OA conditions from Experiment 1.
Since the presence or absence of the background precursor was the sole difference between the stimuli in Experiment 2 and the ±50ms OA stimulus conditions in Experiment 1, the contrast in performance and in the effect of onset order (or lack thereof) can be attributed to the background context.

The question, then, is what the background precursor is doing. If the presence of a rhythmic background context that can disrupt the development of precise temporal expectations for the target is what makes the task of Experiment 2 comparatively more difficult than Experiment 1, then the Selective Entrainment Hypothesis would predict that as the background rhythm becomes increasingly irregular, inadvertent entrainment to the background would be reduced and performance would improve. This was not the case in Experiment 2. Despite rhythmic alteration of the background precursor, participants were equally likely to select the Colors and Numbers from the background as they were to select the Colors and Numbers from the target, no matter the level of alteration. This is in contrast to a previous experiment with a single male background talker who was different from the male target talker, where the rhythm alteration was applied to the entirety of the background sentence (including the Color and Number). With the same levels of rhythm alteration (m = 0.0, 0.25, 0.50, 0.75) applied, there was a clear background rhythm effect such that an increasingly altered background rhythm led to improved performance (McAuley et al., 2021).

The discrepancy between the prior observation of the background rhythm effect and the results of the current Experiment 2 might still be explained by disrupted temporal expectations for the target.
If participants were unable to distinguish between the target and background sentences toward the beginning of the stimulus, there would not be two distinct speech streams with rhythms that could be selectively entrained to but instead one single auditory stream with a jumbled and unpredictable rhythm. The background rhythm effect had previously been observed for a single-talker male background with a different male target talker, where the average F0 of the two talkers was similar but not identical and other vocal qualities might also have differed between them (McAuley et al., 2021). In Experiment 2 of the present study we instead used recordings from the same talker for both target and background, thus eliminating any characteristic differences that could be used to initially segregate target from background. While performance at baseline (with no background rhythm alteration) is comparable between Experiment 2 and this prior two-talker experiment, speech shaped noise had been added to the stimuli in the prior experiment in order to make the task more difficult. No such noise was added in the present study, suggesting that the lack of segregation cues available when the target and background talker were identical did indeed make it harder to distinguish the two sentences from each other. Not having a temporally predictable target rhythm that is perceptually distinct would also result in small deviations from synchrony not being as useful a cue for which Color/Number pair is correct, which would explain the reduction in performance with the reintroduction of the background precursor from Experiment 1 to Experiment 2. At first, this explanation might seem to contradict an earlier interpretation of the lack of a background rhythm effect when the background talker was of a different sex than the target talker. 
We had suggested that the lack of an effect was due to the presence of a strong F0-based segregation cue, which rendered selective entrainment superfluous (McAuley et al., 2021). However, the joint evidence that the background rhythm effect does not occur either in situations where the target and background are easily segregated into separate auditory streams or when primary segregation cues are absent might suggest a sort of “Goldilocks” zone where the background rhythm becomes a deleterious presence. Such a Goldilocks principle would predict that the background rhythm effect will only occur if the following conditions are satisfied: (1) the listening situation is difficult enough to require the use of secondary perceptual processes for attending to the target in addition to the use of primary segregation cues, (2) there are sufficient segregation cues to facilitate the initial selection of one rhythm over the other as a source of temporal expectations, and (3) the source of the competing background rhythm can be mistaken for the target.

However, based on the experiments reported here we cannot eliminate the less interesting possibilities that the background rhythm effect is not an entrainment effect at all, but is instead an effect of either the background Colors and Numbers becoming less intrusive due to changes in intelligibility or systematic differences in the relative timing of the Color and Number words. To investigate the first possibility further, it will be important to compare the intelligibility of isolated Color and Number pairs amidst noise with different amounts of rhythm alteration.
It has been established with pilot testing that the rhythm alteration does not impact intelligibility in quiet (McAuley et al., 2020), but if the intelligibility of the words that participants are meant to report is reduced when presented in more difficult listening situations, this could lead to both a background rhythm effect (by virtue of a reduction in intrusions) in situations where intrusions are likely and a target rhythm effect regardless of the possibility of intrusions.

The second possibility, that the background rhythm effect is one of relative timing of Color and Number words, seems somewhat less plausible. The relative onset of background Color and Number words varied from trial to trial in past work since the background rhythm alteration was applied with different phases, which should have prevented any systematic differences in onset asynchrony between rhythm conditions. Additionally, there was no difference in performance in the present study between the +50ms and -50ms OA conditions in the presence of the background precursor, and any incidental changes in onset asynchrony due to rhythm alteration in prior experiments were likely small. The fact that there was such a large effect of onset asynchrony in Experiment 1, however, means that this possibility cannot be entirely dismissed and warrants further scrutiny.

Overall, the present pair of experiments demonstrates that expectations for target speech timing can help listeners to distinguish between a target and an intruding background, but also that such temporal expectations are weakened or no longer useful when target and background speech are not distinct enough (e.g., based on fundamental frequency differences) to trigger initial perceptual segregation and selection.
Future experiments will further investigate the effect of onset asynchrony when target speech timing is irregular and rhythm-based temporal expectations are weakened, and a second line of experiments will systematically vary the F0 difference between target and background when the background context is intact or rhythmically irregular.

APPENDIX

Figure 1. Results for Experiment 1: Proportion correct target Color and Number recognition for each value of onset asynchrony (0, ±25ms, ±50ms, ±100ms, ±200ms). Negative values indicate that the onset of the background Color appeared early relative to the onset of the target Color (background leading), while positive values indicate that the onset of the background Color appeared late relative to the target Color (background lagging). Error bars represent standard error.

Figure 2. Results for Experiment 1: Proportion of Color (Panel A) and Number (Panel B) intrusions for each value of onset asynchrony (0, ±25ms, ±50ms, ±100ms, ±200ms). Negative values indicate that the onset of the background Color appeared early relative to the onset of the target Color (background leading), while positive values indicate that the onset of the background Color appeared late relative to the target Color (background lagging). Error bars represent standard error.

Figure 3. Examples of rhythm unaltered and altered versions of a spoken CRM sentence of the form “Ready [call sign] go to [color] [number] now.” The top panel (Panel A) shows the sample sentence where the rhythm is unaltered (m = 0), as represented by the bars equally spaced in time. The middle and bottom panels show how the same time points in the speech signal are shifted by the rhythm transformation (m = 0.75, maximally altered condition) for two different phases (Panel B, ϕ = 5π/4; Panel C, ϕ = π/2).

Figure 4.
Results for Experiment 2: Proportion correct target Color and Number recognition for each level of background rhythm alteration (m = 0.0, 0.25, 0.50, 0.75). Black squares with a solid line represent performance when the background Color/Number was leading (OA = -50ms) and open circles with a dashed line represent performance when the target Color/Number was leading (OA = +50ms). Error bars represent standard error.

Figure 5. Results for Experiment 2: Proportion of Color (Panel A) and Number (Panel B) intrusions for each level of background rhythm alteration (m = 0.0, 0.25, 0.50, 0.75). The solid grey lines represent the chance of choosing the background Color or Number at random when both background and target Colors and both Numbers are heard. Black squares with a solid line represent performance when the background Color/Number was leading (OA = -50ms) and open circles with a dashed line represent performance when the target Color/Number was leading (OA = +50ms). Error bars represent standard error.

REFERENCES

Assmann, P. F., & Summerfield, Q. (1989). Modeling the perception of concurrent vowels: Vowels with the same fundamental frequency. The Journal of the Acoustical Society of America, 85(1), 327-338.
Assmann, P. F., & Summerfield, Q. (1990). Modeling the perception of concurrent vowels: Vowels with different fundamental frequencies. The Journal of the Acoustical Society of America, 88(2), 680-697.
Aubanel, V., Davis, C., & Kim, J. (2016). Exploring the role of brain oscillations in speech perception in noise: intelligibility of isochronously retimed speech. Frontiers in Human Neuroscience, 10, 430.
Baese-Berk, M. M., Dilley, L. C., Henry, M. J., Vinke, L., & Banzina, E. (2019). Not just a function of function words: Distal speech rate influences perception of prosodically weak syllables. Attention, Perception, & Psychophysics, 81(2), 571-589.
Barnes, R., & Jones, M. R. (2000). Expectancy, attention, and time. Cognitive Psychology, 41, 254-311.
Bolia, R. S., Nelson, W. T., Ericson, M. A., & Simpson, B. D. (2000). A speech corpus for multitalker communications research. Journal of the Acoustical Society of America, 107, 1065-1066.
Booth, A. J., & Elliott, M. T. (2015). Early, but not late visual distractors affect movement synchronization to a temporal-spatial visual cue. Frontiers in Psychology, 6, 866.
Bregman, A. S. (1990). Auditory scene analysis. Cambridge, MA: MIT Press.
Brokx, J. P. L., & Nooteboom, S. G. (1982). Intonation and the perceptual separation of simultaneous voices. Journal of Phonetics, 10(1), 23-36.
Darwin, C. J., & Ciocca, V. (1992). Grouping in pitch perception: Effects of onset asynchrony and ear of presentation of a mistuned component. The Journal of the Acoustical Society of America, 91(6), 3381-3390.
Darwin, C. J. (1981). Perceptual grouping of speech components differing in fundamental frequency and onset-time. The Quarterly Journal of Experimental Psychology Section A, 33(2), 185-207.
Dauer, R. M. (1983). Stress-timing and syllable-timing reanalyzed. Journal of Phonetics, 11, 51-62.
Desjardins, J. L., & Doherty, K. A. (2013). Age-related changes in listening effort for various types of masker noises. Ear and Hearing, 34(3), 261-272.
Dilley, L. C., & McAuley, J. D. (2008). Distal prosodic context affects word segmentation and lexical processing. Journal of Memory and Language, 59, 294-311.
Ding, N., & Simon, J. Z. (2012). Emergence of neural encoding of auditory objects while listening to competing speakers. Proceedings of the National Academy of Sciences, 109(29), 11854-11859.
Ding, N., & Simon, J. Z. (2014). Cortical entrainment to continuous speech: functional roles and interpretations. Frontiers in Human Neuroscience, 8, 311.
Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience, 19, 158.
Dowling, W. J. (1973). The perception of interleaved melodies.
Cognitive Psychology, 5(3), 322-337.
Frings, C., & Moeller, B. (2012). The horserace between distractors and targets: Retrieval-based probe responding depends on distractor–target asynchrony. Journal of Cognitive Psychology, 24(5), 582-590.
Ghitza, O. (2011). Linking speech perception and neurophysiology: speech decoding guided by cascaded oscillators locked to the input rhythm. Frontiers in Psychology, 2, 130.
Giraud, A. L., & Poeppel, D. (2012). Cortical oscillations and speech processing: emerging computational principles and operations. Nature Neuroscience, 15, 511.
Golumbic, E. M. Z., Ding, N., Bickel, S., Lakatos, P., Schevon, C. A., McKhann, G. M., Simon, J. Z., Poeppel, D., & Schroeder, C. (2013). Mechanisms underlying selective neuronal tracking of attended speech at a “cocktail party”. Neuron, 77, 980-991.
Goswami, U. (2019). Speech rhythm and language acquisition: an amplitude modulation phase hierarchy perspective. Annals of the New York Academy of Sciences.
Houtgast, T., & Festen, J. M. (2008). On the auditory and cognitive functions that may explain an individual's elevation of the speech reception threshold in noise. International Journal of Audiology, 47(6), 287-295.
Humes, L. E., Kidd, G. R., & Fogerty, D. (2017). Exploring use of the coordinate response measure in a multitalker babble paradigm. Journal of Speech, Language, and Hearing Research, 60(3), 741-754.
Johnson, T. A., Cooper, S., Stamper, G. C., & Chertoff, M. (2017). Noise exposure questionnaire: A tool for quantifying annual noise exposure. Journal of the American Academy of Audiology, 28(1), 14-35.
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and memory. Psychological Review, 83, 323-355.
Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review, 96, 459-491.
Jones, M. R., Kidd, G., & Wetzel, R. (1981). Evidence for rhythmic attention.
Journal of Experimental Psychology: Human Perception and Performance, 7, 1059-1073.
Jones, M. R., Moynihan, H., MacKenzie, N., & Puente, J. (2002). Temporal aspects of stimulus-driven attending in dynamic arrays. Psychological Science, 13, 313-319.
Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying events. Psychological Review, 106, 119-159.
McAuley, J. D., & Jones, M. R. (2003). Modeling effects of rhythmic context on perceived duration: A comparison of interval and entrainment approaches to short-interval timing. Journal of Experimental Psychology: Human Perception and Performance, 29, 1102-1125.
McAuley, J. D., Jones, M. R., Holub, S., Johnston, H. M., & Miller, N. S. (2006). The time of our lives: Life span development of timing and event tracking. Journal of Experimental Psychology: General, 135, 348-367.
McAuley, J. D., Shen, Y., Dec, S., & Kidd, G. (2020). Altering the rhythm of target and background talkers differentially affects speech understanding: Support for a selective-entrainment hypothesis. Attention, Perception, & Psychophysics, 82, 3222–3233.
McAuley, J. D., Shen, Y., Smith, T., & Kidd, G. R. (2021). Effects of speech-rhythm disruption on selective listening with a single background talker. Attention, Perception, & Psychophysics, 1-12.
Miller, J. E., Carlson, L. A., & McAuley, J. D. (2013). When what you hear influences when you see: listening to an auditory rhythm influences the temporal allocation of visual attention. Psychological Science, 24(1), 11-18.
Morrill, T. H., Dilley, L. C., McAuley, J. D., & Pitt, M. A. (2014). Distal rhythm influences whether or not listeners hear a word in continuous speech: Support for a perceptual grouping hypothesis. Cognition, 131, 69-74.
Noble, W., Jensen, N. S., Naylor, G., Bhullar, N., & Akeroyd, M. A. (2013). A short form of the Speech, Spatial and Qualities of Hearing scale suitable for clinical use: The SSQ12. International Journal of Audiology, 52(6), 409-412.
Riecke, L., Formisano, E., Sorger, B., Baskent, D., & Gaudrain, E. (2018). Neural entrainment to speech modulates speech intelligibility. Current Biology, 28, 161-169.
Rosen, S., Souza, P., Ekelund, C., & Majeed, A. A. (2013). Listening to speech in a background of other talkers: Effects of talker number and noise vocoding. The Journal of the Acoustical Society of America, 133(4), 2431-2443.
Tilsen, S., & Arvaniti, A. (2013). Speech rhythm analysis with decomposition of the amplitude envelope: characterizing rhythmic patterns within and across languages. The Journal of the Acoustical Society of America, 134(1), 628-639.
Turgeon, M., Bregman, A. S., & Roberts, B. (2005). Rhythmic masking release: effects of asynchrony, temporal overlap, harmonic relations, and source separation on cross-spectral grouping. Journal of Experimental Psychology: Human Perception and Performance, 31(5), 939.
Wang, M., Kong, L., Zhang, C., Wu, X., & Li, L. (2018). Speaking rhythmically improves speech recognition under “cocktail-party” conditions. The Journal of the Acoustical Society of America, 143, EL255-EL259.