This is to certify that the dissertation entitled

All Things Coherence

presented by Zachary Andrew Constan has been accepted towards fulfillment of the requirements for the Ph.D. degree in Physics.

ALL THINGS COHERENCE

By Zachary Andrew Constan

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Physics

2002

ABSTRACT

Interaural coherence is a crucial element of binaural hearing. However, the nature of real-world environments is to reduce coherence through reverberation (especially in rooms) and extraneous noise. It becomes necessary to understand coherence discrimination and how somewhat-coherent sounds influence the use of auditory cues. Because coherence is closely related to interaural time differences (ITDs), coherence discrimination may diminish for frequencies above the 1500 Hz “limit” where sinusoidal ITD cues lose effectiveness. Listeners’ sensitivity to coherence in 100-Hz wide bands above and below that limit was tested. Some listeners’ thresholds dropped at high frequencies, while other listeners were consistent on both sides of the limit. These results were attributed to variable envelope coherence sensitivity. Listeners showed poorer coherence discrimination as the stimulus duration decreased below 100 us and when the stimulus level was lowered from 64 to 34 dB SPL.
While it is possible for human listeners to lateralize high-frequency noise on the basis of ITDs in the envelope, the ability to do so makes great demands on the interaural coherence of the noise. Chapter III explores listeners' ability to lateralize a broadband, high-passed, coherent-noise signal in the presence of a broadband incoherent masker. Results showed that as the high-pass cutoff frequency increased through a critical region from 1 to 4 kHz, the required interaural coherence increased rapidly, especially for small changes in lateral position. This can be predicted by a neural model of lateralization based on the centroids of bandwise cross-correlation functions of model peripheral inputs that have been rectified and low-pass filtered [e.g., Bernstein and Trahiotis, J. Acoust. Soc. Am. 100, 1754-1763 (1996)].

Coherent noise produces a compact lateralized image, and incoherent noise (independent signals to left and right ears) produces a diffuse image that fills the head. Nevertheless, listening experiments showed that coherent and incoherent noise images seem to be equally affected by small interaural level differences (ILD). The ILD threshold for lateralizing incoherent noise is less than 0.5 dB greater than that for coherent noise. In this sense, the human binaural system appears to behave like an ideal level meter, insensitive to the waveform and envelope fine structures that determine coherence. The small discrepancy (less than 0.5 dB) can be understood from a standard model of loudness perception that incorporates critical band filtering, half-wave rectification, amplitude compression (0.6 power law), and temporal integration (300 ms).

A final goal was to discover if listeners were sensitive to the incoherence produced by head dispersion. The dispersion was modeled using Kuhn’s (JASA 62, 157-167, 1977) derivation for pressure on a spherical surface due to a plane wave, which led to synthetic head-related transfer functions.
In a headphone experiment, listeners attempted to distinguish these artificial head-related dispersions from perfectly-coherent stimuli of constant ITD. The results showed that listeners could discriminate the two stimuli, but only on the basis of lateral position. Another experiment, designed to eliminate the lateral cue, found listeners unable to consistently identify the head-dispersed sound.

ACKNOWLEDGMENTS

The following tome would never have been possible without the help of innumerable people, but I’ll try to thank them all anyway. Tim McCaskey, Scott Lawton and fellow graduate student Xinya (Peter) Zhang provided a lot of data in return for subjecting me to their experiments. Mary Jo Hidecker taught me all I needed to know about analysis of variance. Dr. Brad Rakerd provided countless hours of listening, advising, and reviewing...not to mention an awful lot of useful C code. Thanks to many, many listeners who sat in a dark, quiet, warm room and tried not to fall asleep.

Most importantly, I’d like to give much of the credit for this dissertation to my advisor: Dr. William M. Hartmann. He put up with my shenanigans for six years, instructed me in the ways of psychoacoustics, listened, suggested, waited, and mentored. I have nothing but admiration for Dr. Hartmann, which is why I’m so glad he has shaped my academic career. I aspire to be the professor he always has been.

I’d also like to thank those who looked after me in my graduate career. First, the late-night homework support group: Wayne and Marguerite Tonjes, Andrew Schnepp, Brian Sharpee, Viktoria Greanya, Jim Armstrong, Jack Ryan. Without these folks, I’d never have made it past the first semester. Phil Klirnkewicz, Ryan Humphrey, Bob Slater, Martin Eltzroth and Jason Greanya also kept me sane with many, many diversions. My family (Louis, Karen and Megan) is the rock I stand on to see possibilities for the future...thanks for letting me see such big dreams.
Finally, to my wife Diane, who provided all the basic love, support, wisdom, knowledge and maintenance necessary to finish graduate school: take a bow, darling!

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter I. An introduction to coherence, and its importance in human auditory localization
  Binaural time cues
  Real-world sources and noise
  Interaural cross-correlation
  Coherence - Mathematical
  Coherence - Perceptual
  The foundation for correlation-based models of the auditory system
  The significance of coherence
  References
  Figures

Chapter II. A study of decorrelation sensitivity as a function of frequency band, duration, intracranial position, and stimulus intensity
  Introduction
  Experiment 1: Coherence discrimination at low and high frequencies
    Experimental Procedure
    Stimulus Generation
    Sources of Error
    Presentation and Listeners
    Results
    Discussion
  Experiment 2: Broadband coherence discrimination at short durations
    Introduction
    Part A: Variable Duration
    Part B: Variable Duration with randomized ILD
    Part C: Variable Duration with zero, constant, and random ILD
  Experiment 3: Level-dependent coherence discrimination
    Introduction
    Results
  Conclusions
  Further study
  References
  Figures

Chapter III. ITD discrimination of several stimuli
  Introduction
  Experiment 1: ITD discrimination as a function of interaural coherence
    Method-Interference experiment
    No-interference experiment
    Absolute Threshold experiment
    Listeners
    Results
    Discussion-Absolute Threshold
    Non-interference
    Interference
  Experiment 2: ITD discrimination as a function of high-pass cutoff
    Introduction-the salience of high-frequency ITD cues
    High-frequency ITD cues in the room environment
    Method
    Results and Discussion
  Experiment 3: ITD discrimination of high-pass-filtered targets and maskers
    Introduction
    Method
    Results
    Discussion-Analysis with respect to known models
      The Jeffress model
      The auditory nerve model
      The equalization-cancellation model
      The position-variable model
      The coherence-based model
    Model calculation results
    Model predictions
  Conclusions
  References
  Figures

Chapter IV. Intensity-based lateralization using four stimuli of varying coherences
  Introduction
    Models
  Methods
  Experiment 1: Broadband signal
    Results
    Discussion
  Experiment 2: Low-passed signal
    Introduction and method
    Results and discussion
  Experiment 3: Low-passed signal with randomized levels
    Introduction and method
    Results and discussion
  Summary/Discussion
    The inclusion of DLI as a test of the level-meter model
    Experimental inconsistency with the level-meter model
    The evolution from level-meter to loudness-meter
  Conclusions
  Appendix A: “The strange case of Listener C”
  Appendix B: Noise-type discrimination project
    Introduction
    Experiment 1
    Experiment 2
    Experiment 3
    Conclusions
  References
  Figures

Chapter V. Discrimination of decorrelation due to frequency-dependent phase shifts, as calculated by a model of sound pressure on a spherical head
  Introduction
    The current study
  Experiment 1: Discrimination at two incident angles
    Method
    Results
    Discussion
  Experiment 2: Randomized angle
    Introduction
    Method
    Preliminary Experiment: “Equivalence of Position”
    Experimental task, Results and Discussion
  Experiment 3: “Wrong” phase shifts
    Introduction
    Method
    Results and Discussion
  Head-shift coherence, and delayed coherence discrimination
    Introduction
    Coherence calculations
    Analysis of peak cross-correlation lag values
    Discussion-lateral position
    Discussion-correlation
  Experiment 4: Off-axis coherence discrimination
    Introduction and Method
    Results and Discussion
  The nature of incoherence
  Summary and conclusion
  Appendix A
  References
  Figures

GENERAL APPENDICES
  Signal Detection Theory
  Levitt Staircase Method
  References
  Figures

LIST OF TABLES

Table II.1. Interval composition
Table II.2. Bandwise coherence thresholds
Table II.3. Level-dependent coherence thresholds
Table III.1. Coherence thresholds by ITD and cutoff
Table III.2. Calibration settings
Table III.3. Results comparison
Table IV.1. Broadband ILD thresholds according to stimulus type
Table IV.2. Broadband pairwise comparisons
Table IV.3. Narrowband ILD thresholds according to stimulus type
Table IV.4. Narrowband pairwise comparisons
Table IV.5. Random-level ILD thresholds according to stimulus type
Table IV.6. Random-level pairwise comparisons
Table IV.7. Informal noise-type discrimination
Table IV.8. Formal noise-type discrimination
Table IV.9. Noise-switched discrimination
Table IV.10. Summary of noise-type experiments
Table V.1. Ideal lateralization results
Table V.2. Experiment 2 lateralization data
Table V.3. Experiment 2 discrimination data
Table V.4. Wrong-head lateralization data
Table V.5. Wrong-head discrimination data
Table V.6. Stimulus cross-correlations
Table V.7. CC peak magnitudes and lags
Table V.8. ITD limits by stimulus
Table V.9. Lag/ITD comparison

LIST OF FIGURES

Figure I.1. Schematic of a head in the presence of a sound source
Figure I.2. Four signal waveforms
Figure I.3. Representation of coherent and incoherent sources
Figure I.4. Representations of intracranial images
Figure I.5. Depiction of Jeffress’ coincidence-counter model
Figure II.1. Bandlimits for narrowband stimuli used in this experiment
Figure II.2. Description of the experimental task
Figure II.3. Schematic of the experimental setup in the form of a patch diagram
Figure II.4. Estimation of coherence in the case of Programmable Attenuator (PA) inaccuracy
Figure II.5. Actual measurement of error in the Programmable Attenuator (PA)
Figure II.6. Results of Experiment 1
Figure II.7. Calculated normalized correlations in a Masking Level Difference experiment
Figure II.8. The concept of the envelope
Figure II.9. Results of Experiment 2: variable duration
Figure II.10. Results of Experiment 2: variable duration with randomized ILD
Figure II.11. Results of Experiment 2: variable duration with zero ILD
Figure II.12. Results of Experiment 2: variable duration with constant ILD
Figure II.13. Results of Experiment 2: variable duration with random ILD
Figure II.14. Graphic representation of the “fuzzy ball copier analogy”
Figure II.15. Results of Experiment 3: level-dependence
Figure III.1. Depiction of a human head and a sound source in a room
Figure III.2. Schematic of the experimental setup in the form of a patch diagram
Figure III.3. Description of the experimental task
Figure III.4. Results of Experiment 1: ITD discrimination as a function of coherence for listener B
Figure III.5. Results of Experiment 1: ITD discrimination as a function of coherence for listener L
Figure III.6. Results of Experiment 1: ITD discrimination as a function of coherence for listener S
Figure III.7. Results of Experiment 1: ITD discrimination as a function of coherence for listener T
Figure III.8. Results of Experiment 1: ITD discrimination as a function of coherence for listener W
Figure III.9. Results of Experiment 1: ITD discrimination as a function of coherence for listener Z
Figure III.10. Results of Experiment 1: ITD discrimination as a function of coherence averaged over all six listeners
Figure III.11. Spectral diagram of experimental stimulus for Experiment 2
Figure III.12. Results of Experiment 2: ITD discrimination as a function of high-pass cutoff
Figure III.13. Results of Experiment 3: ITD discrimination of high-pass-filtered targets and maskers for listener W
Figure III.14. Results of Experiment 3: ITD discrimination of high-pass-filtered targets and maskers for listener X
Figure III.15. Results of Experiment 3: ITD discrimination of high-pass-filtered targets and maskers for listener Z
Figure III.16. Results of Experiment 3: ITD discrimination of high-pass-filtered targets and maskers averaged over three listeners
Figure III.17. Simulating the actions of the auditory periphery in the model
Figure III.18. Binaural processing in the model
Figure III.19. Model calculation results
Figure III.20. Model predictions
Figure IV.1. Depiction of a two-interval ILD trial in Experiment 1
Figure IV.2. Depiction of a two-interval DLI trial in Experiment 1
Figure IV.3. Results of Experiment 1: Broadband signal
Figure IV.4. Results of Experiment 2: Low-pass filtered signal
Figure IV.5. Depiction of a two-interval ILD trial in Experiment 3
Figure IV.6. Results of Experiment 2: Low-pass filtered signal with randomized standard level
Figure IV.7. ILD discrimination according to signal detection theory (TSD)
Figure IV.8. Loudness-meter model predictions compared to results from Exps. 2 and 3
Figure V.1. Depiction of a human head and an incoming sound wave
Figure V.2. Head-related dispersion according to Kuhn’s spherical head model
Figure V.3. Results of Experiment 1: Discrimination at 30 degrees
Figure V.4. Results of Experiment 1: Discrimination at 30 degrees
Figure V.5. Results of Experiment 1: Discrimination at 30 degrees
Figure V.6. Results of Experiment 1: Discrimination at 30 degrees
Figure V.7. Results of Experiment 1: Discrimination at 45 degrees
Figure V.8. Results of Experiment 1: Discrimination at 45 degrees
Figure V.9. Results of Experiment 1: Discrimination at 45 degrees
Figure V.10. Results of Experiment 1: Discrimination at 45 degrees
Figure V.11. Head-related dispersion according to Kuhn’s spherical head model, and the inverse
Figure V.12. Cross-correlation of broadband model head-dispersed noise at a 45-degree angle
Figure V.13. Cross-correlation of narrowband model head-dispersed noise at a 45-degree angle
Figure V.14. Cross-correlation of narrow-narrowband model head-dispersed noise at a 45-degree angle
Figure V.15. Cross-correlation of broadband coherent noise at 520 us ITD
Figure V.16. Cross-correlation of narrowband coherent noise at 520 us ITD
Figure V.17. Cross-correlation of narrow-narrowband coherent noise at 520 us ITD
Figure V.18. Cross-correlation of broadband “wrong-head” noise at a 45-degree angle
Figure V.19. Results of Experiment 4: Off-axis coherence discrimination
Figure A.1. Graphical representation of signal detection theory (TSD)

Chapter I: An introduction to coherence, and its importance in auditory localization

Binaural time cues

When performing sound localization in the real world, humans rely on several aspects of the sound in order to accurately identify its origin. The fact that we have two ears greatly improves our chances, because the comparison between sounds incident on both of them (termed “binaural hearing”) is a powerful tool. Given an external sound source at some azimuthal angle to the head’s forward direction (Figure I.1), there is a difference in path lengths to the ears. Assuming that the speed of sound is constant, the sound will be incident on one ear before the other. This delay is called the Interaural Time Difference (ITD). It provides one of the most critical cues for localization. The ITD is approximately equal to

$$\Delta t = \frac{3a}{c}\sin\theta$$

where $\theta$ is the azimuthal angle of the source, $a$ is the radius of the head (about 8.75 cm), and $c$ is the speed of sound in air (343 m/s). This solution is predicted by a diffraction formula in the limit of low frequencies, and establishes a relationship where a particular time delay corresponds directly to an azimuthal angle.
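As a rough numerical illustration of this relationship, the sketch below evaluates the low-frequency ITD formula and its inverse. The head radius and speed of sound are the values quoted in the text; the function names are hypothetical and chosen only for this example.

```python
import math

HEAD_RADIUS_M = 0.0875        # a, head radius quoted in the text (8.75 cm)
SPEED_OF_SOUND_M_S = 343.0    # c, speed of sound in air quoted in the text


def itd_seconds(azimuth_deg, a=HEAD_RADIUS_M, c=SPEED_OF_SOUND_M_S):
    """Low-frequency ITD, (3a/c) * sin(theta), in seconds."""
    return 3.0 * a / c * math.sin(math.radians(azimuth_deg))


def azimuth_deg_from_itd(itd_s, a=HEAD_RADIUS_M, c=SPEED_OF_SOUND_M_S):
    """Invert the same formula: the azimuth (degrees) that gives this ITD."""
    return math.degrees(math.asin(itd_s * c / (3.0 * a)))


# ITD in microseconds for a few azimuths (hypothetical sample angles).
for angle in (0, 30, 45, 90):
    print(angle, round(itd_seconds(angle) * 1e6, 1))
```

With these constants the formula gives roughly 765 us for a source at 90 degrees, and the inverse function recovers the azimuth from a measured delay, which is the one-to-one mapping the paragraph describes.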
For purposes of scale, the maximum ITD expected (given a signal located on the extreme right or left) is about 800 us. The auditory system can use the detected ITD to determine the direction of (localize) the sound source, at least in the horizontal plane.

Real-world sources and noise

Accurate sound localization is impaired by several factors, not the least of which is the presence of competing sounds. Listeners often find it necessary to localize a single source of interest while trying to filter out the “clutter” of many extraneous sources or reflections. Yost (1997) gives a review of the related “cocktail party problem,” which simply asks how humans are able to identify and understand one conversation in a room where dozens of people are speaking at once. This environment involves a target signal (human voice) that must be comprehended in the presence of a masking noise (crowd).

Imagine that the target source is off to the listener’s right, and that its signal therefore leads in the right ear and lags in the left by some delay (top two panels of Figure I.2). The signals at left and right ears are identical aside from the delay, because they come from a single source. The listener can easily identify similar features in both. By measuring the time between corresponding peaks, the listener can establish an ITD and determine the direction of the target. The listener also has no trouble distinguishing characteristics of the target such as its nature as speech.

The bottom panels of Figure I.2 show the same target signal after an independent masking noise has been added to each ear. This noise may be constructed of multiple signals with different ITDs, as in the case of a talking crowd. The masker “muddies” the target, making it less distinguishable and localizable. The left and right signals are no longer identical, and common features are harder to detect.
In the case of Figure I.2, the masker is of low amplitude with respect to the target, and so the target is still somewhat recognizable. As the noise increases, there comes a point at which the target is “drowned out” altogether.

Interaural cross-correlation

The degree of similarity between left- and right-ear signals is referred to as interaural “correlation,” and measured quantitatively by the cross-correlation (CC) function. Plainly stated, the CC tests how close the relationship of two signals is across a range of interaural delays. The mathematical definition of CC is described by the following equation, where $x_L$ and $x_R$ are simply the waveforms incident on left and right ears:

$$\gamma(\tau) = \frac{\int x_L(t)\,x_R(t+\tau)\,dt}{\sqrt{\int x_L^2(t_1)\,dt_1 \int x_R^2(t_2)\,dt_2}} \qquad (1)$$

The symbol $\gamma$ represents a measure of cross-correlation, which is a function of interaural delay time $\tau$. The numerator is a time integral of the signal in the left ear multiplied by the signal in the right. The denominator is the square root of the energy in the left ear multiplied by the energy in the right.

To better understand the form of the CC function, let the reader concentrate on the numerator of the above equation, in essence calculating an unnormalized cross-correlation:

$$\gamma(\tau) = \int x_L(t)\,x_R(t+\tau)\,dt$$

Fourier-transforming the signals breaks $x_L$ and $x_R$ down into sums of cosines:

$$= \int dt \sum_n C_{nL}\cos(\omega_n t + \varphi_{nL}) \sum_m C_{mR}\cos[\omega_m (t+\tau) + \varphi_{mL} + \Delta\varphi_m]$$

Using the trigonometric identity for $\cos(A+B)$:

$$= \int dt \sum_n C_{nL}\cos(\omega_n t + \varphi_{nL}) \sum_m C_{mR}\left[\cos(\omega_m t + \varphi_{mL})\cos(\omega_m \tau + \Delta\varphi_m) - \sin(\omega_m t + \varphi_{mL})\sin(\omega_m \tau + \Delta\varphi_m)\right]$$

And distributing the multiplier:

$$= \sum_m C_{mR} \sum_n C_{nL} \int dt \left[\cos(\omega_n t + \varphi_{nL})\cos(\omega_m t + \varphi_{mL})\cos(\omega_m \tau + \Delta\varphi_m) - \cos(\omega_n t + \varphi_{nL})\sin(\omega_m t + \varphi_{mL})\sin(\omega_m \tau + \Delta\varphi_m)\right]$$

Performing the time integral within the limits $0 < t < T$, we can use orthogonality of cosines to sines and to cosines of different frequencies to collapse the sums into one.
Restoring the normalizing denominator completes the equation:

$$\gamma(\tau) = \frac{\sum_n C_{nL} C_{nR} \cos(\omega_n\tau + \Delta\varphi_n)}{\sqrt{\sum_{n_1} C_{n_1 L}^2 \sum_{n_2} C_{n_2 R}^2}}$$

The value $\Delta\varphi_n$ represents the difference between right and left phases. Thus, the normalized CC function is simply constructed of cosines featuring various frequencies, phase differences, and amplitudes.

To improve understanding of the measure of cross-correlation, the following analysis involves a signal that is exactly the same in both ears, with all frequencies having a time delay of zero. Because $x_L(t) = x_R(t+0)$, the numerator of equation 1 becomes a power integral at $\tau = 0$, and the two powers in the denominator become equal, so their product is the power squared, whose square root cancels the numerator. Thus, the cross-correlation goes to one at $\tau = 0$, then falls off as a sum of cosines, indicating that the two signals are perfectly correlated given no delay (this is also known as an “autocorrelation” function). However, if the signals from the two ears are totally unrelated, having no time delays in common (uncorrelated), the integral in the numerator goes to zero, and the correlation is zero everywhere. This is how we assign numerical values to correlation.

Coherence - Mathematical

The CC function, while an interesting and useful representation of the relationship between two signals, is not exactly concise. A simpler way to characterize the similarity of left and right channels is the coherence measurement, which is typically established by the peak value of the CC function. A mathematically rigorous definition begins with the CC equation. Let left and right signals contain a target A with some ITD $\tau_0$ (as in the top half of Figure 1.2), as well as two masking noises, B and C respectively, that are mutually independent and uncorrelated with A (as added in the bottom half of Figure 1.2). To simplify the derivation, let B and C also be equal in intensity.
Thus,

$$\gamma(\tau) = \frac{\int x_L(t)\,x_R(t+\tau)\,dt}{\sqrt{\int x_L^2(t_1)\,dt_1 \int x_R^2(t_2)\,dt_2}} = \frac{\int [A(t) + B(t)]\,[A(t - \tau_0 + \tau) + C(t+\tau)]\,dt}{\sqrt{\int [A(t_1) + B(t_1)]^2\,dt_1 \int [A(t_2) + C(t_2)]^2\,dt_2}}$$

The peak value will occur when $\tau$ is equal to $\tau_0$. The time integral of a product of independent functions goes to zero, so we can simplify the equation:

$$\text{Coherence} = \gamma(\tau_0) = \frac{\int A^2(t)\,dt}{\sqrt{\int [A^2(t_1) + B^2(t_1)]\,dt_1 \int [A^2(t_2) + C^2(t_2)]\,dt_2}}$$

This equation deals with the powers in A, B and C integrated over time. Because the two masking noises have equal intensities,

$$\int B^2(t)\,dt = \int C^2(t)\,dt$$

Let the time integral of $X^2$ (the total power in X) be symbolized by $P_X$:

$$\text{Coherence}\ (C) = \frac{P_A}{\sqrt{[P_A + P_B][P_A + P_B]}} = \frac{P_A}{P_A + P_B}$$

Therefore, coherence is simply the ratio of the correlated power in both ears to the total power. If the masker is absent ($P_B = P_C = 0$, resulting in the conditions on the top half of Figure 1.2), then the coherence equals one, representing 100% correlation between the ears. This is a perfectly coherent sound. If the target is absent, then the coherence equals zero: a completely incoherent sound. The range between these limits represents various combinations of target and masker, and provides a straightforward and quantitative way to describe the relationship of binaural signals.

Coherence - Perceptual

Certainly, the ability to identify an ITD is dependent on the signal coherence, but how does coherence relate to ITD? To understand how the binaural system might use coherence, one must account for the frequency analysis done by the auditory periphery. The incoming signal is transformed from the time domain to the frequency domain, and thus the measurement of coherence is important as a function of frequency band. For an illustration of coherence in terms of frequency and ITD, see Figure 1.3. The two plots graph the frequencies (y-axis) that make up the signal by their interaural delay (x-axis). In the upper figure, the line labeled “coherent source” contains a range of frequencies with a common delay (in this case, -667 μs).
The small breaks in the line indicate that the 1800-Hz-wide source is separated into several frequency bands in the auditory system, which then measures coherence within those bands. Frequency analysis into bands occurs at the auditory periphery (Pickles, 1982), and thus central processes like coherence perception must evaluate the signal within those bands. The common delay indicates that these frequencies issued from one direction, probably from the same source. Because they share one delay, the listener will have no trouble identifying the ITD and localizing the source. A CC calculation of this band in left and right ears peaks at one, indicating perfect coherence.

The “incoherent source” in the lower figure, however, has randomly distributed delays across frequency. Each open circle represents the delay for a particular frequency. The dotted curves on either side identify delays of half a period for each frequency, beyond which phase shifts indicate an image on the opposite side. Because different frequencies have different delays, the peak CC over any critical band is likely close to zero, and the sound has no identifiable direction. These frequencies do not produce a localizable image because they seem to come from many angles. The extent to which ITDs are consistent over a band of frequencies is called “straightness,” a factor which may have some significance in auditory localization (Buell and Trahiotis, 1997).

Coherence is intimately connected to one’s ability to localize a sound source. Consequently, it influences one’s perception of the sound as well. In the case of a sound produced by headphones, the binaural system places an internalized image of that sound along an axis in the listener’s head, much like the $\tau$ axis of a CC function, or the time delay axis of Figure 1.3. If the source arrives at the left ear first, the internal image is pulled toward the left ear. Detection along this intracranial axis is called lateralization.
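As a rough numerical illustration of the coherence measure defined earlier in this chapter, the sketch below builds left- and right-ear signals from a shared target plus independent maskers, computes the normalized cross-correlation, and reads off the delay and height of its peak. It is only a sketch under stated assumptions: the Gaussian noise signals, the 20-kHz sample rate, the 10-sample delay, and the function name `normalized_cross_correlation` are all hypothetical choices made for illustration, and the circular shift from `np.roll` stands in for a true time delay.

```python
import numpy as np

def normalized_cross_correlation(x_left, x_right, max_lag):
    """Return lags (samples) and gamma(lag) = sum x_L[t] x_R[t+lag] / sqrt(E_L * E_R)."""
    norm = np.sqrt(np.sum(x_left**2) * np.sum(x_right**2))
    lags = np.arange(-max_lag, max_lag + 1)
    # np.roll(x_right, -lag)[t] equals x_right[t + lag] (with circular wrap-around)
    gamma = np.array(
        [np.sum(x_left * np.roll(x_right, -lag)) for lag in lags]
    ) / norm
    return lags, gamma

rng = np.random.default_rng(0)
n_samples = 1 << 16
sample_rate = 20_000           # Hz (assumed for illustration)
delay = 10                     # samples, i.e. 500 microseconds at 20 kHz

target = rng.standard_normal(n_samples)          # A, unit power
mask_left = 0.5 * rng.standard_normal(n_samples)   # B, power 0.25
mask_right = 0.5 * rng.standard_normal(n_samples)  # C, power 0.25, independent of B

x_left = target + mask_left
x_right = np.roll(target, delay) + mask_right    # x_R(t) = A(t - tau0) + C(t)

lags, gamma = normalized_cross_correlation(x_left, x_right, max_lag=20)
peak_lag = lags[np.argmax(gamma)]
peak_value = gamma.max()
predicted = 1.0 / (1.0 + 0.25)                   # P_A / (P_A + P_B)

print(f"peak at {peak_lag / sample_rate * 1e6:.0f} us, "
      f"coherence {peak_value:.3f} (predicted {predicted:.3f})")
```

In this configuration the right-ear signal lags the left, so the peak falls at a positive lag. With finite-length noise the measured peak only approximates the ratio $P_A/(P_A+P_B)$; the agreement tightens as the signals get longer.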
The image location is in no small way determined by the delay at which the CC function peaks. In order to better understand these images, and how they correspond to the two sources in Figure 1.3, refer to Figure 1.4. In the case of a coherent source, since all the frequencies are located at one time delay, they produce a single, compact image at one position along the lateralization axis. The clearly defined ball on the left represents this image. For an incoherent band, the time delays are spread out randomly, and thus appear at various lateral locations. They produce a fuzzy, head-filling image as depicted on the right. The middle image is a compromise between the two: the internal representation of a coherence between zero and one.

The foundation for correlation-based models of the auditory system

For most real-world stimuli, the sound source is composed of a band of frequencies as depicted in Figure 1.3. As mentioned above, once the signal is incident on the ears, the first act of the auditory system is to analyze it into component frequencies (Pickles, 1982). Any model of the auditory system must take that fact into account from the beginning. That approach is fundamental to correlation/coherence-based models that stem from the work of Jeffress (1948), who suggested a possible method of binaural interaction illustrated in Figure 1.5 (from Hartmann, 1999). The bottom axis represents internal lag, while the vertical axes delineate channels where the frequency-analyzed auditory signal travels from the two ears. Each cluster of five lines represents a channel that receives only one characteristic frequency (CF) from the auditory input. The black dots where the vertical arrows originate symbolize actual neurons, which serve as “coincidence detectors”. If they receive simultaneous signals (within a time window) from both sides, they will fire, signifying a coincidence of neural spikes in the left and right channels.
The purpose of the coincidence detectors is simple: take, for example, the detectors on the far left of the time axis. If they fire, the binaural system recognizes that the signal reached the right ear long before the left, allowing it time to travel a longer path to that neuron and still coincide with the signal from the left ear. Therefore, the source must be located towards the extreme right. The various neurons in each CF channel represent many different combinations of signal path lengths, and thus ITDs. In this manner, Jeffress’ construct uses a neural “place” mechanism to infer external ITDs from the activity of detectors at known internal delays (Stern and Trahiotis, 1995). Those detectors with higher firing rates simply indicate more coincidence for their characteristic time delay.

Recent physiological studies have revealed the existence of cells in the superior olivary complex and inferior colliculus that might play just such a part in binaural processing. Jeffress’ model of coincidence counters may have a factual basis in ITD-sensitive cells found in the medial superior olive (Yin and Chan, 1990). The cells are particularly sensitive to a specific interaural delay, and investigators have identified a distribution of ITD-sensitive cells that includes all physically plausible (and some implausible) free-field delays. This evidence lends credence to the concept of coincidence detectors with a characteristic interaural delay.

Going back to Figure 1.3, one can reinterpret the graph as a simpler view of Jeffress’ coincidence matrix. Once again, it is a plot of frequency versus time delay. The points at each frequency that make up the coherent and incoherent bands could represent the highest firing rate among coincidence detectors tuned to that particular CF. The plot now represents neural activity across internal delays for a selection of frequencies available to the auditory system.
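The place mechanism described above can be caricatured in a few lines of code. The sketch below is a deliberately simplified toy, not Jeffress’ model itself or any published implementation: it assumes (hypothetically) that the two ears deliver identical boolean spike trains offset by a fixed number of time bins, that a “coincidence” means both inputs fire in the same bin, and that each detector is characterized by a single internal delay. The function name `coincidence_counts` and all parameter values are illustrative.

```python
import numpy as np

def coincidence_counts(spikes_left, spikes_right, internal_delays):
    """For each internal delay d (in bins), count bins where left[t] and right[t + d] both fire."""
    return np.array([
        int(np.sum(spikes_left & np.roll(spikes_right, -d)))
        for d in internal_delays
    ])

rng = np.random.default_rng(1)
n_bins = 20_000                      # hypothetical 50-microsecond time bins
itd_bins = 8                         # right-ear spikes lag the left by 8 bins (400 us)

spikes_left = rng.random(n_bins) < 0.05          # boolean spike train, about 5% of bins fire
spikes_right = np.roll(spikes_left, itd_bins)    # same train, delayed (circular shift)

internal_delays = np.arange(-16, 17)             # one toy detector per internal delay
counts = coincidence_counts(spikes_left, spikes_right, internal_delays)
best_delay = internal_delays[np.argmax(counts)]

print("most active detector has internal delay (bins):", best_delay)
```

The detector whose internal delay compensates the external ITD collects every spike, while the others see only chance coincidences. The array of counts across internal delays plays the role of a crude cross-correlation function for this one frequency channel.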
The interpretation of coherent and incoherent bands is unchanged: as long as the coincidence peaks have a common delay over CF, the band is coherent, and its ITD is readily established. Interestingly, the pattern of coincidence response among detectors for a single CF can approximate the short-term CC function of the input signal at that frequency.

The proposed matrix of coincidence counters separated by internal delays forms the cornerstone of many auditory models. It is the primary binaural analyzer in the auditory-nerve-based model of Colburn (1977), the position-variable model of Stern and Colburn (1978), and the modern correlation-based model of Bernstein and Trahiotis (1996), to name a few. As such, the concepts outlined above are of great importance to the research reported hereafter, which deals with the effect of coherence on the use of auditory cues and with coherence as a cue itself.

The significance of coherence

What motivates this study of coherence? There are many answers, which tend to center around coherence’s critical role in localization. When the listener is presented with incoherent sound, he or she cannot establish an interaural time delay because there are no comparable features in the left- and right-ear signals.

The reduction of coherence in real-world situations can be a daunting obstacle to communication. For instance, referring back to Figure 1.1, note that $l$ and $l'$ are only two possible paths that the sound can take to the ears. In a room, reflections off the walls can arrive shortly after the direct sound, providing a different directional cue and reducing coherence, which serves to confuse the binaural system. Also note that the head is not transparent to sound, and for the listener in the figure, the left ear will actually receive a signal that has dispersed around the head. This also lowers coherence, and may make localization more difficult.

However, in the field of sound reproduction, the introduction of incoherence is actually desirable.
Since the listener wants to experience a realistic reproduction of the concert or theater environment, sound engineers must ensure that the illusion is not ruined by easily localized sound at a speaker. By lowering the coherence of the signal (with multiple speakers and added reverberation), they produce a sound that seems to come from many directions, enveloping the listener.

Finally, a key factor inspiring the study of coherence in the binaural system is that such study provides the experimenters with insight into how this part of the auditory system works.

References

Bernstein, L.R. and Trahiotis, C. “The normalized correlation: Accounting for binaural detection across center frequency,” J. Acoust. Soc. Am. 100, 3774-3784 (1996).
Buell, T.N. and Trahiotis, C. “Recent Experiments Concerning the Relative Potency and Interaction of Interaural Cues,” Binaural and Spatial Hearing in Real and Virtual Environments, ed. Gilkey and Anderson, Lawrence Erlbaum Associates, Mahwah, NJ, p. 139-149 (1997).
Colburn, H.S. “Theory of binaural interaction based on auditory-nerve data. II. Detection of tones in noise,” J. Acoust. Soc. Am. 61, 525-533 (1977).
Hartmann, W.M. “How we localize sound,” Phys. Today, November 1999.
Kuhn, G.F. “Model for the interaural time differences in the azimuthal plane,” J. Acoust. Soc. Am. 62, 157-167 (1977).
Jeffress, L.A. “A place theory of sound localization,” J. Comp. Phys. Psych. 41, 35-39 (1948).
Pickles, J.O. An Introduction to the Physiology of Hearing. Academic Press, New York, 1982, p. 24-70.
Stern, R.M. Jr. and Colburn, H.S. “Theory of binaural interaction based on auditory-nerve data. IV. A model for subjective lateral position,” J. Acoust. Soc. Am. 64, 127-140 (1978).
Stern, R.M. and Trahiotis, C. “Models of Binaural Interaction,” Hearing, ed. B. Moore, Academic Press, New York, p. 347-386 (1995).
Yin, T.C.T. and Chan, J.C.K. “Interaural Time Sensitivity in Medial Superior Olive of Cat,” J. Neurophys. 64, p. 465-488 (1990).
Yost, W.A. “The Cocktail Party Problem: Forty Years Later,” Binaural and Spatial Hearing in Real and Virtual Environments, ed. Gilkey and Anderson, Lawrence Erlbaum Associates, Mahwah, NJ, p. 329-347 (1997).

Figure 1.1. Schematic of a head in the presence of a sound source. $a$ is the radius of the head, $\theta$ is the azimuthal angle to the source as measured from the forward direction, and $l$ and $l'$ are the distances from the source to the left and right ears, respectively.
[Figure 1.1: image not reproduced; labels include the sound source and the listener’s head.]

Figure 1.2. Four signal waveforms. The top two plots represent a 100-percent coherent signal in left and right ears ($x_L(t)$ and $x_R(t)$, respectively) with an interaural time delay imposed so that the right signal leads the left. The bottom two plots include the same coherent signal as well as a small amount of independent noise added to each ear, reducing the coherence to 90 percent. The arrows in the top and bottom left-ear waveforms show the delay, where the left signal waveforms correspond to the earliest time in the right signal.
[Figure 1.2: image not reproduced.]

Figure 1.3. Representation of coherent and incoherent sources. In both plots, the y-axes are frequency scales, while the x-axes plot the interaural time differences (ITD) of a signal according to frequency. The upper plot contains a coherent source including frequencies from 200 to 2000 Hz, characterized by a constant ITD within frequency bands (indicated by breaks in the line). The lower plot shows an incoherent source composed of frequencies with random ITD in the same bandwidth. The dotted lines constrain the range of measurable ITDs because they mark delays of one-half period.
[Figure 1.3: image not reproduced; two panels, each with frequency on the vertical axis and interaural time difference (μs) on the horizontal axis.]

Figure 1.4. Representations of intracranial images. A coherent image (leftmost) is compact and easy to localize. An incoherent image (rightmost) is broad and indistinct, and has no definite intracranial location. A slightly incoherent image (center) is broader than a coherent image, but more defined than an incoherent one.
[Figure 1.4: image not reproduced; panel labels: coherent image, slightly incoherent image, incoherent image.]

Figure 1.5. Depiction of Jeffress’ coincidence-counter model. Each horizontal band of five lines represents a characteristic frequency, and the x-axis spans interaural lags of ±3 ms. The dots at varying lags are coincidence counters, and their outputs are the vertical arrows that presumably transfer information to the binaural processor. (From Hartmann, 1999)
[Figure 1.5: image not reproduced; labels: signal from left ear, lag $\tau$ (ms), signal from right ear.]

Chapter II: A study of coherence sensitivity as a function of frequency band, duration, intracranial position, and stimulus intensity

Introduction

It is well known that the binaural system can use interaural time difference (ITD) cues in a signal waveform for frequencies up to about 1500 Hz, but not above. This limitation may be due to neural firing rates not keeping up with high frequencies, so that the spike trains lose “phase lock” with the incoming signal (Pickles, 1982).
In his “duplex” theory, Rayleigh (1907) attributed sinusoid localization to ITD cues for all frequencies below 1500 Hz and to interaural level difference (ILD) cues above. His generalization of the localization process has held up well in the decades of experiments that followed. The upper limit of 1500 Hz has appeared for many binaural effects, such as the lateralization of sine tones, binaural beats, and Masking Level Differences (MLD).

The study reported in this chapter is one of many currently exploring the limitations of the duplex theory. While the theory holds true for sinusoids, Klumpp and Eady (1956) have shown that listeners are sensitive to ITDs of high-frequency complex waveforms. Bernstein and Trahiotis (1995) discovered that for narrowband (100-Hz wide) noise presented at a center frequency of 4 kHz, threshold ITD sensitivity for listeners was around 250 μs. While this threshold is better than no ITD sensitivity at all, it represents a large percentage of the total internal delay that the auditory system can detect, and is a very poor resolution compared to previously reported thresholds of 10 μs for low-frequency tones (Durlach and Colburn, 1978). Since coherence discrimination is inherently dependent on ITD distribution across frequencies (Trahiotis et al., 2001), one may conclude that it should be significantly impaired for frequencies above 1500 Hz. As a test of this hypothesis, the study reported here examined listeners’ coherence discrimination for 100-Hz-wide noise bands that were either entirely above or entirely below this limit, as in Figure II.1. If the above assumption was correct, listeners should perform well in the lower bands, and poorly in the higher ones.

In order to actually measure coherence discrimination, an experiment was devised to test whether a subject could distinguish between a perfectly coherent stimulus and a slightly incoherent one. This kind of test finds a “just noticeable difference (jnd)” from a reference coherence of one.
To generate this slightly incoherent stimulus, a small amount of independent noise was added to the coherent noise. In terms of intracranial perception, the coherent noise produces a focused image of small auditory source width (ASW), while the addition of independent noise effectively blurs the edges to make the image a “fuzzy ball” with increased ASW (Figure 1.3). This increase in ASW could relate to the standard deviation of probability density along the ITD axis (see General Appendix B on Signal Detection Theory). Listeners were then asked to identify which of the two stimuli was the slightly incoherent one, to test their ability to discriminate coherence.

This task may be compared to the well-known Masking Level Difference (MLD) experiments (Durlach and Colburn, 1978) that require listeners to detect a tonal signal in broadband noise. The name of these experiments comes from the fact that when the signal and masker have different interaural phases (for instance, the signal is shifted 180 degrees ($S_\pi$) while the masker is diotic ($N_0$)), listeners can detect the signal at much lower levels than if the two had identical phase relationships. The addition of an out-of-phase signal corresponds to a small drop in coherence, and some models indicate that $N_0S_\pi$ MLDs are directly related to coherence discrimination (Jain et al., 1991). This interpretation of MLD suggests that it is easy for listeners to distinguish a very small change from a coherence of one (Robinson and Jeffress, 1963), and results show that the task becomes more difficult as the signal frequency increases to 1500 Hz, due to the declining usefulness of ITD cues. The present results are compared to those of several MLD researchers at many points in this chapter, and the research takes hints from their discoveries.
Experiment 1: Coherence discrimination at low and high frequencies

Experimental procedure

As described in the introduction, the initial goal was to test coherence discrimination of 100-Hz bands both above and below the 1500-Hz cutoff for fine-structure ITD cues. The center frequencies of the narrow bands tested were 200, 500, 1000, 2000 and 4000 Hz: three below and two above the cutoff. See Figure II.1 for a frequency-scale representation of those bands. The experimental program also presented runs with a broadband stimulus from 20-6000 Hz, which included all the narrow bands within its limits. Listeners were expected to produce thresholds for the broadband stimulus that were at least as good as for the band that gave their best result, since the broadband stimulus contained the same auditory information in that band as well as all other bands. This expectation assumes, of course, that more coherence data allows the binaural processor to make finer distinctions. However, results from Gabriel and Colburn (1981) actually indicate that coherence discrimination degrades as bandwidth increases. This result is non-intuitive, and the present study is intended to provide more information on the subject.

Subjects attempted to identify the less coherent interval in a two-interval, forced-choice trial by pressing the appropriate button on a two-button response box. The intervals were 500 ms in duration, at 64 dB Sound Pressure Level (SPL). See Figure II.2 for a schematic of two possible trials. The coherent and slightly incoherent intervals could occur in two orders: if the slightly incoherent interval (represented by the fuzzy ball) was presented first, the left response button would be correct, and if it was second, then the reverse was true. The “fuzziness” of the signal was determined by the amount of independent noise introduced to reduce coherence, which was in turn controlled by a 3-down, 1-up staircase (Levitt, 1971) (see General Appendix A).
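A 3-down, 1-up staircase of the kind cited here can be sketched in code. The version below is a generic simulation, not the experimental program used in this study: the function name `run_staircase`, the 2-dB step size, the starting level, and the logistic “listener” are all hypothetical, while the run length of 14 turnarounds with the first four discarded follows the procedure of this experiment. The level variable stands for the level of the independent noise, so that a lower level means a harder task.

```python
import math
import random

def run_staircase(prob_correct, start_level=0.0, step=2.0,
                  n_turnarounds=14, n_discard=4, seed=0):
    """Simulate a 3-down, 1-up staircase; return (threshold, turnaround levels).

    prob_correct(level) gives the probability of a correct response when the
    independent noise is at `level` (dB, arbitrary reference).
    """
    rng = random.Random(seed)
    level = start_level
    correct_streak = 0
    direction = 0            # -1 after a downward move, +1 after an upward move
    turnarounds = []
    while len(turnarounds) < n_turnarounds:
        if rng.random() < prob_correct(level):
            correct_streak += 1
            if correct_streak < 3:
                continue     # need three correct in a row before moving down
            correct_streak = 0
            move = -1        # make the task harder
        else:
            correct_streak = 0
            move = +1        # make the task easier
        if direction != 0 and move != direction:
            turnarounds.append(level)   # direction reversed at this level
        direction = move
        level += move * step
    kept = turnarounds[n_discard:]
    return sum(kept) / len(kept), turnarounds

def toy_listener(level, midpoint=-10.0, slope=3.0):
    """Hypothetical psychometric function rising from 50% (chance) to 100% correct."""
    return 0.5 + 0.5 / (1.0 + math.exp(-(level - midpoint) / slope))

threshold, turnaround_levels = run_staircase(toy_listener)
print(f"estimated threshold: {threshold:.1f} dB from {len(turnaround_levels)} turnarounds")
```

Because three consecutive correct responses are needed to move down but a single error moves up, the track settles where the probability of three correct in a row is one half, which corresponds to about 79% correct on a single trial.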
If the listener answered correctly three times in a row, the program lowered the level of the independent noise, making the two intervals more alike and the task more difficult. However, if the listener answered incorrectly, the program intensified the independent noise, making the slightly incoherent interval less coherent and easier to distinguish. Each time the listener changed direction on the staircase, the “turnaround” intensity was recorded, and the run went on for 14 turnarounds. The first four turnaround intensities were discarded, and the other ten were averaged to establish the 79% correct level (defined as the threshold) at that condition. Listeners performed training runs on each task until their performance was deemed consistent, and then data were recorded for the next four runs. For each condition, the mean and (n-1) standard deviation of the values determined by those four runs formed the results.

Stimulus generation

The setup (in the form of a patch diagram) designed to produce the stimuli can be found in Figure II.3. Beginning at the left, noises A and B are generated from two channels of the digital noise generator. A and B have identical levels and band limits, but are completely independent. Because they are constructed in the frequency domain before transformation into time, both noises have infinitely sharp band limits. They enter separate low-pass filters (cutoff frequency = 8.2 kHz, rolloff = -115 dB/octave) to eliminate components above half the sample rate (these spurious components are a consequence of the digital-to-analog conversion), after which A is split into two branches. The top branch feeds directly into an exponential voltage-controlled amplifier (EVCA) that is ramped on and off by a trapezoidal envelope generator, and is then sent to the listener’s left ear. The lower branch of noise A is connected to a Programmable Attenuator (PA), as is B.
The PAs lower the level of the input signals, scaling by factors designated as “h” and “g” so that noise A comes out as hA, and B as gB. Then the two are mixed to get the signal hA+gB, which feeds through another EVCA into the listener’s right ear.

The coherent and slightly incoherent intervals are generated as shown below in Table II.1. To achieve a coherent stimulus, one would set h = 1 and g = 0, so that the signals to both ears are identical. When adding some independent noise (B) to lower the coherence from one, the settings change to h < 1 and g > 0.

Table II.1 - Interval composition

                                      Left Ear    Right Ear
First Interval (100% Coherent)        A           1.0A + 0.0B
Second Interval (<100% Coherent)      A           hA + gB

In order to prevent any interaural level differences (ILDs), the mixture of A and B must be properly weighted to match the level of A in the left ear. It is imperative that the ILD be the same (approximately zero) for both coherent and slightly incoherent stimuli, or the listeners may be able to use it as a cue. Specifically, if the mixture of A and B was always quieter than A alone, the listener could learn to do the task by identifying which interval provided the lower level to the right ear. To properly test coherence discrimination, one must eliminate all other cues.

In the case of a coherent interval, the settings are simple: no attenuation on A and maximum attenuation on B (as shown in Table II.1). However, when the interval is meant to have a coherence slightly less than one, the settings require some calculation. The powers in each ear are as follows, with A and B representing the two independent noise waveforms, and the horizontal bars above them indicating the average value over time:

$$P_L \equiv \overline{A^2}$$

$$P_R \equiv \overline{(hA + gB)^2} = \overline{(hA)^2} + \overline{(gB)^2}$$

There is no cross term of A and B in the right-ear power because they are completely independent waveforms, and so when multiplied and averaged, this term does not contribute to the power.
By equating the left and right powers, and noting that the power in noise A equals that of noise B, one can then solve for a relationship between factors h and g:

$$P_L = P_R \;\Rightarrow\; \overline{A^2} = \overline{(hA)^2} + \overline{(gB)^2} = (h^2 + g^2)\,\overline{A^2} \qquad (1)$$

$$h^2 + g^2 = 1 \;\Rightarrow\; h = \sqrt{1 - g^2}$$

Now, for any scale factor g chosen to impose on the independent noise (thus determining how much incoherence will be introduced), the computer can choose an h that will ensure a balance of power between the two ears.

With the knowledge of this relationship, one can make use of the equation for cross-correlation $\gamma(\tau)$, and insert signals for the slightly incoherent interval. This calculation will establish the coherence for a particular combination of h and g:

$$\gamma(\tau) = \frac{\int x_L(t)\,x_R(t+\tau)\,dt}{\sqrt{\int dt_1\,x_L^2(t_1) \int dt_2\,x_R^2(t_2)}}, \qquad x_L \equiv A, \quad x_R \equiv hA + gB$$

$$\text{Coherence} = \gamma(\tau = 0) = \frac{\overline{A\,(hA + gB)}}{\sqrt{\overline{A^2}\;\overline{(hA + gB)^2}}} = \frac{h\,\overline{A^2}}{\sqrt{\overline{A^2}\,\left(h^2\,\overline{A^2} + g^2\,\overline{B^2}\right)}} = \frac{h\,\overline{A^2}}{\sqrt{\left(\overline{A^2}\right)^2 (h^2 + g^2)}} = \frac{h\,\overline{A^2}}{\overline{A^2}} = h \qquad (2)$$

From the above calculation, it is apparent that the coherence value for this interval is, in fact, simply the scale factor h for noise A. Note that the result depends on several assumptions: that A is independent of B, that A and B are of the same level, and that $h^2 + g^2 = 1$. The experimenters needed to ensure that the setup does not conflict with these assumptions, and that the results can be trusted.

Sources of Error

One possible source of error is the digital noise generator. It is fairly simple to generate two independent noises, but to guarantee that they have equal levels is something else altogether. There is an inherent randomness to noise that defies prediction, and that is no better illustrated than by Experiment 1’s first noise-generating algorithm. The original design used Rayleigh-distributed noise as a stimulus, which is defined as a type of “random-amplitude, random-phase” noise. This noise type is commonly found in everyday stimuli.
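The two results of the stimulus derivation in this section, that the right-ear mixture matches the left-ear level when $h^2 + g^2 = 1$ and that the zero-lag coherence then equals h, can be checked numerically. The sketch below is only an illustration under assumptions that differ from the actual apparatus: it uses Gaussian white noise from NumPy in place of the band-limited noise of the experiment, and the function names `mix_for_coherence` and `coherence_zero_lag` are hypothetical.

```python
import numpy as np

def mix_for_coherence(noise_a, noise_b, g):
    """Return (h, left, right) with left = A and right = hA + gB, where h = sqrt(1 - g^2)."""
    h = np.sqrt(1.0 - g**2)
    return h, noise_a, h * noise_a + g * noise_b

def coherence_zero_lag(left, right):
    """Normalized cross-correlation of the two ear signals at zero delay."""
    return np.mean(left * right) / np.sqrt(np.mean(left**2) * np.mean(right**2))

rng = np.random.default_rng(2)
n_samples = 200_000
noise_a = rng.standard_normal(n_samples)   # A
noise_b = rng.standard_normal(n_samples)   # B, independent of A, same level

h, left, right = mix_for_coherence(noise_a, noise_b, g=0.43589)
ild_db = 10.0 * np.log10(np.mean(right**2) / np.mean(left**2))
coherence = coherence_zero_lag(left, right)

print(f"h = {h:.4f}, measured coherence = {coherence:.4f}, ILD = {ild_db:+.3f} dB")
```

With finite-length noise the cross term between A and B does not vanish exactly, so the measured coherence and level difference only approach h and 0 dB; the residuals shrink as the number of samples grows.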
The sample period was 50 μs per channel, and the buffer length was 16384 points, giving a frequency spacing of 1.22 Hz and a top frequency of 10 kHz. The phases were selected randomly after the program’s number generator received a random seed between 0 and 9999. In the broadband case, when the digital signal is composed of many thousand components, the Rayleigh-distributed amplitudes are not a problem. Thanks to the law of large numbers, the overall power tends to be the same from trial to trial. However, the experiment called for well-defined 100-Hz bands, which included only 82 component frequencies (a result of the 1.22-Hz spacing). With fewer random numbers, the power tended to vary every time the noise was generated, fluctuating by as much as one decibel between intervals, in obvious violation of the assumption that the A and B powers were identical. This fluctuation influenced the decision to change the stimulus to an equal-amplitude, random-phase (EARP) noise. Since the EARP amplitudes were constant between trials, the power also remained constant. It is also important to note that EARP noise is perceptually indistinguishable from Rayleigh noise, so the experiment did not lose any relevance with respect to real-world stimuli.

The Programmable Attenuators were another element of variability in the experimental design. These components were crucial to setting the scale factors for noises A and B, and the coherence calculation depended on the relationship between them holding true (namely, that $h^2 + g^2 = 1$). The task was then to establish how sensitive the coherence is to fluctuations in level, and how accurately the PAs set the parameters h and g. First, knowing that the PAs are only precise to 0.1 dB, the effects that an uncertainty of that magnitude would have on coherence were calculated.
Using the relationship between level and amplitudes:

$$0.1\ \text{dB} = 20 \log\!\left(\frac{h_{actual}}{h_{program}}\right)$$

Let the program-set coherence be 0.90, which is one of the typical coherences involved in this experiment. Thus, $h_{program} = 0.9$:

$$\frac{0.1}{20} = \log\!\left(\frac{h_{actual}}{0.9}\right)$$

$$h_{actual} = 0.9 \times 10^{0.1/20} = 0.910422$$

The calculation cannot blindly stop here, since h is no longer the coherence as was previously established. Now, it must include the denominator from the original calculation:

$$coh_{actual} = \frac{h_{actual}}{\sqrt{h_{actual}^2 + g^2}}, \qquad g = \sqrt{1 - h_{program}^2} = 0.43589$$

$$coh_{actual} = \frac{0.910422}{\sqrt{0.828867 + 0.19}} = 0.901953$$

According to this corrected calculation, there is a change of about 0.2 percent coherence from the original estimation. This is a reasonable error compared with the range of coherences found in the results. To illustrate the negligible nature of these deviations, Figure II.4 graphs the effect of a ±0.1 dB error on expected coherence. The x-axis represents the coherences as calculated by the computer, while the y-axis uses the recalculation above to account for “actual” coherence. The range of coherences from 0.7 to 1 includes all values of h used in the experiments. Note that the dotted and dashed lines never deviate from perfect accuracy by more than 1% coherence. Also, most runs found listeners staying in the higher coherences, where the error is smallest. Thus, Experiment 1 could tolerate attenuation errors of this magnitude.

Since the parameter g was also dependent on a PA, these calculations were repeated with the error applied to g instead, and showed that an error in noise B has the same effect on coherence as an error in noise A, but with the sign reversed. Thus, if a +0.1 dB error on noise A caused the coherence to increase by 0.001953, a -0.1 dB error on noise B would do exactly the same. All the above calculations were performed under the assumption that the PAs could have an error on the order of their precision.
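The attenuator-error arithmetic above is easy to reproduce. The short sketch below recomputes the coherence that results when one attenuator is off by a given number of decibels; the function name `coherence_with_attenuator_error` and its parameter names are hypothetical, and the only formulas used are the ones stated in this section.

```python
import math

def coherence_with_attenuator_error(h_program, error_a_db=0.0, error_b_db=0.0):
    """Coherence of (A, hA + gB) when the A and B attenuators are off by the given dB errors."""
    g_program = math.sqrt(1.0 - h_program**2)          # from h^2 + g^2 = 1
    h_actual = h_program * 10.0 ** (error_a_db / 20.0)  # amplitude scaling from a dB error
    g_actual = g_program * 10.0 ** (error_b_db / 20.0)
    return h_actual / math.sqrt(h_actual**2 + g_actual**2)

coh_error_on_a = coherence_with_attenuator_error(0.9, error_a_db=+0.1)
coh_error_on_b = coherence_with_attenuator_error(0.9, error_b_db=-0.1)

# Both values come out near 0.90195, in line with the 0.901953 quoted in the text.
print(f"+0.1 dB on A: {coh_error_on_a:.6f}")
print(f"-0.1 dB on B: {coh_error_on_b:.6f}")
```

The two cases agree because coherence depends only on the ratio of h to g: raising h by a factor is equivalent to lowering g by the same factor.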
To be thorough, the PAs were tested for accuracy by comparing a sine wave input of known amplitude with the measured output voltage. A program cycled the PAs from 0 to 80 dB of attenuation in 0.1 dB steps, while the disparity between the calculated (actual) and expected attenuations was recorded by a computer-controlled Keithley multimeter and saved in a file at each step. Both PAs were tested in this manner, with the input amplitude at 3 or 9 volts. One example of the results can be found in Figure 11.5, a plot of attenuation error vs. expected attenuation from 0 to 60 dB. All four cases (two PAs, two input voltages) showed errors of much less than 0.1 dB up to an attenuation of about 70 dB. Since the attenuation for noise B during the sensitive mixing process tended to be less than 20 dB, one can assume that the apparatus is quite satisfactory for experimental purposes.

A final test attempted to ensure that noises A and B were treated equally throughout their signal paths, save for the computer-controlled variation in the PAs. Having altered the noise generation technique to guarantee that A and B had equal levels upon creation, it was necessary to check that they were still equal upon mixing (refer to Figure 11.3). There were many opportunities for bias along their signal path, from the DAC outputs, through the low-pass filter and PAs, then across long cables to one of three mixer inputs. Measurement of the DAC outputs with a voltmeter confirmed that they were virtually identical. In order to account for the rest of the signal path, the experimental program added two steps to the calibration procedure that alternated which PA was passing the signal, but substituted sine waves for noise. This change was made for ease of measurement, since the RMS voltage reading on the multimeter would have been erratic and uncertain for narrowband noise.
The voltage at the mixer output provided comparison data for each combination of channel and mixer input (two channels times three inputs gave six results). The results guided the selection of the inputs for A and B that offered the lowest voltage difference. The level difference between them was less than 0.1 dB in this configuration, once again low enough for this experiment.

Presentation and Listeners

Experiments were conducted in an Acoustic Systems double-walled soundroom. Stimuli were presented through Sennheiser HD-480 II headphones at a standard level of 64 dB.

In all, six listeners were involved in the initial coherence discrimination experiments, designated by the letters M, N, P, V, W and Z. All were males between the ages of 18 and 26, except for listener W, who was 60 at the time of these trials. P and M were left-handed. All had musical experience. Listeners W and Z had extensive prior listening experience, while the rest had never before participated in this type of hearing study. The only hearing problems reported were possible high-frequency loss for W, consistent with that for middle-aged males, and childhood ear surgery for Z (adenoid removal). Audiograms for V, W and Z exhibited normal hearing over the range of 125-8000 Hz, except for W, who did lose some sensitivity above 3000 Hz.

Results

Figure 11.6 plots the results for all six listeners who participated. The x-axis represents the center frequencies of each 100-Hz-wide band (including the broadband case at far right), while the y-axis plots the coherence threshold necessary to discriminate from a perfectly coherent image in these bands. The difference of these thresholds from one is thus the coherence jnd. First, note that all six listeners did well for bands centered at low frequencies, based on how close their thresholds are to one. The higher the value, the more similar the coherent and slightly incoherent intervals were, and the more difficult the task.
Most of our listeners achieved thresholds around 0.99 for the lowest-frequency bands, indicating a jnd of only one percent coherence. This is excellent discrimination, and it matches closely (to two significant figures) with results from Gabriel and Colburn's (1981) two listeners for a 115-Hz wide band centered at 500 Hz. Note that one of their listeners achieved an incredible threshold of 0.997, compared to 0.989 for the other.

The second feature of interest is the broadband (BB) thresholds, shown on the right side of the graphs. The BB results are in line with those of Pollack and Trittipoe (1959), who performed a similar discrimination experiment with “wideband” noise and found that listeners could distinguish between coherences of one and 0.98 in about 75 percent of trials. Each listener's broadband threshold is nearly equal to his best threshold among 100-Hz bands. However, some listeners actually performed better in their best band. See Table 11.2 for comparisons, which also includes listeners' thresholds for the 500-Hz-centered band when that did not happen to be their best band.

Table 11.2 - Bandwise coherence thresholds

Listener              M        N        P        V        W        Z
Broadband             0.962    0.981    0.977    0.985    0.980    0.988
                      (0.022)  (0.006)  (0.006)  (0.005)  (0.005)  (0.003)
Best band             0.975    0.986    0.991    0.983    0.993    0.993
                      (0.010)  (0.005)  (0.003)  (0.007)  (0.003)  (0.002)
500-Hz center band    Same     0.982    Same     Same     0.988    Same
                               (0.006)                    (0.003)

Coherence jnd thresholds (n-1 standard deviation) for best 100-Hz band and broadband stimuli

The slight degradation of thresholds according to bandwidth may indicate some of the effect discovered by Gabriel and Colburn (1981) mentioned earlier. Broadband thresholds in the two experiments appear to be similar; Gabriel and Colburn found correlation jnds of two to three percent for a 10-kHz broadband noise, while the six listeners in the present study showed jnds of about one to four percent in the BB case.
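As a quick check on the figures quoted from Table 11.2, the sketch below converts the tabulated thresholds to jnds (one minus the threshold) and lists the listeners whose broadband threshold falls below their best-band threshold. The dictionary layout and variable names are illustrative only; the numbers are copied from the table.

```python
# Thresholds copied from Table 11.2 (standard deviations omitted).
thresholds = {
    "M": {"broadband": 0.962, "best_band": 0.975},
    "N": {"broadband": 0.981, "best_band": 0.986},
    "P": {"broadband": 0.977, "best_band": 0.991},
    "V": {"broadband": 0.985, "best_band": 0.983},
    "W": {"broadband": 0.980, "best_band": 0.993},
    "Z": {"broadband": 0.988, "best_band": 0.993},
}

# The coherence jnd is the distance of the threshold from perfect coherence.
jnd = {
    listener: {cond: round(1.0 - value, 3) for cond, value in row.items()}
    for listener, row in thresholds.items()
}

broadband_jnds = [row["broadband"] for row in jnd.values()]
worse_in_broadband = sorted(
    listener for listener, row in thresholds.items()
    if row["broadband"] < row["best_band"]
)

print(min(broadband_jnds), max(broadband_jnds))
print(worse_in_broadband)
```

The broadband jnds span 0.012 to 0.038, consistent with the “about one to four percent” quoted above, and listener V is the only one whose broadband threshold is not below his best-band threshold.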
Concentrating solely on the bands centered at 500 Hz, matching the condition tested by Gabriel and Colburn, most of the listeners in Experiment 1 showed improved thresholds. However, a t-test (t(5) = 6.168, p = 0.056) does not show significant improvement for the narrower band. These results do not indicate declining coherence discrimination with increasing bandwidth as was suggested in the previous study.

Finally, in the two highest bands where fine-structure ITD discrimination is poor, most subjects still performed extremely well. Listeners M and V both show a significant reduction in discrimination relative to the lower-frequency bands, but others held relatively steady across all center frequencies. Relatively consistent performance above and below the 1500-Hz limit seems contrary to the described concept of the binaural system, although Bernstein and Trahiotis (1995) found very similar results in their tests of the MLD. Their subjects could be split into two distinct groups, much like those reported in Experiment 1: a few that struggled with the high-frequency stimuli, and the rest that showed little frequency dependence in their thresholds.

Discussion

One can explain the ability to discriminate high-frequency narrowband coherence with the existence of envelope ITD cues. Bernstein and Trahiotis (1996) predicted that the binaural system's use of the normalized correlation (which includes the dc value of the envelope) in MLD tasks would allow discrimination at high frequencies. The first figure in their paper (Figure 11.7) graphs the calculated normalized correlations of narrowband stimuli for several center frequencies of Sπ and signal-to-noise ratios, after processing the stimuli to extract the high-frequency envelope. By comparison, Experiment 1 typically used very little incoherent noise (signal), so listeners were mostly operating within the leftmost region of this graph.
In this region, all center frequencies have nearly the same normalized correlation, and thus are equally difficult to distinguish from one. This implies that the use of an envelope cue allows listeners to establish the coherence of MLD stimuli at very high frequencies, and that coherence is almost invariant with respect to center frequency for low signal-to-noise ratios. Granting that listeners perform the MLD task by coherence discrimination, their analysis leads to the prediction that listeners should have relatively constant thresholds for narrowband noise at center frequencies above and below the 1500-Hz “limit”. This was indeed the case for some, but not all, of the listeners in Experiment 1.

The envelope is illustrated in Figure 11.8, which plots a sample 100-Hz-wide stimulus centered at 2000 Hz. While the waveform itself is oscillating rapidly, the envelope (joining the peaks of the waveform) varies slowly. The auditory neurons may not be able to keep up with the high frequencies involved in the fine structure, but could still follow the fluctuations of the envelope. One's binaural system would find it much easier to identify an interaural phase difference (and thus, an ITD) from the shift in this envelope's peak than to identify a shift in a particular waveform peak.

To quantitatively illustrate the frequencies involved in envelope fluctuation, the following mathematical development of the envelope E(t) begins with the absolute value of the sum of all contributing components:

$$E(t) = |\tilde{x}(t)| = \left| \sum_{n=1}^{N} C_n e^{i(\omega_n t + \phi_n)} \right|$$

$$E^2(t) = \left[ \sum_{n=1}^{N} C_n e^{i(\omega_n t + \phi_n)} \right] \left[ \sum_{m=1}^{N} C_m e^{-i(\omega_m t + \phi_m)} \right]$$

$$E^2(t) = \sum_{n=1}^{N} C_n^2 + 2 \sum_{n=1}^{N} \sum_{m=1}^{n-1} C_n C_m \cos\left[ (\omega_n - \omega_m) t + (\phi_n - \phi_m) \right] = P_E(t)$$

As seen on the last line, the envelope power is governed by the “difference frequencies” that exist between waveform components ($\omega_n - \omega_m$).
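The last line of the derivation can be verified numerically. The sketch below is illustrative rather than the author's code: the function names, the band edges (a low band starting at 450 Hz and a copy shifted up by 1500 Hz), the random seed and the sample times are all assumptions chosen for the demonstration. It evaluates E²(t) directly from the complex sum, evaluates it again from the difference-frequency form, and repeats the direct calculation after adding the same offset to every component frequency.

```python
import cmath
import math
import random

def envelope_power_direct(freqs_hz, amps, phases, t):
    """E^2(t) as the squared magnitude of the complex component sum."""
    total = sum(
        c * cmath.exp(1j * (2.0 * math.pi * f * t + p))
        for f, c, p in zip(freqs_hz, amps, phases)
    )
    return abs(total) ** 2

def envelope_power_difference_form(freqs_hz, amps, phases, t):
    """E^2(t) from the final line: constant term plus difference-frequency cosines."""
    power = sum(c * c for c in amps)
    for n in range(len(amps)):
        for m in range(n):
            dw = 2.0 * math.pi * (freqs_hz[n] - freqs_hz[m])
            power += 2.0 * amps[n] * amps[m] * math.cos(dw * t + phases[n] - phases[m])
    return power

rng = random.Random(7)               # fixed seed, arbitrary choice
n_components = 82                    # components in a 100-Hz band at 1.22-Hz spacing
spacing_hz = 1.22
low_band = [450.0 + spacing_hz * k for k in range(n_components)]
high_band = [f + 1500.0 for f in low_band]   # same band translated upward
amps = [1.0] * n_components                  # equal-amplitude (EARP-style) components
phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_components)]

times = [0.001 * k for k in range(0, 50, 7)]  # a few sample times in seconds
low = [envelope_power_direct(low_band, amps, phases, t) for t in times]
high = [envelope_power_direct(high_band, amps, phases, t) for t in times]
diff_form = [envelope_power_difference_form(low_band, amps, phases, t) for t in times]
```

In this sketch the three lists agree to within floating-point error: the common frequency offset multiplies the complex sum by a unit-magnitude factor, which drops out of the squared magnitude.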
The envelope power is therefore “translation-invariant,” meaning that one could shift all the frequencies by the same amount, and the envelope would be unchanged. This leads to an important result: while the difference frequencies include all possible combinations of two components, the highest frequency present in the envelope can only be the largest distance between any two parts of the frequency spectrum. In the case of our stimulus, a 100-Hz wide band, the center frequency has no relevance for the envelope function. The highest frequency of oscillation in the envelope is thus 100 Hz, regardless of whether the band lies above or below the 1500 Hz limit. The implication is that while listeners cannot follow fine-structure ITDs at high frequency, the slower-varying envelopes can still provide a usable ITD cue. This cue would account for the frequency-independent coherence thresholds of some listeners.

With the above knowledge, one can revisit Figure 11.6 and better understand the patterns of data. For low-frequency bands, the listeners did equally well (to an extent, neglecting the contribution of listener M). However, their coherence thresholds diverged rapidly as the center frequency increased. Specifically, listeners Z, W, P and N made efficient use of envelope cues. A logarithmic curve fit to the average results of those four listeners resulted in an R² of 0.839, indicating that there was some dependence on center frequency. Z and P were nearly unaffected by the loss of fine-structure cues. Listeners V and M, on the other hand, did not glean as much information from envelope cues, and their thresholds suffered. Bernstein and Trahiotis (1998) discovered similar disparity among listeners with respect to MLD at center frequencies of 500 and 4000 Hz. Their listeners' results also suggest differing abilities to identify envelope correlation cues. The inter-subject variability seems to indicate the presence of two mechanisms for coherence discrimination.
All listeners were able to make use of the traditionally understood fine-structure cues to measure interaural coherence. However, once those cues failed, some listeners found that they could perform the task just as well with envelope information. Others did not benefit as much from the envelope.

The potential degradation of coherence discrimination at high frequencies may also provide a clue to the dependence of jnds on bandwidth, which is a discrepancy between the results of Experiment 1 and those of Gabriel and Colburn. Jain et al. (1991) noted that some listeners show decreased decorrelation sensitivity at greater bandwidths, and suggested that this may be a result of the gradual inclusion of higher frequencies where sensitivity is poorer. Rather than providing useful additional information to the binaural processor, perhaps the high-frequency coherence cues reduce the processor's overall performance. This hypothesis necessarily suggests that listeners who show poor coherence sensitivity at high frequencies (in this study, listeners V and M) should also demonstrate degraded sensitivity as a function of bandwidth. A check of Table 11.2 shows that listener M certainly follows this pattern, as his broadband threshold falls well below his best band threshold. However, listener V's broadband result is actually better than his best band. To confuse matters more, listener P, whose narrowband thresholds are most consistent across frequency, shows a significant decline for the broadband case. Thus, Experiment 1 cannot provide definitive support for the hypothesis of Jain et al.

Experiment 2: Broadband coherence discrimination at short durations

Introduction

Imagine a model that explains the coherence jnd as a function of resolution of the number of neural spikes. Assume that discrimination is dependent on the accumulation of enough data from Jeffress' coincidence counters (Stern and Trahiotis, 1995), as defined in Chapter I.
While the accumulation of signal (number of spikes) increases as the number N, the standard deviation (variability in spike count) only increases as the square root of N. Increasing the amount of information gathered leads to a more favorable signal-to-noise ratio, improving discrimination. This concept of information-gathering parallels a model of Durlach et al. (1986) used to predict MLD and coherence-discrimination results. Their model is also based on the ratio of useful signal to interfering noise.

One method of augmenting the available data is to increase stimulus bandwidth, which would stimulate nerves of different characteristic frequencies and cause output from additional coincidence counters, supplementing the overall coherence measurement. The increase in information available to the system should lower the jnd. Of course, results from Experiment 1 and previous studies show that this may not be the case.

While additional bandwidth can provide more data, another dimension of information-gathering is over time. By limiting the time window for data accumulation, one could theoretically decrease the available information, thus raising the coherence jnd. It is well known that the binaural system can make use of data accumulated over a duration called the “integration time.” Assume an integration time of 100 ms, a common value found in the literature (Durlach and Colburn, 1978). The implication is that the binaural system only “stores” information gathered over the last 100 ms, and performs analyses ignoring data that arrived before that period. It stands to reason that the binaural system should perform equally well whether provided with signal durations equal to or longer than the integration time. On the other hand, if the signal is shortened below 100 ms, the binaural system has less information available, and coherence discrimination should suffer.
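The scaling argument above can be written as a toy calculation. This is only a sketch under stated assumptions, not a model taken from the cited literature: it assumes the spike count N is proportional to the stimulus duration, that nothing beyond a hard-edged 100-ms integration window contributes, and that the signal-to-noise ratio goes as N divided by the square root of N. The function name `relative_snr` and the normalization to the full-window case are hypothetical choices.

```python
import math

INTEGRATION_TIME_MS = 100.0  # integration time assumed in the text

def relative_snr(duration_ms, integration_ms=INTEGRATION_TIME_MS):
    """Signal-to-noise ratio relative to a stimulus that fills the integration window.

    Assumes spike count N grows linearly with duration up to the window length,
    so SNR ~ N / sqrt(N) = sqrt(N).
    """
    effective_ms = min(duration_ms, integration_ms)
    return math.sqrt(effective_ms / integration_ms)

# Durations used in Part A of Experiment 2.
durations_ms = [500, 300, 170, 100, 80, 60, 40, 20, 10, 5]
table = {d: round(relative_snr(d), 3) for d in durations_ms}

for d in durations_ms:
    print(f"{d:4d} ms  relative SNR = {table[d]:.3f}")
```

Under these assumptions the ratio is flat at and above 100 ms and falls as the square root of duration below it; for example, a stimulus one quarter of the window long has half the signal-to-noise ratio.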
Blodgett, Jeffress and Taylor (1958) found results that support the above “information-gathering” model in the course of an MLD experiment, estimating that a direct proportion existed between the logarithm of stimulus duration and detection threshold (in dB) until the threshold reached its asymptote.

Part A: Variable Duration

For the duration experiment, the task was exactly the same as for the broadband (20-6000 Hz) stimulus of Experiment 1, but the duration was set at one of ten values: 500, 300, 170, 100, 80, 60, 40, 20, 10, and 5 ms. The signal level was 64 dB SPL. Experiment 2 provided the listeners with a broadband stimulus to ensure that their binaural systems had access to a large amount of information in the frequency domain, even at very short durations. The choice of broadband also avoided the problem of “spectral splatter” for short stimuli.

In order to provide very accurate signal durations for this time-sensitive experiment, the trapezoidal envelope was removed from the setup in Figure 11.3. The stimulus was then “windowed” by a rectangular envelope, with effectively instantaneous onsets and offsets controlled by a precise timing generator independent of the computer, so that expected and actual duration were identical. The “hard” onsets were audible at the beginning of the stimulus, but the experimenters deemed them unobtrusive enough not to distract the listeners.

Listeners included W and Z from the previous experiment, as well as two new subjects: S and T. Both were very experienced in hearing studies and had normal hearing.

The results of the variable-duration experiment are plotted in Figure 11.9. On the x-axis are the various stimulus durations (the label for 80 ms is omitted, but the results are still present), and the y-axis once again represents the coherence threshold at the 79% discrimination level. The four participating listeners are separated into two graphs for clarity.
An analysis of variance (ANOVA) test measured the variation in thresholds for all listeners, and gave a quantitative assessment of any significant differences across conditions. The alpha level for significance was set at 0.05, so a difference between two means was called significant only when a difference that large would arise by chance less than 5% of the time. Essentially, one in twenty of these comparisons may show significance when there is none. The mean-difference calculations were performed under a least-significant-difference (LSD) procedure. The ANOVA showed that there is a significant effect on thresholds due to duration (F(9,27) = 9.505, p < 0.001).

The upper graph of listeners S and T shows many of the features expected from the discussion above. For durations of 100 ms and longer, their thresholds are consistent. This implies that the integration time was indeed near 100 ms, and additional information beyond that time did not make the task any easier. At shorter durations, they required more incoherence to make an accurate judgement, showing a strong duration effect (F(9,9) = 21.036, p < 0.001). The graph for listeners W and Z shows a different pattern. Their performance was not only consistent above 100 ms, but also below, and almost as good at 5 ms as at 500 ms. These two plots show very little downturn at short durations, a smaller though still significant duration effect (F(9,9) = 8.446, p = 0.02). Once again, the results find two distinct groups of listeners.

Part B: Variable Duration with randomized ILD

Because W and Z performed well even at the lowest durations in part A, where S and T struggled, the experimenters theorized that they may have been using alternate cues to do the task. In this case, it was possible that the constant position of the image inside the subject's head might have simplified their decision-making process. Refer back to Figure 11.2 for a look at the two intervals. These images were presented with no ITD or ILD, placing them in the center of the head.
The coherent image is a tightly defined point along the internal delay axis, while the slightly incoherent image is somewhat broader due to the addition of independent noise. Instead of judging which image had a greater ASW, the listener could simply listen for the presence of off-center noise, which would indicate that the signal was not totally coherent. Thus, the task becomes one of position detection, rather than coherence discrimination. This concept is termed the “position-cue hypothesis.”

As a check of this conjecture, Experiment 2 was changed in order to eliminate the possibility of a position cue. The stimulus modification included the addition of a random ILD to each trial. ILD has the same effect on image lateralization as ITD. By randomizing lateral position each trial, the listeners would not be able to learn a particular location on which to focus their attention. The computer picked an ILD from integers within the range of ±4 dB, and all values had an equal probability. The easiest method of introducing an ILD was the programmable attenuators that were used to control the levels of noises A and B.

In order to ensure that the overall coherence was not altered by this perturbation, it was necessary to perform a recalculation of h and g. It was a two-equation, two-variable problem, given that the power in the right ear was a certain percentage of that in the left (as determined by the ILD), and we wanted the coherence unchanged. Thus, the calculation found a new $h_{ild}$ and $g_{ild}$ (that met these criteria) in terms of the previous h and g. Let x be the amplitude ratio of right ear over left. Whereas equation 1 equated left and right powers by setting h and g appropriately, this derivation uses them to effect an ILD:

$$\mathrm{ILD} = 20 \log_{10} x$$

$$h_{ild}^2 + g_{ild}^2 = x^2$$

Because the coherence set by $h_{ild}$ and $g_{ild}$ should be no different from the coherence h set by the non-ILD task (refer to the calculation of equation 2),

$$\frac{h_{ild}}{\sqrt{h_{ild}^2 + g_{ild}^2}} = h \;\rightarrow\; \frac{h_{ild}}{x} = h \;\rightarrow\; h_{ild} = hx$$

$$h^2 x^2 + g_{ild}^2 = x^2 \;\rightarrow\; g_{ild}^2 = x^2 (1 - h^2) = x^2 g^2 \;\rightarrow\; g_{ild} = gx$$

As the equations show, attenuating or boosting both A and B noises equally in the right channel does not change the coherence, or in simpler terms, coherence is level-invariant. This is an expected result.

Part B of Experiment 2 used durations of 500, 100, 40, 10 and 5 ms as a representative set of the original stimuli. Aside from these changes, the experiment was identical to the duration tests as described in part A.

The addition of a random ILD is not without precedent; Bernstein and Trahiotis (1997) introduced just such a “roving” binaural cue in their tests of correlation sensitivity. They found that random ILD did not significantly degrade their listeners' performance relative to tests performed without the ILD. This experiment was performed with the expectation that any changes in threshold from part A would be due to the loss of a position cue, and not a result of random ILD.

The results of the “random ILD duration experiment” may be found in Figure 11.10, and should be analyzed with respect to Figure 11.9. Note that the thresholds of listeners T and S are relatively unchanged, and the new ILD factor has not affected their performance. They show a significant effect over duration (F(4,4) = 20.364, p = 0.006). However, for durations less than 100 ms, W and Z now show a distinct degradation in coherence discrimination, a behavior that agrees better with their fellow listeners (F(4,4) = 17.65, p = 0.008). These data fit the predictions of the information-gathering model, because jnds increase at the shorter durations where coincidence data are more limited. Overall, the ANOVA shows a strong duration effect (F(4,12) = 34.948, p < 0.001). The change in W's and Z's results also suggests that they were indeed using a position cue to boost their short-duration performance.
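The rescaling of h and g used to impose the ILD in part B can be sketched in a few lines. The helper names `apply_ild` and `coherence` are hypothetical, and the sketch assumes only what the derivation states: the right-ear scale factors are multiplied by the amplitude ratio x = 10^(ILD/20), starting from values with h² + g² = 1.

```python
import math

def apply_ild(h, ild_db):
    """Return (h_ild, g_ild, x) for the right channel, given coherence h and an ILD in dB.

    Hypothetical helper: assumes the unshifted scale factors satisfy h^2 + g^2 = 1.
    """
    g = math.sqrt(1.0 - h * h)
    x = 10.0 ** (ild_db / 20.0)   # amplitude ratio, right ear over left
    return h * x, g * x, x

def coherence(h, g):
    """Coherence of the mixture when the scale factors need not be normalized."""
    return h / math.sqrt(h * h + g * g)

h = 0.95
results = {}
for ild_db in range(-4, 5):       # the integer ILDs drawn in part B
    h_ild, g_ild, x = apply_ild(h, ild_db)
    results[ild_db] = (coherence(h_ild, g_ild), h_ild ** 2 + g_ild ** 2, x ** 2)

for ild_db, (coh, power, x_squared) in results.items():
    print(f"ILD {ild_db:+d} dB: coherence {coh:.6f}, right-ear power {power:.4f}")
```

For every ILD in the range, the coherence stays at the programmed value while the right-ear power scales as x², which is the level invariance stated in the text.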
Part C: Variable Duration with zero, constant, and random ILD

Analysis of part B's results could lead to a different conclusion. What if W and Z did worse simply because their coherence discrimination was best when the image was centered in their heads? Is performance dependent on intracranial location? These questions cast doubt on the position-cue hypothesis, and the experiment was redesigned again to answer them.

Part C re-evaluated durations of 40, 10 and 5 ms, as they were the stimuli that showed some change between the original duration experiment (part A) and the random ILD revision (part B). Listeners were run at each duration under three conditions: a constant ILD of 0 dB (which was identical to the experiment of part A), a constant ILD of +4 dB, and a random integer ILD within the range of ±4 dB (reproducing the experiment of part B). If the theory of listeners using the position cue over coherence discrimination were correct, one would expect the constant ILD of +4 to produce similar results to the constant ILD of 0. But if discrimination truly were not as accurate outside the center of the head, one would expect it to resemble the random ILD results.

The answers to this debate lie in Figures 11.11, 11.12 and 11.13. The first is a plot of constant ILD = 0 results, and shows an interesting feature. Whereas the listeners for the duration experiment of part A were originally split into two distinct groups, this retrial exhibits more consistent thresholds for all subjects down to 5 ms. These results do not quite show a significant duration effect (F(2,6) = 5.076, p = 0.051). In comparison to the results in Figure 11.9, it is apparent that S and T have improved their short-duration performance. This change could indicate that they had discovered the use of the position cue, but judgement should be reserved until all the facts are analyzed.
Figure 11.12 contains data for the ILD = +4 condition, which are not significantly different from those for ILD = 0 according to statistical t-tests (t(3) = 2.197, p = 0.159). However, the same tests confirm that the random ILD results in Figure 11.13 are appreciably lower than their ILD = +4 counterparts (t(3) = -7.442, p = 0.018). These results serve to disprove the hypothesis that ILD randomization reduced discrimination performance because the images were off-center, and lend credence to the position-cue hypothesis. It seems all listeners were using the position cue by the beginning of part C.

The duration-insensitivity found in Figure 11.11 raises a question: how do these results relate to MLD duration dependence, which Blodgett, Jeffress and Taylor (1958) found to be very significant? In the same range of duration, from 40 to 5 ms, they measured a 10-dB increase in the required signal-to-noise ratio. The four listeners of Experiment 2 averaged coherence thresholds of 0.955 at 40 ms, which lowered to 0.927 at 5 ms. The change was small enough that the ANOVA showed no significance. However, referring back to the small S/N region of Figure 11.7, one might discover that very small changes in coherence near one correspond to large S/N ratio changes for an MLD task. The coherence reduction stated above appears to span a change of 3-4 dB. However, if one uses the 40- and 5-ms results in Figure 11.9 (before listeners S and T used the position cue), the average coherences vary from 0.943 to 0.890, respectively. That change corresponds to an S/N shift of 6-7 dB on Figure 11.7, closer to the sensitivity of Blodgett, Jeffress and Taylor's subjects.

Experiment 3: Level-dependent coherence discrimination

Introduction

The final iteration of the coherence-discrimination experiments altered the one value that had been constant through all previous variations: stimulus level.
The binaural system is notably better at analysis when provided with a high level (Durlach and Colburn, 1978), and the experimenters wanted to determine how coherence jnds might change as the level decreased from our standard of 64 dB. A brief study by Gabriel and Colburn (1981) indicated that decreasing spectral level led to larger coherence jnds for their two listeners. When measuring with a 115-Hz wide stimulus centered at 500 Hz, they found coherence jnds for two levels: 0.008 at 75 dB, increasing to 0.013 at 44 dB.

One approach to the question of level has been dubbed the “fuzzy ball copier analogy” by the experimenters. For an illustration, see Figure 11.14, which shows the familiar coherent and slightly incoherent images. Each subsequent copy of the pair, as they descend on the page, was made by lowering the darkness setting. This is analogous to lowering the overall level of the intervals. Note how the fuzzy edges of the slightly incoherent image fade away towards the bottom, leaving a more compact image that resembles the one on the left. This demonstrates why it may be more difficult to distinguish between the two at low levels, as the edges fall below the threshold of hearing. Listeners would essentially lose the “position cue” that was postulated in Experiment 2.

The darkness of the images could represent regions where the number of neural spikes is above the detectable minimum. As intensities decrease, the spike count is reduced until it ceases to register in the coincidence detector, which harkens back to the information-gathering model. A lower intensity stimulates less neural activity, or the spike trains do not synchronize as well with the input signal (two elements of Colburn's (1978) auditory-nerve-based model), providing fewer or noisier data to the coincidence counters. The information-gathering model finds this situation no different than if the input were limited in time or frequency, because both have lower signal-to-noise ratios.
Thus the model predicts that listeners will establish larger coherence jnds for lower intensities.

Methods

To test the copier analogy, Experiment 3 ran the discrimination task at levels of 64, 44 and 34 dB. The signal was broadband (20 Hz - 6 kHz, as in the BB condition of Experiment 1), and 100 ms in length. Listeners S, T, W and Z each performed four runs at the three levels.

Results

Results from Experiment 3 appear in Figure 11.15. The three levels appear on the x-axis, while the y-axis is still the coherence threshold for the listener to distinguish correctly on 79% of trials. At the experiment-standard level of 64 dB, listeners performed equivalently to their previous results. As the level decreased, however, their performance clearly degraded. The results averaged over listeners are found in Table 11.3.

Table 11.3 - Level-dependent coherence thresholds

Level (dB)           64             44             34
Average threshold    0.972 (0.006)  0.936 (0.016)  0.873 (0.043)

Average coherence jnd threshold (n-1 standard deviation) for three stimulus levels

The pattern of thresholds follows Gabriel and Colburn's findings, although the listeners of Experiment 3 exhibited a much larger coherence jnd (0.064) at 44 dB than reported in the cited work (0.013). Once again, their listeners appear to be much more sensitive to changes in coherence. An ANOVA reveals a significant effect due to level (F(2,6) = 26.45, p = 0.001). It would seem that coherence discrimination is indeed sensitive to level, and the “copier” analogy may be a fitting one. The model's predictions based on nerve-activity threshold were borne out by the results.

Conclusions

In Experiment 1, some listeners demonstrated that envelope coherence discrimination can be just as good as that for fine structure. However, not all listeners found envelope cues as detectable as fine-structure cues in this context, leading to higher coherence jnds for narrow bands above 1500 Hz.
Another comparison, between narrow- and broadband discrimination, found that some listeners did have larger jnds for a broadband stimulus. However, the effect was extremely variable among subjects, and ANOVA results showed no significant difference.

Experiment 2 showed that while some listeners exhibit increasing jnds for extremely short stimuli, others can discriminate coherence consistently well even at very short durations. This result is likely due to an ability to perform the task using position detection cues. Once those cues are eliminated, coherence jnds increase with decreasing duration for all listeners. The position of intracranial images (shifted off-center by an ILD) apparently has little or no effect on coherence discrimination.

Experiment 3 results revealed that coherence jnds also increase as the stimulus level decreases, possibly due to the loss of the same position detection cues used in Experiment 2 (as illustrated by the copier analogy). The information-gathering model, which establishes expected results based on the limitation of data in frequency, time and level, was able to accurately predict the patterns shown by our listeners' thresholds. A possible exception was the case of bandwidth dependence, where results were inconclusive.

Further Study

This line of experiments could lead to many other studies of coherence. Possible future directions are listed below:

A. Some readers have suggested that randomizing ILD by trial is not quite enough to eliminate the position cue. While the listeners would not be able to focus on one off-center location, they would still be able to compare positions within a trial rather than overall “width.” The experimental program could instead randomize by interval.

B. All the coherence discrimination tasks described essentially tested coherence jnds with a reference of one.
Another interesting problem is to start with a completely incoherent noise and increase the coherence until listeners can tell the difference, corresponding to a reference-zero coherence jnd. Blodgett, Jeffress and Taylor (1958) report a very small MLD when the masking noise is uncorrelated. This could have to do with decreased coherence sensitivity in the region near zero correlation.

C. Many experimenters are studying the effect of critical bands (frequency ranges that the binaural system uses for analysis) on other tasks, and it has implications for the information-gathering model. It may prove interesting to rerun Experiment 1 with a wide range of stimulus bandwidths.

D. The reported results could relate to the study of coherence in rooms, and the use of ITD cues in rooms. This direction is pursued in Chapter III.

E. Using the current experimental task, one might replace the slightly-incoherent interval with noise modified to simulate dispersion around the head from a certain incident angle, and test how well our listeners discriminate between the two. While the compact image of a coherent noise may be easily identifiable, the listeners will no doubt be used to experiencing phase shifts due to dispersion. This very problem is addressed in Chapter V.

References

Bernstein, L.R. and Trahiotis, C. “Binaural interference effects measured with masking-level difference and with ITD- and IID-discrimination paradigms,” J. Acoust. Soc. Am. 98, 155-163 (1995).

Bernstein, L.R. and Trahiotis, C. “The normalized correlation: Accounting for binaural detection across center frequency,” J. Acoust. Soc. Am. 100, 3774-3784 (1996).

Bernstein, L.R. and Trahiotis, C. “The effects of randomizing values of interaural disparities on binaural detection and on discrimination of interaural correlation,” J. Acoust. Soc. Am. 102, 1113-1120 (1997).

Bernstein, L.R. and Trahiotis, C.
“Inter-individual differences in binaural detection of low-frequency or high-frequency tonal signals masked by narrow-band or broadband noise,” J. Acoust. Soc. Am. 103, 2069-2078 (1998).

Blodgett, H.C., Jeffress, L.A. and Taylor, R.W. “Relation of Masked Threshold to Signal-Duration for Various Interaural Phase-Combinations,” Am. J. Psych. 71, 283-290 (1958).

Colburn, H.S. and Durlach, N.I. “Models of Binaural Interaction,” Handbook of Perception vol. 4. ed. E. Carterette, pp. 498-503 (1978).

Durlach, N.I. and Colburn, H.S. “Binaural Phenomena,” Handbook of Perception vol. 4. ed. E. Carterette, pp. 365-466 (1978).

Durlach, N.I., Gabriel, K.J., Colburn, H.S. and Trahiotis, C. “Interaural correlation discrimination: II. Relation to binaural unmasking,” J. Acoust. Soc. Am. 79, 1548-1557 (1986).

Gabriel, K.J. and Colburn, H.S. “Interaural correlation discrimination: I. Bandwidth and level dependence,” J. Acoust. Soc. Am. 69, 1394-1401 (1981).

Hartmann, W.M. Signals, Sound and Sensation. AIP Press, Woodbury, New York, 1997. p. 256

Jain, M., Gallagher, D., Koehnke, J. and Colburn, H.S. “Fringed correlation discrimination and binaural detection,” J. Acoust. Soc. Am. 90, 1918-1926 (1991).

Klumpp, R.G. and Eady, H.R. “Some measurements of interaural time difference thresholds,” J. Acoust. Soc. Am. 28, 859-860 (1956).

Levitt, H. “Transformed up-down methods in psychoacoustics,” J. Acoust. Soc. Am. 49, 467-477 (1971).

Pickles, J.O. An Introduction to the Physiology of Hearing. Academic Press, New York, 1982. pp. 173-177

Pollack, I. and Trittipoe, W.J. “Binaural Listening and Interaural Noise Cross Correlation,” J. Acoust. Soc. Am. 31, 1250-1252 (1959).

Rayleigh, Third Baron “On our perception of sound direction,” Philos. Mag. 13, 214-232 (1907).

Robinson, D.E. and Jeffress, L.A. “Effect of varying the Interaural Noise Correlation on the Detectability of Tonal Signals,” J. Acoust. Soc. Am. 35, 1947-1952 (1963).

Stern, R.M. and Trahiotis, C.
“Models of Binaural Interaction,” Hearing ed. B. Moore, pp. 347-386 (1995).

Trahiotis, C., Bernstein, L.R. and Akeroyd, M.A. “Manipulating the ‘straightness’ and ‘curvature’ of patterns of interaural cross correlation affects listeners' sensitivity to changes in interaural delay,” J. Acoust. Soc. Am. 109, 321-330 (2001).

Figure II.1. Band limits for narrowband stimuli used in this experiment. The vertical axis shows frequency in Hz. The 1500-Hz “cutoff” for fine-structure ITD cues is marked by a dotted line along the center axis. The 100-Hz-wide bands (delineated by solid horizontal lines) fall above and below that cutoff at center frequencies 200, 500, 1000, 2000 and 4000 Hz.

[Figure II.1: vertical axis: Frequency (Hz)]

Figure II.2. Description of the experimental task. On each trial, listeners are asked to choose the least coherent of two intervals. The intervals are ordered randomly so the coherent interval may occur first or second, as shown in two possible trials: the left two images or the right two images. The correct response is listed below each trial.

[Figure II.2]

Figure II.3. Schematic of the experimental setup in the form of a patch diagram. Begin at the left, where noises A and B are generated. LPF stands for low-pass filter, PA is the Programmable Attenuator, “hA” and “gB” are the attenuated noises, Amp stands for Amplifier and Trap. Env. for the trapezoidal envelope that ramps signals on and off. The signals feed into left and right ears as shown.

[Figure II.3]

Figure II.4. Estimation of coherence in the case of Programmable Attenuator (PA) inaccuracy. The x-axis is a range of coherences calculated given the PA settings, and the y-axis is the actual coherence based on assumed error.
The solid-line plot is the diagonal, where expected and actual coherence agree. The dotted and dashed lines represent the actual coherences if the PA has a ±0.1 dB error.

[Figure II.4: x-axis: Expected Coherence; y-axis: Actual Coherence]

Figure II.5. Actual measurement of error in the Programmable Attenuator (PA). The x-axis is the range of attenuations set on the PA, while the y-axis shows the measured deviations from those settings.

[Figure II.5]

Figure II.6. Results of Experiment 1. Thresholds for six listeners are broken up into two plots for ease of viewing. Both plots list the band limitations along the x-axis (five center frequencies for the 100-Hz bands and the broadband case listed at the extreme right), and the y-axis gives the coherence thresholds that are just noticeably different from perfect coherence. The alternate y-axis on the right side of each plot lists the relative level of the independent noise (with respect to the coherent target noise) necessary to achieve the coherences marked. Results for listeners Z, V, and W are plotted in the upper graph; P, M and N are in the lower. Error bars on data points represent two standard deviations.
[Figure II.6: two panels (upper: listeners Z, V and W; lower: listeners P, M and N). x-axis: Center frequency (100-Hz band): 200, 500, 1000, 2000, 4000, BB; left y-axis: Coherence; right y-axis: Relative level: independent noise (dB)]

Figure II.7. Calculated normalized correlations in a Masking Level Difference experiment. The stimuli include N0 (100-Hz wide band of diotic noise) and Sπ (antiphasic tone centered on the noise band), and their relative levels are established by the signal-to-noise ratio along the x-axis. The y-axis gives the calculated normalized correlation subsequent to half-wave square-law rectification and low-pass filtering. The various plots show correlations dependent on a variety of tonal signal frequencies. (Bernstein and Trahiotis, 1996)

[Figure II.7: y-axis: Normalized correlation]

Figure II.8. The concept of the envelope. A 100-Hz wide band of noise centered on 2 kHz produced this signal, shown as a function of time. The signal waveform is rapidly varying. By contrast, the signal envelope (marked by dashed lines along the waveform peaks) is slowly varying, with a most common frequency of 50 Hz.

[Figure II.8: x-axis: Time (ms); y-axis: Signal]

Figure II.9. Results of Experiment 2: variable duration. Coherence thresholds for four listeners are divided into two plots for purposes of comparison. Both plots have a logarithmic scale of signal duration (nine values between 5 and 500 ms) on the x-axis, and the coherence necessary to discriminate from perfect coherence along the y-axis.
The alternate y-axis on the right side of each plot lists the relative level of the independent noise (with respect to the coherent target noise) necessary to achieve the coherences marked. Results for listeners S and T are found in the upper graph, while those for listeners W and Z are in the lower. Error bars on data points represent two standard deviations.

[Figure II.9: two panels (upper: listeners S and T; lower: listeners W and Z), ILD = 0. x-axis: Duration (ms): 5, 10, 20, 40, 60, 100, 170, 300, 500; left y-axis: Coherence; right y-axis: Relative level: independent noise (dB)]

Figure II.10. Results of Experiment 2: variable duration with randomized ILD. Coherence thresholds for four listeners are divided into two plots for purposes of comparison. Both plots have a logarithmic scale of signal duration (five values between 5 and 500 ms) on the x-axis, and the coherence necessary to discriminate from perfect coherence along the y-axis. The alternate y-axis on the right side of each plot lists the relative level of the independent noise (with respect to the coherent target noise) necessary to achieve the coherences marked. Results for listeners S and T are found in the upper graph, while those for listeners W and Z are in the lower. Error bars on data points represent two standard deviations.
[Figure II.10: left y-axis: Coherence; right y-axis: Relative level: independent noise (dB)]

[Figure II.15: y-axis: Coherence]

Chapter III: ITD discrimination of several stimuli

Introduction

Coherence is important for a listener in a room with a sound source (see Figure III.1). The direct sound (solid line) will reach the listener unobstructed, producing a very coherent signal that also has an interaural time difference (ITD) because it reaches his left ear first. However, the listener also receives sound that has reflected off walls and other obstructions. These sounds follow trajectories that are delayed and will have various ITDs due to their multiple reflections. The combination of reflected sounds may be very random and incoherent, and interfere with the listener's ability to hear the direct sound. In a highly reverberant environment, it becomes difficult to identify the direction of the source as the direct sound is “drowned out.”

As a test of listeners' ability to detect ITD, experimenters often measure their ITD resolution or just-noticeable-difference (jnd) (Durlach and Colburn, 1978). The jnd is defined as the smallest range of signal image movement along the internal delay axis (the “lateralization” axis) that a listener can reliably identify. Clearly, an image that moves 800 µs along the axis is easier to lateralize than one that moves 40 µs. This has much to do with the variability of lateral judgments. For an image that moves 40 µs, there is a higher probability of misjudging the direction of motion. See the discussion of Signal Detection Theory in the General Appendix B for more details.
With the introduction of a masking noise, the task becomes even more difficult. Jeffress et al. (1962) asked listeners to center a correlated target noise in their heads by varying its ITD. As the interfering masker intensified (decreasing the interaural correlation), their judgments became more erratic, to the point where they selected random ITD values. Trahiotis et al. (2001) tested ITD discrimination for stimuli of varying “straightness” (an adjustment they made that happened to lower coherence), and concluded that ITD jnds were poorest for lower cross-correlations. The study reported in this chapter took this research a step further, with an experiment designed to establish the interaural coherence necessary to lateralize various ITDs.

Experiment 1: ITD discrimination as a function of interaural coherence (COH)

Method - Interference experiment

See Figure III.2 for a diagram of the experimental setup. The target signal was a coherent target noise (CTN) in both ears. This analog noise (N2) was generated by a Zener diode, and was white, thermal, and Gaussian. The signal bandwidth ranged from 0 to 10 kHz, where it was cut off by a Brickwall filter with -115 dB/octave rolloff. This target was delayed x µs in the left ear, where x was selected each run from the set of 20, 40, 100, 200, and 400. The signal to the right ear was delayed either 0 or 2x µs, producing an ITD of ±x µs that served to offset the target image to either the left or right inside the listener's head. Both delays were introduced using a BSS PCM delay line, model TCS-803, which had three taps and a resolution of 10 µs. Two other mutually independent noise sources (N1 and N3), one for each ear, provided a 50-dB incoherent masker when mixed with the target. Both interfering masker noises were low-pass filtered, cutting off at 10 kHz with a rolloff of -48 dB per octave.
The mixed signals were sent through matched exponential voltage-controlled amplifiers (EVCAs), which were in turn ramped on and off by a trapezoidal envelope generator (Trap. Env.).

Experimental runs presented listeners with trials composed of two 500-ms intervals. The first interval would have a positive or negative ITD imposed on the CTN, offsetting the coherent image to the left or right. After a 500-ms pause, the second interval would have the ITD reversed, so the signal image would seem to move from left to right or vice versa (see Figure III.3). The listeners were faced with a two-alternative forced-choice (2AFC) task to identify the masked coherent image's direction of “motion” by pushing a button. The run did not continue until the listener had responded. There were no onset cues available for lateralization, because both left and right channels were ramped on simultaneously by matched voltage-controlled amplifiers with a 30-ms rise time. Subjects performed runs in an Acoustic Systems soundroom, listening through Sennheiser HD-480 headphones.

The intensity of the CTN was determined by a 3-down 1-up staircase as defined by Levitt (1971), so that if the listener answered correctly three times in a row, the target became softer, but just one incorrect response caused it to become louder (see General Appendix A). The target level was 60 dB at the beginning of a run, and the step size started at 4 dB, changing to 2 dB after the first turnaround. The higher initial step size allowed listeners to approach their threshold region quickly, and the subsequent step size increased precision in that region of interest. Listeners did three runs lasting two to three minutes at each of the five ITDs.

No-interference experiment

To discover how much the interfering masker noise degraded the resolution of lateral movement across different ITDs, the same experiment was performed with no masker. Once again, each listener performed three runs at the same five delays.
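The staircase rule described in the Method above can be sketched in a few lines of Python. This is a minimal illustration and not the experimental software: the function names, the simulated listener, and its psychometric function are hypothetical, and the stopping rule (14 turnarounds, averaging the last 10) is an assumption borrowed from the description of Experiment 2 later in this chapter, since the text does not state it for Experiment 1.

```python
import math
import random


def run_staircase(p_correct_at, start_level=60.0, big_step=4.0, small_step=2.0,
                  n_turnarounds=14, n_average=10, seed=0):
    """Simulate a 3-down 1-up staircase on the target level (dB).

    p_correct_at: function mapping level (dB) to probability of a correct
    response (a stand-in for a real listener; hypothetical).
    Returns (threshold estimate, list of levels at each turnaround).
    """
    rng = random.Random(seed)
    level = start_level
    step = big_step              # 4 dB until the first turnaround
    correct_in_row = 0
    last_direction = 0           # -1: level last moved down, +1: moved up
    turnarounds = []
    while len(turnarounds) < n_turnarounds:
        correct = rng.random() < p_correct_at(level)
        if correct:
            correct_in_row += 1
            if correct_in_row < 3:
                continue         # need three correct in a row to step down
            correct_in_row = 0
            direction = -1       # target becomes softer
        else:
            correct_in_row = 0
            direction = +1       # one error makes the target louder
        if last_direction != 0 and direction != last_direction:
            turnarounds.append(level)
            step = small_step    # 2 dB after the first turnaround
        level += direction * step
        last_direction = direction
    threshold = sum(turnarounds[-n_average:]) / n_average
    return threshold, turnarounds


def logistic_listener(level_db, midpoint_db=25.0, slope_db=3.0):
    """Hypothetical 2AFC psychometric function: 50% at low levels, near 100% at high levels."""
    return 0.5 + 0.5 / (1.0 + math.exp(-(level_db - midpoint_db) / slope_db))


threshold, reversals = run_staircase(logistic_listener)
print(f"estimated threshold: {threshold:.1f} dB from {len(reversals)} turnarounds")
```

A 3-down 1-up rule of this kind converges on the level at which the probability of a correct response is about 79%, which is the accuracy the thresholds in this chapter refer to.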
Absolute Threshold experiment

Finally, to see how the results from the no-interference experiment compared to absolute thresholds (the level necessary to detect the signal's presence), listeners participated in a detection experiment. The listeners were given a 2AFC task, where the intervals were defined by two lights (yellow and orange). The CTN was presented concurrently with only one of the lights, and the listener had to identify which. All other parameters were essentially the same, except that each ITD was tested once with the left-leading delay, and then with the right-leading delay, to ensure no preferential treatment of either ear. Thus, listeners performed two runs at five delays, each lasting from three to four minutes. One would expect all delays to yield similar thresholds, because there is no reason to believe that detection would depend on ITD; nevertheless, all were tested in order to be certain, and the thresholds for each delay were averaged over left- and right-leading conditions.

Listeners

The listening group consisted of six subjects, each identified by one letter.

• B, male age 45, common middle-age high-frequency hearing loss, very experienced in hearing studies
• L, female age 19, normal hearing, previous experience in localization studies
• S, male age 17, normal hearing, previous experience in localization studies
• T, male age 18, normal hearing, previous experience in localization studies
• W, male age 58, common middle-age high-frequency hearing loss, very experienced in hearing studies
• Z, male age 24, normal hearing, previous experience in localization studies

Results

Figures III.4-9 show each listener's individual results, while Figure III.10 contains results averaged over all listeners. The individual results are intended to provide the reader with a sense of inter-subject variability. The staircase-method thresholds for the three experiments are plotted with respect to ITD.
The dashed line indicates the level of the interfering masker noise on the runs in which it was present. Triangles, circles, and diamonds represent the interference, no-interference, and absolute threshold experiments, respectively. The right axis lists the coherence between left and right noises as a function of the correlated target level with respect to the uncorrelated masker level, and is only relevant for the interference results. Because the coherence scale is not linear, the coherence value corresponding to each data point can be found next to that point. Coherence is calculated according to the ratio of coherent (identical) noise power in both ears to total incident power. Coherence is related to the threshold levels using the following derivation:

\[
\text{Coherence} = \frac{P_{coh}}{P_{tot}} = \frac{P_T}{P_M + P_T} = \frac{P_T/P_M}{1 + P_T/P_M}
\]

\[
L(\text{dB}) = 10 \log_{10}(P/P_0) \;\rightarrow\; P = P_0 \, 10^{L/10}
\]

\[
\text{Coherence} = \frac{10^{(L_T - L_M)/10}}{1 + 10^{(L_T - L_M)/10}}
\]

The subscripts T and M refer to the target and masker, respectively.

Discussion

Absolute Threshold

The plots bear out the expectation that all absolute thresholds averaged over left- and right-leading delays should be similar. There was some inter-subject variability, however. The thresholds of listeners W and B are somewhat higher than those of the other subjects, which is not surprising because they were older. However, all listeners exhibited thresholds 10-20 dB above the threshold of hearing (0 dB). We can explain this apparent insensitivity by noting that the CTN had a 10-kHz bandwidth. The threshold of hearing is based on measurements of a sine tone. Thus, the CTN had a much lower spectral density at each frequency, and the listeners had less signal power available at their most sensitive frequencies (around 4 kHz, according to equal loudness contours in Hartmann, 1997). Assuming a 2-kHz-wide band of highest sensitivity, the actual level of this band would be 7 dB below the level of the 10-kHz-wide CTN.
For the younger listeners, whose thresholds fell near 10 dB, this translates to a more reasonable absolute threshold of 3 dB. Once satisfied that these detection thresholds met expectations, one can use them as a basis of comparison for the discrimination data garnered from no-interference runs.

No-interference

Note that the no-interference curves do not parallel the absolute threshold curves and have significantly higher thresholds. All listeners follow the same trend of higher thresholds at the lower delay times, with a monotonic and gradual shift as the ITD decreases. At 400 µs ITD, the lateralization (no-interference) thresholds are still 10 dB higher than those necessary to detect the CTN (absolute thresholds). In order to see if the coherence thresholds vary significantly as a function of ITD shift, analysis of variance (ANOVA) was employed. The test shows a significant difference overall (F(4,20) = 30.343, p < 0.001), and so Experiment 1 finds that the magnitude of ITD shift does affect discrimination, requiring higher coherences at smaller shifts. This result is a natural consequence of Signal Detection Theory (see General Appendix B), which states that uncertainty in each ITD measurement will limit listeners' ability to correctly identify the direction of shift. It is quite likely that lower stimulus levels introduce more ITD uncertainty.

Interference

The interference-case data show a very similar behavior to the no-interference thresholds with respect to ITD magnitude, although the thresholds are consistently higher than those for the no-interference case. The higher thresholds show that the 50-dB masker degrades the ability to distinguish direction. As a check whether the two curves are truly parallel, each was averaged over the five ITDs. By subtracting the difference between the two means from each interference threshold, the interference curve was shifted down to the no-interference level. The resulting curves may be essentially identical.
The ANOVA test on the modified results (F(4,20) = 0.652, p = 0.632) finds no significant difference, and thus, the average listener may have a constant difference between interference and no-interference thresholds. This parallelism between curves indicates that the external and internal interfering noises behave similarly. The internal, neural noise that represents the randomness of binaural processing produces the same ITD-dependent effect at a lower threshold. This result is an integral assumption of many binaural-interaction models (Colburn and Durlach, 1978).

Experiment 2: ITD discrimination as a function of high-pass cutoff (COB)

Introduction - The salience of high-frequency ITD cues

It has been previously noted by numerous psychoacousticians that the interaural time difference (ITD) information provided by high-frequency tones is useless to the auditory system (Durlach and Colburn, 1978). This is true simply because the waveform is cycling so rapidly, and peaks are so close together in time, that a given delay at high frequencies could cause many full cycles of phase shift (e.g. θ, θ+2π, θ+4π, ...). Physiologists have also noted that auditory nerve spikes lose phase-locking with high-frequency waveforms (Pickles, 1982), which could render the auditory system unable to identify an interaural delay. ITD information is too uncertain at high frequencies to be an accurate tool for lateralization. Previous papers have estimated that fine-structure ITD cues lose their salience above 1500 Hz (Durlach and Colburn, 1978), which is in agreement with Rayleigh's (1907) duplex theory.

As reported by McFadden and Pasanen (1978) and Henning (1980), it is possible for human listeners to lateralize high-passed noise on the basis of interaural time differences in the envelope. High-frequency narrowband signals featuring rapidly varying waveforms can still have slowly varying envelopes, which are just larger constructs composed of the waveform peaks.
See Figure II.8 in Chapter II for an example. Since envelope structures may have a much longer period, the auditory system can extract meaningful ITD cues even when fine-structure cues are unusable. However, the ability to do so makes great demands on the interaural coherence of the noise, as found by Hartmann, Constan and Rakerd (1998).

High-frequency ITD cues in the room environment

Experiment 1 verified that smaller shifts in ITD required more coherence to lateralize in the broadband regime. Experiment 2 involves expansion of the “head in a room” model in Figure III.1. An interesting aspect of the reflected sound is that all surfaces have an absorption coefficient, which tends to increase with frequency. Thus, the reflections are low-pass filtered somewhat, and sound in the higher frequency bands will therefore be more coherent than lower-frequency components. To further test the limits of the auditory system, and more accurately represent the room conditions described above, a new experiment featured a high-pass-filtered coherent stimulus, effectively eliminating fine-structure ITD cues.

The expanded concept of the room environment inspired a new study much like Experiment 1. The new stimulus consisted of a broadband incoherent noise, representing the reflections and any ambient interference, and a high-pass filtered CTN with cutoff frequency fc, representing the direct sound. The coherent portion also had an applied ITD. Using this revised stimulus, the goal of the experiment was to determine the coherence threshold required to lateralize a high-pass-filtered target noise in the presence of a broadband masker.

Method

Refer to the diagram of the stimulus (Figure III.11), composed of two parts for each ear, left and right. The stimulus was a coherent noise (A) that was 60 dB SPL in the broadband case. Its spectrum ranged from the high-pass cutoff to 10 kHz.
Spectral density was held constant throughout the experiment, so the total signal power varied with cutoff frequency fc. Using a Stanford high-pass filter with -115 dB/octave rolloff, Experiment 2 set fc to seven values: 0, 500, 1000, 1500, 2000, 3000, and 4000 Hz. One should recognize that for fc < 1500 Hz, the listener still had access to fine-structure ITD cues in the target. However, when the cutoff was 1500 Hz or greater, only the envelope structure provided ITD cues. This noise was digitally generated, and was white, thermal, and Gaussian. A delay was imposed on the left or right ear to produce an ITD.

Two thermal noise sources (B and C), one in each ear, provided the incoherent masker. These noises were independent of noise A and each other. Their spectra ranged from 20 Hz to 10 kHz. Through the use of programmable attenuators, the experimental software was able to set their levels between 40 and 80 dB.

At first inspection, it would seem that the coherence of the high-pass-filtered target in a broadband masker would be lower than for a broadband target, since there is less power in the high-passed signal. According to a conventional measurement, the overall coherence is indeed lower; however, the binaural processor analyzes the broadband input in a different manner. According to critical band theory (Moore, 1995), the auditory periphery separates stimuli into discrete bands that are processed separately. Thus, only masker power in the same bands as the target noise provides any interference. Coherences calculated within each band better represent the information available to the auditory system, and tend to be higher than broadband values. This approach will be reviewed in more detail later in the chapter. Experiment 2 extends the masking noise below the target's high-pass cutoff simply to better simulate the expanded room environment model.
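The contrast between the conventional (broadband) coherence and the within-band coherence can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the analysis used in this chapter: both noises are taken to have perfectly flat spectral densities with ideal band edges, coherence is taken as coherent power over total power, and the function names and the example density ratio of 0 dB are hypothetical.

```python
def coherence_from_powers(p_target, p_masker):
    """Coherence as coherent (target) power divided by total power."""
    return p_target / (p_target + p_masker)


def broadband_coherence(fc_hz, density_ratio_db, f_top_hz=10_000.0, masker_low_hz=20.0):
    """Coherence computed over the whole spectrum (assumes flat spectra, ideal filters).

    fc_hz: high-pass cutoff of the coherent target (target spans fc_hz..f_top_hz).
    density_ratio_db: target spectral density relative to masker spectral density.
    The masker spans masker_low_hz..f_top_hz with unit spectral density.
    """
    ratio = 10.0 ** (density_ratio_db / 10.0)
    p_target = ratio * (f_top_hz - fc_hz)
    p_masker = 1.0 * (f_top_hz - masker_low_hz)
    return coherence_from_powers(p_target, p_masker)


def in_band_coherence(density_ratio_db):
    """Coherence inside any band that contains both target and masker."""
    ratio = 10.0 ** (density_ratio_db / 10.0)
    return coherence_from_powers(ratio, 1.0)


# Example: equal spectral densities (0 dB), at the nonzero cutoffs of Experiment 2.
for fc in (500.0, 1000.0, 1500.0, 2000.0, 3000.0, 4000.0):
    print(f"fc = {fc:6.0f} Hz  broadband = {broadband_coherence(fc, 0.0):.3f}  "
          f"in-band = {in_band_coherence(0.0):.3f}")
```

Under these assumptions the within-band value does not depend on the cutoff, while the broadband value falls as the cutoff rises, because masker power below the cutoff is counted against the target even though it shares no band with it.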
During the run, stimulus coherence was controlled by again varying the relative levels of target and masker according to a 3-down, 1-up Levitt staircase function. Unlike in Experiment 1, the masker level was the staircase variable (due to setup requirements), though this makes no difference in the coherence calculation. The change is significant because in Experiment 1, the CTN level tended to be lower than or equivalent to the masker, and so when its level moved on the staircase the overall level did not change much (a desirable result). In Experiment 2, the masker levels are low enough with respect to the CTN that choosing them as the staircase variable is the best way to maintain consistency in overall level. If the listener answered correctly on three successive trials, the program raised the incoherent masking noise level by 2 dB, lowering the overall stimulus coherence and making the task more difficult. If the listener answered incorrectly, the program lowered the masking noise level by 2 dB, increasing coherence and making the task easier. Runs continued for 14 turnarounds with the coherence value recorded each time. Each threshold was the average of the last 10 coherences.

Between runs, the experimenter had control of two parameters. They were the same five ITDs used in Experiment 1 (20, 40, 100, 200 and 400 µs), in addition to seven high-pass cutoffs (0, 0.5, 1.0, 1.5, 2.0, 3.0 and 4.0 kHz). The experimenter randomly varied these parameters so no two successive runs had identical stimuli. All listeners did three runs at each combination of ITD and high-pass cutoff, except for those combinations at which they failed on three runs (defined as requiring a coherence > 0.99 to do the task). Thus, their input totaled a maximum of 105 runs.

Four experienced listeners participated in this experiment, including the previously mentioned T, W and Z.
A new listener was added for this study:

• M, female age 21, normal hearing, no previous listening experience

Results and Discussion

Average coherence thresholds for all four listeners can be found in Table III.1 below. They are grouped by cutoff values for a particular ITD.

Table III.1 - Coherence thresholds by ITD and cutoff

ITD      HP cutoff (Hz)   M          T          W          Z
20 µs    0                0.772129   0.414501   0.830445   0.328759
         500              0.802909   0.528751   0.827178   0.344179
         1000             0.888184, 0.57428, 0.403371 (one cell blank)
40 µs    0                0.359935   0.236075   0.454078   0.15098
         500              0.403371   0.257464   0.42572    0.189905
         1000             0.655821   0.375998   0.90717    0.212073
         1500             0.957242, 0.839951 (two cells blank)
         2000             0.901187 (three cells blank)
100 µs   0                0.223845   0.109549   0.20076    0.063327
         500              0.253087   0.121295   0.261891   0.10299
         1000             0.618584   0.197091   0.517263   0.10299
         1500             0.833662   0.755519   0.823862   0.681322
         2000             0.953309, 0.823862, 0.72911 (one cell blank)
         3000             0.910976, 0.823862 (two cells blank)
200 µs   0                0.189905   0.109549   0.20076    0.054413
         500              0.13411    0.066114   0.15098    0.05561
         1000             0.27089    0.15098    0.42572    0.09283
         1500             0.738109   0.540208   0.763925   0.4201
         2000             0.78013    0.613137   0.738109   0.634743
         3000             0.94676    0.86589    0.914641   0.629389
         4000             0.966488, 0.79552 (two cells blank)
400 µs   0                0.148052   0.06755    0.13411    0.042758
         500              0.28946    0.063327   0.176138   0.066114
         1000             0.585499   0.261891   0.540208   0.333861
         1500             0.557312   0.488489   0.528751   0.414501
         2000             0.645352   0.482737   0.607661   0.197091
         3000             0.772129   0.751241   0.681322   0.505756
         4000             0.86589    0.84902    0.876229   0.700977

Coherence thresholds for various ITDs and cutoffs in Experiment 2. In rows with fewer than four values, the values are listed in their original order; which listeners' cells are blank is not recoverable from this copy.

Note that the shorter-ITD data do not extend to all seven cutoff values, and some cells are blank. This is because at several conditions, listeners could not complete the task regardless of the interaural coherence. After three failures, that combination was dropped. Some listeners were successful where others failed, but all ran into limitations at some point. The combination of short ITDs and high cutoff values proved too difficult overall.
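For readers who prefer to think in decibels, the tabled coherences can be converted back to target-to-masker level differences. The sketch below assumes that the Experiment 2 thresholds follow the same level-to-coherence relation given in the Results of Experiment 1 (coherence equal to the target-to-masker power ratio divided by one plus that ratio); the function names are hypothetical, and the sample values are the 20-µs, 0-Hz-cutoff entries for listeners M and T.

```python
import math


def coherence_from_level_difference(delta_db):
    """Coherence for a target level delta_db (dB) above the masker level."""
    ratio = 10.0 ** (delta_db / 10.0)      # target-to-masker power ratio
    return ratio / (1.0 + ratio)


def level_difference_from_coherence(coherence):
    """Inverse relation: target-minus-masker level (dB) for a given coherence."""
    return 10.0 * math.log10(coherence / (1.0 - coherence))


# Two sample thresholds from Table III.1 (20 µs ITD, 0 Hz cutoff).
for listener, coherence in (("M", 0.772129), ("T", 0.414501)):
    delta = level_difference_from_coherence(coherence)
    print(f"listener {listener}: coherence {coherence} -> {delta:+.2f} dB re masker")
```

If the assumption holds, a coherence of 0.5 corresponds to equal target and masker levels, values above 0.5 to a target stronger than the masker, and values below 0.5 to a target weaker than the masker.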
To facilitate a clear understanding of the relationship between coherence, frequency distribution, and ITD lateralization, Figure III.12 contains data averaged over all listeners and plotted on a graph of coherence versus high-pass cutoff. The x-axis is the range of high-pass cutoffs, from 0 to 4 kHz. The y-axis is the interaural coherence threshold necessary to lateralize with 79% accuracy, as targeted by the staircase. The five plots are for the various ITD values. While taking the average threshold may eliminate some of the subtler effects within each listener’s results, those results all shared a common behavior much like that shown in Figure III.12. Also note that the graph only includes points where data were obtained for all listeners. If one or more listeners failed at a particular condition, the experimenters did not average over the other listeners’ results to establish a threshold.

Some general trends in the data are readily apparent. Runs with the longest ITD value tended to require the least coherence for adequate performance. While there are exceptions to that rule, the 400 μs plot is clearly well below that of the 20 μs condition. Attending to the finer details reveals that the ITD dependence of thresholds is nowhere near as smooth as in Experiment 1. Observe all points with 500- or 1000-Hz cutoffs: the threshold does not decrease monotonically as ITD increases. The reversal of 200- and 400-μs points at a 500-Hz cutoff is almost negligible. However, their positions at a 1000-Hz cutoff, where every listener required a lower coherence to discriminate 200-μs ITDs, cannot be ignored. It is counter-intuitive that listeners might lateralize a smaller ITD more effectively, a result at odds with nearly all other data in Figure III.12. A possible explanation arises if listeners tended to concentrate on the lowest available frequencies (i.e. those near the cutoff) when discriminating ITD.
A time shift of ±400 μs at 1000 Hz corresponds to a phase shift of nearly π, and interaural phase differences of almost half a period lead to confusion and reversal of the image along the internal delay axis. Thus, one might expect listeners to have more difficulty at that condition.

Runs with the lowest high-pass cutoffs tended to require lower coherence thresholds. Note the shortest ITD, 20 μs: once the high-pass cutoff exceeded 500 Hz, listeners quickly went above the 0.99 coherence limit, and so this plot drops off the top of the graph (indicating failure). Longer ITDs do the same at higher cutoffs: listeners fail at 40 μs ITD above 1 kHz, at 100 μs above 1.5 kHz, and at 200 μs above 3 kHz, while they could still lateralize a 400 μs ITD at a cutoff of 4 kHz. Also note that all listeners exhibited maximum slope in the vicinity of 1 kHz, a region near the frequency limit for fine-structure ITD cues.

It would seem that envelope ITDs proved quite useful, since listeners could still perform the task for cutoffs of 1500 Hz and above. However, the required interaural coherence grew rapidly as the high-pass cutoff frequency increased through the critical region from 1 to 4 kHz, especially for the smaller ITD shifts. Thus, it appears that envelope cues are very sensitive to coherence, indicating that they are more vulnerable to masking and reverberation. Fine-structure ITD cues, on the other hand, were still effective at very low coherences. This has implications for the localization of broadband sounds in rooms, where the interaural coherence tends to increase with increasing frequency, but possibly not rapidly enough to allow the envelope timing information to contribute usefully. The ability to lateralize could depend very much on the room characteristics.

Experiment 3: ITD discrimination of high-pass-filtered targets and maskers (COD)

Introduction

In Experiment 2, the CTN was high-pass filtered, while the masker was always broadband (20-10000 Hz).
The calibration procedure required the experimenter to keep the target’s spectral density constant, which led to different total powers depending on the cutoff (see the top half of Table III.2). This procedure was justified by citing critical band theory, which states that masker power outside the target band should not contribute to the incoherence. However, the auditory filters measured by several experimenters do not feature the ideal bandlimits that our program was able to achieve. Their cutoffs are much more gradual (Moore, 1995), especially for frequencies below the band. This gradual rolloff leads to an effect called the “upward spread of masking.” Hartmann (1997) shows how a lower-frequency component can excite fibers of much higher CF, effectively masking signals of higher frequency. Trahiotis et al. (2001) found that interference by adjacent bands reduced the binaural release from masking in an MLD experiment centered at 500 Hz. Their listeners’ thresholds decreased as they widened the presented band, and continued to worsen as the bandwidth exceeded the critical bandwidth of 100 Hz.
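The relation between overall level and spectrum level that underlies this calibration can be sketched in a few lines. The sketch assumes a flat spectrum, a lower band edge of 20 Hz when no high-pass filter is applied, and an upper band edge of 10000 Hz for both noises; the function names are hypothetical.

```python
import math

LOWER_EDGE_HZ = 20.0      # assumed lower band edge when the cutoff is 0
UPPER_EDGE_HZ = 10000.0   # assumed upper band edge for target and masker


def bandwidth_hz(cutoff_hz):
    """Width of the noise band above the high-pass cutoff."""
    return UPPER_EDGE_HZ - max(cutoff_hz, LOWER_EDGE_HZ)


def overall_from_spectrum(spectrum_level_db, cutoff_hz):
    """Overall level of a flat-spectrum noise, given its level per Hz."""
    return spectrum_level_db + 10.0 * math.log10(bandwidth_hz(cutoff_hz))


def spectrum_from_overall(overall_level_db, cutoff_hz):
    """Spectrum level (per Hz) of a flat-spectrum noise, given its overall level."""
    return overall_level_db - 10.0 * math.log10(bandwidth_hz(cutoff_hz))


# Constant spectrum level (20 dB): overall level falls as the cutoff rises.
exp2_target_levels = [round(overall_from_spectrum(20.0, c), 1)
                      for c in (0, 500, 1000, 1500, 2000, 3000, 4000)]

# Constant overall level (60 dB): spectrum level rises as the cutoff rises.
exp3_spectrum_levels = [round(spectrum_from_overall(60.0, c), 1)
                        for c in (0, 500, 1000, 1500, 2000, 3000, 4000)]
```

Under these assumptions the two lists reproduce the target calibration levels and the spectrum levels tabulated for Experiments 2 and 3 in Table III.2.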
Table III.2 - Calibration settings

Exp  Target       Target cal   Target spec  Masker       Masker cal   Masker spec
     cutoff (Hz)  level (dB)   lvl (dB)     cutoff (Hz)  lvl (dB)     lvl (dB)
2    0            60.0         20.0         0            60.0         20.0
2    500          59.8         20.0         0            60.0         20.0
2    1000         59.5         20.0         0            60.0         20.0
2    1500         59.3         20.0         0            60.0         20.0
2    2000         59.0         20.0         0            60.0         20.0
2    3000         58.5         20.0         0            60.0         20.0
2    4000         57.8         20.0         0            60.0         20.0
3    0            60.0         20.0         0            60.0         20.0
3    500          60.0         20.2         500          60.0         20.2
3    1000         60.0         20.5         1000         60.0         20.5
3    1500         60.0         20.7         1500         60.0         20.7
3    2000         60.0         21.0         2000         60.0         21.0
3    3000         60.0         21.5         3000         60.0         21.5
3    4000         60.0         22.2         4000         60.0         22.2

Comparison of overall and spectral levels set by calibration routines in Experiments 2 and 3 as a function of target and masker cutoff frequencies

The third and final experiment presented listeners with a different stimulus, one that limited both target and masker to the same frequency range, so their respective levels were identical on initial calibration (compare the bottom half of Table III.2 with the top). Experiment 3 is useful as a check on our hypothesis that the masker power below the target bands in Experiment 2 did affect the lateralization threshold. If this hypothesis is true, then upward spread of masking (from the low-frequency portion of the masker present in Experiment 2) raised the Experiment 2 thresholds, and we should see lower coherence thresholds (i.e. higher masker-level thresholds) for Experiment 3 at higher cutoffs. If not, then the threshold spectral level for a particular high-pass cutoff should be the same in both experiments.

Method

Experiment 3 reran the 200-μs ITD condition of Experiment 2, with two significant differences. First, the bandlimits changed as noted above, so that both target and masker were filtered identically. Second, this filtering was done when both noises were generated digitally, to ensure accuracy. The program arranges a frequency spectrum in the Tucker-Davis Array Processor, assigns random phases to each frequency, and constructs the noise with an inverse FFT.
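The digital noise synthesis just described (flat spectrum within the band, random phases, inverse FFT) can be sketched with NumPy. This is an illustration only and does not reproduce the Array Processor program: the sample rate, duration, 20-Hz lower edge, unit-RMS scaling, and function name are assumptions.

```python
import numpy as np


def make_highpass_noise(cutoff_hz, upper_hz=10000.0, fs=50000, duration_s=1.0,
                        rng=None):
    """Band-limited noise built in the frequency domain (hypothetical sketch).

    Components between the cutoff and upper_hz get equal amplitude and
    independent random phases; everything else is exactly zero.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = int(round(fs * duration_s))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    in_band = (freqs >= max(cutoff_hz, 20.0)) & (freqs <= upper_hz)
    spectrum = np.zeros(freqs.size, dtype=complex)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=int(in_band.sum()))
    spectrum[in_band] = np.exp(1j * phases)      # flat magnitude, random phase
    noise = np.fft.irfft(spectrum, n=n)          # inverse FFT to a real waveform
    return noise / np.sqrt(np.mean(noise ** 2))  # scale to unit RMS


# A target and two independent maskers sharing a 2000-Hz high-pass cutoff.
rng = np.random.default_rng(0)
target = make_highpass_noise(2000.0, rng=rng)
masker_left = make_highpass_noise(2000.0, rng=rng)
masker_right = make_highpass_noise(2000.0, rng=rng)
```

Because the band edges are set bin by bin, the resulting noise has no energy outside the band, which is the ideal bandlimiting that analog filtering cannot provide.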
The experimenter’s only responsibility is to ensure that each run starts at the same level: 60 dB SPL for both noises, or 63 dB for the combination. During the run, the program uses the same level staircase (2-dB steps) as Experiment 2, but it calculates the relative intensity of the incoherent noise and scales the digital waveform appropriately, so the Programmable Attenuators are unnecessary in this particular task. Three listeners participated, including the aforementioned W and Z. A new listener was also included, and he also did runs of the 200-μs Experiment 2 condition in order to provide comparison data:

X, male, age 26, normal hearing, previous experience in localization studies

Results

Table III.3 includes masker-level thresholds for all three listeners in both experiments, arranged according to high-pass cutoff of the masking noise. The calculated spectrum levels corresponding to each threshold are listed as well. When comparing the results of Experiments 2 and 3, one might note how they are similar when the masker cutoff is low in Experiment 3, and deviate from each other as that cutoff increases. These data were converted to coherence values and plotted in Figures III.13-15, one for each listener, then averaged over listeners in Figure III.16. The x-axis charts high-pass cutoff, while the y-axis measures the threshold coherence at that condition. Thus, when target and masker are at equal spectrum level, data points fall along the “0.5” line. The figures clearly illustrate a strong divergence that begins near 1500 Hz (or earlier, as in the case of listener X), a region that has some significance for fine-structure and envelope ITD cues as mentioned earlier. The difference is rather clear and dramatic for each listener.
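The conversion from threshold spectrum levels to coherence can be written out as a short function. It is a sketch that assumes coherence equals the coherent (target) power divided by the total power within the band, which is consistent with equal spectrum levels giving 0.5; the function name is hypothetical.

```python
def coherence_from_spectrum_levels(target_spec_db, masker_spec_db):
    """Coherence = target power / (target power + masker power), per Hz."""
    masker_to_target = 10.0 ** ((masker_spec_db - target_spec_db) / 10.0)
    return 1.0 / (1.0 + masker_to_target)


# Equal spectrum levels put the point on the 0.5 line.
equal_levels = coherence_from_spectrum_levels(20.0, 20.0)

# Experiment 2, listener W, cutoff 0: masker spectrum level 26.0 dB
# against a target spectrum level of 20.0 dB.
listener_w = coherence_from_spectrum_levels(20.0, 26.0)
```

The second value comes out near 0.2008, matching the 200-μs, 0-Hz entry for listener W in Table III.1.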
Table III.3 - Results comparison

Exp  Target  Masker  Masker     Masker    Masker     Masker    Masker     Masker
     cutoff  cutoff  threshold  spec lvl  threshold  spec lvl  threshold  spec lvl
                     W          W         X          X         Z          Z
2    0       0       66.0       26.0      66.7       26.7      72.4       32.4
2    500     0       67.5       27.5      64.8       24.8      72.3       32.3
2    1000    0       61.3       21.3      59.9       19.9      69.9       29.9
2    1500    0       54.9       14.9      56.9       16.9      61.4       21.4
2    2000    0       55.5       15.5      56.8       16.8      57.6       17.6
2    3000    0       49.7       9.7       53.0       13.0      57.7       17.7
2    4000    0       -          -         51.7       11.7      54.1       14.1
3    0       0       64.8       24.8      68.3       28.3      73.8       33.8
3    500     500     65.5       25.7      66.0       26.2      73.3       33.5
3    1000    1000    60.9       21.4      63.3       23.8      69.7       30.2
3    1500    1500    59.1       19.8      59.7       20.4      62.3       23.0
3    2000    2000    59.4       20.4      59.7       20.7      62.1       23.1
3    3000    3000    58.4       19.9      59.2       20.7      62.1       23.6
3    4000    4000    57.3       19.5      57.6       19.8      61.4       23.6

Comparison of threshold overall and spectral levels in Experiments 2 and 3

In Experiment 3, listeners were relatively unaffected by increases in cutoff frequency above the 1500-Hz limit. This would seem to indicate that once fine-structure ITD cues have been eliminated, the usefulness of envelope cues is relatively unaffected by the loss of further high-frequency bands. However, the additional low-frequency masker in Experiment 2 seems to degrade thresholds at higher cutoffs. As the target cutoff increases, the loss of bandwidth is compounded by the remaining masker power in frequencies removed from the target. The dependence of threshold on cutoff in Experiment 2 supports the hypothesis that listeners experience upward spread of masking in this experiment.

Discussion

Analysis with respect to known models

Experiments 2 and 3 tested the ability of listeners to follow the movement of a high-pass target coherent noise in an incoherent masker. It is clear that the task becomes more difficult as the signal-to-noise ratio decreases, but how the auditory system’s decision-making process is affected by increasing the high-pass cutoff is not as obvious. Experiment 2 revealed upward spread of masking, and established a general behavior for thresholds at various ITDs and cutoffs.
How do these findings fit into known models of the auditory system?

The Jeffress model

These experiments are intimately connected to Jeffress’ cross-correlation model of binaural perception. The results clearly show that increasingly difficult lateralization tasks (e.g. shorter ITDs) require a higher amount of interaural coherence to maintain the same level of performance. The coherence is represented in the Jeffress model by coincidence counters, and a higher coherence threshold means that the signal from one of the coincidence detectors must be stronger to establish a reliable indication of ITD, or else the broad incoherent noise (which produces random coincidence data) will effectively mask the coincidence signal. Physiologists have noted a high concentration of comparator cells sensitive to low-frequency sound in the medial superior olive (Yin and Chan, 1990), which would support the notion that ITD is primarily a low-frequency cue. When lateralizing high-pass signals with increasing cutoff frequencies, one does still have access to envelope ITDs, but has lost the waveform cue. To produce an equally strong coincidence signal for ITD lateralization, a high-frequency signal should require increased interaural coherence.

The auditory nerve model

The auditory-nerve-based model of Colburn (1977) presents an effective way to predict how the addition of incoherent noise should affect ITD discrimination. It assumes that neural firing in the auditory nerve is described by a Poisson process whose rate depends on the characteristic frequency (CF) of the particular fiber and temporal features of the presented stimulus. Colburn’s approach thus incorporates an internal, neural noise in addition to the external masking noise. Fibers of the same CF from left and right ears feed into a set of coincidence counter neurons distinguished by different delay lines, which operates much like the matrix of counters described by Jeffress (Colburn and Durlach, 1978).
The model assumes that coincidence is a function of simultaneous firings of fibers from left and right, and thus, the counter output is another Poisson process that depends on the interaural lag and a time window. This output, when measured across several counters featuring different lags for one CF, forms a crude representation of the interaural cross-correlation for the neural signal at that frequency. The auditory system can identify ITD by locating the lag with the most coincidences, which corresponds to the cross-correlation peak. Incoherent noise adds a random element to the Poisson firing of left and right ears, necessarily reducing the coincidence rate and lowering the peak. The flatter cross-correlation function introduces uncertainty to the ITD decision variable.

The equalization-cancellation model

The equalization-cancellation (E-C) model of Durlach (1972) suggests that the auditory system’s ability to detect a signal depends on noise suppression through various attenuations, time-shifts, and subtractions. It also includes the concept of neural “jitter,” which provides an internal noise that prevents the auditory system from achieving perfect performance. The jitter in the time domain is assumed to be independent of frequency. At higher frequencies, the constant time jitter translates into a large variability in phase, and thus the internal noise is much more effective. In this manner, the jitter introduces more incoherence to the system. Note that the E-C model is typically applied to MLD tasks rather than lateralization, but its definition of neural jitter would suggest that ITD discrimination should weaken as the high-pass cutoff increases.

The position-variable model

An updated version of the Jeffress model, the Colburn-Stern position variable model (Stern and Colburn, 1978), predicts the location of a lateralized image by computing the center of mass of all the coincidence-counter outputs.
Because of its relation to the Jeffress model, the position variable model relates to this chapter’s results in much the same way: a higher value of coherence will produce a larger response from the coincidence counters. This sharper peak leads to a better-defined centroid, and improves lateralization. The use of the centroid as a decision variable is more robust than simply using the cross-correlation peak. However, this model has no explicit elements to include the contribution of the envelope, which leads us to the next model.

The coherence-based model

Bernstein and Trahiotis (1996) have developed a model in which the auditory system detects target signals by their normalized cross-correlation, after the auditory stimulus has been low-pass filtered. The filter has no effect on low-frequency signals, but as the frequency increases, it serves to extract the envelope of the waveform while eliminating the fine structure that conveys no useful ITD information. This model accounts for envelope-based ITD cues, predicting that high-pass filtering will still allow the listener to complete the task.

This section analyzes the results of Experiments 2 and 3 using the model developed by Bernstein and Trahiotis and their colleagues (Bernstein and Trahiotis, 1996; Trahiotis et al., 2001), as well as Colburn, Stern, Kohlrausch and all their colleagues (Colburn, 1977; Stern and Colburn, 1978; Kohlrausch et al., 1997). This model has become a standard for narrowband binaural hearing, and has been applied to Masking Level Difference tasks that involved detection of a tone in narrow-band noise (van der Heijden and Trahiotis, 1997), as well as narrow-band lateralization tasks (Trahiotis et al., 2001). However, it could be applied to a wide variety of phenomena, providing a “unified” model of processing.
The following steps employ this model in a broadband situation, separating the stimulus into critical bands before processing so each band was similar to the prior narrow-band stimuli used in the model. This separation into narrow bands agrees with many disparate models of binaural interaction (Colburn and Durlach, 1978), which share the common element of bandpass filters at the auditory input. By making calculations of normalized correlation and establishing decision criteria, the model calculation attempted to predict the 200 μs data of Experiments 2 and 3.

Figures III.17 and III.18 are a road map of the model calculation process. It began with the generation of stimulus files according to specifications in Experiment 3, with high-pass-filtered coherent targets and incoherent maskers. The discussion of calculations involving Experiment 2 stimuli will follow later in this chapter. The relative levels of targets and maskers were determined by the coherence value alpha. The process involved separation of the stimuli into 15 odd-numbered critical bands from number 1 to 31, including center frequencies from 124 to 7021 Hz. To ensure that the model could be applied generally to both Experiments 2 and 3, allowing for the inclusion of upward spread of masking, critical band filters were designed to fit the profile of an actual auditory filter. A formula for the “roex(p)” auditory filter shape H(f) is found in Moore (1995).

$|H(f)| = (1 + pg)\,e^{-pg}$

The roex (rounded exponential) function falls off exponentially with deviation from center frequency, $g = |f - f_c| / f_c$, and uses parameter p to define the width of a critical band (limiting the area under the curve to approximate the bandpass filters that describe critical bands). Note that at higher CFs, this filter includes a much wider frequency band due to the normalization according to CF.
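The roex(p) weighting can be evaluated directly from the formula above. The sketch below is illustrative only: the function name is hypothetical and the value p = 25 is an arbitrary placeholder, since the values of p used in the model calculation are not given here.

```python
import numpy as np


def roex_weight(f_hz, fc_hz, p):
    """roex(p) filter weight (1 + p*g) * exp(-p*g), with g = |f - fc| / fc."""
    g = np.abs(np.asarray(f_hz, dtype=float) - fc_hz) / fc_hz
    return (1.0 + p * g) * np.exp(-p * g)


P_EXAMPLE = 25.0  # placeholder value, not taken from the model calculation

# Weight given to a component 500 Hz below the center frequency,
# for a low-CF filter and a high-CF filter with the same p.
weight_low_cf = roex_weight(500.0, 1000.0, P_EXAMPLE)
weight_high_cf = roex_weight(3500.0, 4000.0, P_EXAMPLE)
```

Because g is normalized by the center frequency, the same 500-Hz offset is attenuated far less by the 4000-Hz filter than by the 1000-Hz filter, which is how masker power below a high cutoff can leak into the bands just above it.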
Filters near the target noise’s high-pass cutoff would include masker power below the cutoff frequency, and the increased incoherence due to additional masker power would only rise as the high-pass cutoff increased. Thus, the coherences for the higher cutoffs in Experiment 2 may have been significantly less than we assumed. By including filters with a gradual rolloff, the model has a somewhat accurate representation of the auditory system, and should be able to simulate the case of Experiment 2 successfully. The filters are also applicable to the Experiment 3 stimulus because the shape should have no effect when the target and masker have identical bandlimits.

After rectifying each band’s left and right channels according to a half-wave square-law, the model process included a stage of low-pass filtering according to the formula used by Bernstein and Trahiotis (1996) (see bottom of Figure III.17).

$G(f) = \dfrac{1}{\left[1 + (f/f_0)^2\right]^{n/2}}, \qquad f_0 = \dfrac{f_c}{\sqrt{2^{1/n} - 1}}$

Bernstein and Trahiotis selected parameters n = 4 and corner frequency $f_c$ = 425 Hz, which best fit their data. The same parameters were adopted for this calculation. At their suggestion, a subsequent low-pass filter was introduced as a further development of the model. This additional filter was second-order Butterworth with a cutoff frequency of 150 Hz. It was applied only to bands with center frequencies falling above 1500 Hz. All of the processes thus far applied to the stimulus are meant to simulate the actions of the auditory periphery on a real-world source.

Following the filter stage, the model proceeded to cross-correlate left and right signals over internal lag τ from -4 ms to +4 ms without normalization (top of Figure III.18). Note that the cross-correlation function extends well beyond 4 ms in either direction.
To eliminate the need to calculate out to infinite delays, the model applied Colburn’s p(τ) function (center of Figure III.18), which gives more weight to lower ITDs; this ensured that extreme ITDs provided little contribution. The function may represent the density of neurons available for detection along the lag axis, and is described mathematically below.

$p(\tau) = C$ for $|\tau| \le 0.15$

$p(\tau) = C \exp\left(-\dfrac{|\tau| - 0.15}{0.6}\right)$ for $0.15 < |\tau| \le 2.2$

$p(\tau) = C\,(0.033) \exp\left(-\dfrac{|\tau| - 2.2}{2.3}\right)$ for $|\tau| > 2.2$

The τ values in the above equations are measured in milliseconds. For the purpose of this calculation, the constant C was set to one. From the weighted cross-correlation function for one band, shown at the bottom of Figure III.18 along with the original function for comparison, the model process calculated the centroid to establish a corresponding point along the internal lag axis (indicated by an “x”), as predicted by Stern and Colburn (1978). The process has led to a lateral position for one band of the stimulus at one delay and one coherence.

Model calculation results

The sample output shown in Figure III.19 is a summary of twenty centroid calculations in each critical band using the described process. The input stimuli had a 200 μs ITD, and the coherence level gamma for this example was 0.5. The vertical axis is the lag of the centroid, with 0 μs being the center of the head and values above shifted off to the right. The horizontal axis includes all fifteen bands tested in this calculation. Model results show that the centroids for an ITD of 200 μs are shifted well towards 0 ITD; none of them are delayed more than 60 μs. This is due to the weighting by p(τ). The distribution shows the well-known strength of lateralization at low frequencies due to fine-structure ITD: the centroids for bands 1 through 6 are shifted well away from the center, compared with centroids for higher bands.
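The p(τ) weighting and centroid steps described above can be sketched as follows. This is a minimal illustration rather than the model code: the function names, the lag grid, and the Gaussian test curve standing in for a band's cross-correlation function are assumptions, and the tail coefficient is taken as 0.033 so that the function is continuous at 2.2 ms.

```python
import numpy as np


def colburn_p(tau_ms, c=1.0):
    """Piecewise lag weighting p(tau); tau in milliseconds."""
    a = np.abs(np.asarray(tau_ms, dtype=float))
    middle = c * np.exp(-(a - 0.15) / 0.6)
    # 0.033 is approximately exp(-(2.2 - 0.15) / 0.6), giving continuity at 2.2 ms.
    tail = c * 0.033 * np.exp(-(a - 2.2) / 2.3)
    return np.where(a <= 0.15, c, np.where(a <= 2.2, middle, tail))


def weighted_centroid(tau_ms, xcorr):
    """Centroid (ms) of the cross-correlation after weighting by p(tau)."""
    tau_ms = np.asarray(tau_ms, dtype=float)
    weighted = colburn_p(tau_ms) * np.asarray(xcorr, dtype=float)
    return float(np.sum(tau_ms * weighted) / np.sum(weighted))


# Stand-in cross-correlation: a broad, non-negative peak at +0.2 ms (200 us).
tau = np.linspace(-4.0, 4.0, 801)
xcorr = np.exp(-0.5 * ((tau - 0.2) / 0.5) ** 2)
centroid_ms = weighted_centroid(tau, xcorr)
```

For this stand-in curve the centroid falls between 0 and 0.2 ms, illustrating how the weighting pulls the estimated position toward zero lag.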
On the other hand, the decay of lateral position is non-monotonic with frequency, and there appears to be a slight peak at band 12 that may be the result of envelope ITD cues. The most important features of the model results are as follows:

The mean of the 20 centroid calculations is greater than zero in each band. This stimulus should be clearly lateralized to the right because all bands are in agreement.

Going even further, almost none of the points fall below zero. The standard deviation of the points, while large in some cases, never extends to the left side of the listener’s head. This result suggests no room for confusion.

Centroids approach zero lag for the higher bands, where coherence is strongest (in the room-environment model). Also, the majority of coincidence counters are found near zero lag, which indicates that the auditory system works with smaller, more subtle lateral positions (Trahiotis and Stern, 1989).

For decreasing coherences, centroids in all bands tend to shift towards the center of the head, which should degrade lateralization. At alpha = 0.4, some points start to appear below zero. At alpha = 0.1, those bands which are the most strongly lateralized in Figure III.19 (1-6) are still generally above zero, while higher bands show no preference to either side.

Model predictions

The experimenters have attempted to establish a detection criterion that would analyze the centroid distributions of Figure III.19 and match the data. The selected criterion made predictions according to the fraction of centroid values greater than zero in each band. If the stimulus contained a band where more than 71% of points are above zero, then the criterion considers it lateralized to the right. As is evident, this criterion would tend to predict equal performance regardless of high-pass cutoff frequency due to the vast majority of centroids being above zero in all bands at low coherences.
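The decision criterion can be stated compactly in code. The sketch assumes the centroids are stored as an array with one row per critical band and one column per calculation; the function names and the example values are hypothetical.

```python
import numpy as np


def fraction_right(centroids_by_band):
    """Fraction of centroids above zero lag in each band (rows = bands)."""
    centroids = np.asarray(centroids_by_band, dtype=float)
    return np.mean(centroids > 0.0, axis=-1)


def lateralized_right(centroids_by_band, criterion=0.71):
    """True if any band has more than `criterion` of its centroids above zero."""
    return bool(np.any(fraction_right(centroids_by_band) > criterion))


# Two made-up bands of 20 centroids each (ms): 15 of 20 and 10 of 20 above zero.
example = np.array([[0.01] * 15 + [-0.01] * 5,
                    [0.01] * 10 + [-0.01] * 10])
decision = lateralized_right(example)
```

Because a single band exceeding the criterion is sufficient, the decision is driven by the best band, which is why the criterion changes little with cutoff as long as some band remains well lateralized.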
Figure III.20 includes the replotting of 200-μs Experiment 3 data (represented by pentagons), as well as attempts at fitting the model to those results. The solid triangles are the result of the model as stated: because the calculated centroids were well-lateralized in all bands when stimulus coherence was 0.2, one would expect listeners to perform adequately at that level over a range of cutoff frequencies. The model predictions are somewhat close to the actual data for cutoffs of 1 kHz and below, but they clearly overestimate the ability of listeners to use ITD cues in the “envelope region” above 1500 Hz.

Raab and Goldberg (1975) note that several models have been reconciled with real-world results by the inclusion of internal neural noise, for which the above coherence-based model has not yet accounted. The concept of neural noise, a representation of randomness in binaural processing, can be of some significance to sensitive discrimination tasks. Indeed, many estimations of neural noise find it several times larger than the noise applied externally (Raab and Goldberg, 1975). To ensure completeness of the model, one must consider the effect of neural noise that is additive with the interfering noise, and thus reduces the overall coherence from its expected value.

To better fit the model to results, it is clear from Figure III.20 that the actual coherences must be much lower. For instance, the average threshold at 2000-Hz cutoff was 0.5 coherence, while the model predicted only 0.2.
At the expected coherence of 0.5, the coherent power must equal the incoherent power: $P_c = P_i$. One can reconcile the model and results by the addition of estimated neural noise that has three times the power of the interfering noise to establish an actual coherence of 0.2:

$\gamma = \dfrac{P_c}{P_c + P_i + 3P_i} = \dfrac{P_c}{P_c + P_c + 3P_c} = \dfrac{P_c}{5P_c} = 0.2$

In support of the above derivation, model calculations at a γ of 0.5 (as in Figure III.19) that include the estimated neural noise result in centroid distributions closely matching those of a γ = 0.2 calculation without neural noise. Continuing with the addition of neural noise at constant power, one can equate all expected coherences on Figure III.20 with actual coherences that will be used for a new set of predictions. Open circles mark those predictions on the graph (the “N-noise model”). Note how well the model matches actual performance in regions below 1500 Hz (where listeners still had access to fine-structure ITD cues) and above (where they were limited to envelope cues). The inclusion of neural noise appears to lend more realism to the coherence-based model as implemented here.

Unfortunately, the model is much less successful at predicting the results of Experiment 2. The implementation of roex(p) filters that would accept masker power below the target noise cutoff should lead to lower centroids in bands near the cutoff, and thus, require more coherence to lateralize those bands. This is clearly the case when comparing the results of Experiments 2 and 3 in Figure III.16. However, model predictions based on calculations with broadband maskers and high-pass filtered targets are exactly the same as the open circles in Figure III.20. There is no increase in required coherence. The experimenters continue to refine the model and look for new interpretations that might account for this discrepancy. One possible alteration is to include a frequency-dependent version of Colburn’s p(τ) function, or p(τ,f).
Also, switching to roex(p,r) filters (Hartmann, 1997) would incorporate much more masker power outside the target band, which may improve predictions for Experiment 2.

Conclusions

Experiment 1 presented listeners with three varied tasks. The first test verified that absolute threshold was unaffected by interaural delay. The second revealed that listeners required increased stimulus levels to adequately lateralize smaller ITD shifts. The third found that listeners needed increased coherences (i.e. S/N ratios) to adequately lateralize targets exhibiting smaller ITD shifts. In runs with and without a masking noise, listeners produced parallel thresholds as a function of the shift. This suggests an intrinsic noise that interferes with the binaural system’s ability to discriminate in a manner similar to the masker.

The key conclusion of Experiment 2 is that the envelope ITD cues are indeed useful for a high-passed signal. The coherence graph clearly shows that larger ITDs were still effectively lateralized out to increasingly higher cutoff frequencies above 1500 Hz. However, the results clearly show that listeners required high interaural coherence in those cases, suggesting that envelope cues are susceptible to masking and reverberated noise (similar to the conclusion reached in Chapter II). In contrast, waveform cues seem very effective at low coherences, as evidenced by listener performance for low cutoffs. This would seem to indicate that envelope ITDs may play a less significant role than fine-structure ITDs in the lateralization process.

Experiment 3 was motivated by the suspicion that Experiment 2 had included some upward spread of masking. Eliminating the low-frequency masker power (an integral part of the described room environment) below the target’s high-pass cutoff yielded significantly lower coherence thresholds for cutoffs of 1500 Hz and above.
These results strongly supported the hypothesis that masker outside of the target band was still reducing the coherence.

The coherence-based model was not very effective at forecasting the results of Experiment 3. The inclusion of estimated neural noise improved predictions dramatically, to fit thresholds that varied according to the effectiveness of both fine-structure and envelope ITD cues. Modifications to approximate the stimuli of Experiment 2 did not change the model’s predicted thresholds.

References

Bernstein, L.R. and Trahiotis, C. “The normalized correlation: Accounting for binaural detection across center frequency,” J. Acoust. Soc. Am. 100, 3774-3784 (1996).
Colburn, H.S. “Theory of binaural interaction based on auditory-nerve data. II. Detection of tones in noise,” J. Acoust. Soc. Am. 61, 525-533 (1977).
Colburn, H.S. and Durlach, N.I. “Models of binaural interaction,” Handbook of Perception vol. 4, ed. E. Carterette, pp. 467-518 (1978).
Durlach, N.I. “Binaural Signal Detection: Equalization and Cancellation Theory,” Foundations of Modern Auditory Theory, ed. J. Tobias, pp. 371-461 (1972).
Durlach, N.I. and Colburn, H.S. “Binaural Phenomena,” Handbook of Perception vol. 4, ed. E. Carterette, pp. 365-466 (1978).
Hartmann, W.M. Signals, Sound and Sensation. American Institute of Physics, Woodbury, NY, 1997. p. 60, 248.
Hartmann, W.M., Constan, Z.A., Rakerd, B. “Binaural coherence and the localization of sound in rooms,” J. Acoust. Soc. Am. 103, 3081 (1998).
Henning, B. “Some observations on the lateralization of complex waveforms,” J. Acoust. Soc. Am. 68, 446-454 (1980).
Jeffress, L.A., Blodgett, H.C. and Deatherage, B.H. “Effect of interaural correlation on the precision of centering a noise,” J. Acoust. Soc. Am. 34, 1122-1123 (1962).
Kohlrausch, A., Fassel, R., van der Heijden, M., Kortekaas, R., van de Par, S. and Oxenham, A.J. “Detection of tones in low-noise noise: Further evidence for the role of envelope fluctuations,” Acustica 83, 659-669 (1997).
Levitt, H.
“Transformed up-down methods in psychoacoustics,” J. Acoust. Soc. Am. 49, 467-477 (1971).
McFadden, D. and Pasanen, E.G. “Binaural detection at high frequencies with time-delayed waveforms,” J. Acoust. Soc. Am. 63, 1120-1131 (1978).
Moore, Brian C.J. Hearing. Academic Press, New York, 1995. pp. 161-205.
Pickles, J.O. An Introduction to the Physiology of Hearing. Academic Press, New York, 1982. p. 82.
Raab, D.H. and Goldberg, I.A. “Auditory intensity discrimination with bursts of reproducible noise,” J. Acoust. Soc. Am. 57, 437-447 (1975).
Rayleigh, Lord (3rd Baron Rayleigh) “On our perception of sound direction,” Philos. Mag. 13, 214-232 (1907).
Stern, R.M. Jr. and Colburn, H.S. “Theory of binaural interaction based on auditory-nerve data. IV. A model for subjective lateral position,” J. Acoust. Soc. Am. 64, 127-140 (1978).
Trahiotis, C., Bernstein, L.R., Akeroyd, M.A. “Manipulating the ‘straightness’ and ‘curvature’ of patterns of interaural cross correlation affects listeners’ sensitivity to changes in interaural delay,” J. Acoust. Soc. Am. 109, 321-330 (2001).
Trahiotis, C. and Stern, R.M. “Lateralization of bands of noise: Effects of bandwidth and differences of interaural time and phase,” J. Acoust. Soc. Am. 86, 1285-1293 (1989).
Trahiotis, C. and Stern, R.M. “Across-frequency interaction in lateralization of complex binaural stimuli,” J. Acoust. Soc. Am. 96, 3804-3806 (1994).
van der Heijden, M. and Trahiotis, C. “A new way to account for binaural detection as a function of interaural noise correlation,” J. Acoust. Soc. Am. 101, 1019-1022 (1997).
Yin, T.C.T. and Chan, J.C.K. “Interaural Time Sensitivity in Medial Superior Olive of Cat,” J. Neurophys. 64, 465-488 (1990).

Figure III.1. Depiction of a human head and a sound source in a room. The source is represented by a dot in the upper left corner, while the head is at bottom right.
A solid line from the source pointing to the head represents the path of a direct sound that is unimpeded by obstacles. The dashed lines indicate a few possible paths for reflected sounds arriving at the head.

[Figure III.1]

Figure III.2. Schematic of the experimental setup in the form of a patch diagram. Begin at the left, where noises N1, N2 and N3 are generated. LPF stands for low-pass filter, Delay x/2x represents the delay line and the relative settings in μs, AS is the analog switch, EVCA is the exponential voltage-controlled amplifier and Trap. Env. stands for the trapezoidal envelope that ramps signals on and off.

[Figure III.2: patch diagram of the experimental setup]

Figure III.3. Description of the experimental task. Images portray listener’s head with an intracranial image produced by the stimulus (depicted by a small “cloud”). Each trial, the first interval is equally likely to have a positive or negative ITD imposed so the intracranial image appears offset to the left or right side of the head. The second interval features an opposite ITD of the same magnitude so the image moves from left to right (as in the top two images) or vice versa (as in the bottom two images).

[Figure III.3: Two-interval forced-choice task. First interval and second interval for a left-to-right trial or a right-to-left trial.]

Figure III.4. Results of Experiment 1: ITD discrimination as a function of coherence for listener B. The x-axis shows all tested ITDs from 20 to 400 μs, and the y-axis is the threshold target level necessary to perform the task (with 79% accuracy) at each. The three curves plot results for interference, non-interference, and absolute threshold conditions. For the interference condition, also note the dashed line that represents the level of the masking noise and the alternate y-axis indicating the corresponding coherences to target levels on the right side of the graph.
The actual coherence values corresponding to interference data points are listed next to those points for ease of evaluation. Error bars on data points represent two standard deviations.

[Figure III.4: plot of threshold (dB SPL) and coherence versus ITD]

Figure III.5. Results of Experiment 1: ITD discrimination as a function of coherence for listener L. The x-axis shows all tested ITDs from 20 to 400 μs, and the y-axis is the threshold target level necessary to perform the task (with 79% accuracy) at each. The three curves plot results for interference, non-interference, and absolute threshold conditions. For the interference condition, also note the dashed line that represents the level of the masking noise and the alternate y-axis indicating the corresponding coherences to target levels on the right side of the graph. The actual coherence values corresponding to interference data points are listed next to those points for ease of evaluation. Error bars on data points represent two standard deviations.

[Figure III.5: plot of threshold (dB SPL) and coherence versus ITD]

Figure III.6. Results of Experiment 1: ITD discrimination as a function of coherence for listener S. The x-axis shows all tested ITDs from 20 to 400 μs, and the y-axis is the threshold target level necessary to perform the task (with 79% accuracy) at each. The three curves plot results for interference, non-interference, and absolute threshold conditions.
For the interference condition, also note the dashed line that represents the level of the masking noise and the alternate y-axis indicating the corresponding coherences to target levels on the right side of the graph. The actual coherence values corresponding to interference data points are listed next to those points for ease of evaluation. Error bars on data points represent two standard deviations.

[Figure III.6: plot of threshold (dB SPL) and coherence versus ITD]

Figure III.7. Results of Experiment 1: ITD discrimination as a function of coherence for listener T. The x-axis shows all tested ITDs from 20 to 400 μs, and the y-axis is the threshold target level necessary to perform the task (with 79% accuracy) at each. The three curves plot results for interference, non-interference, and absolute threshold conditions. For the interference condition, also note the dashed line that represents the level of the masking noise and the alternate y-axis indicating the corresponding coherences to target levels on the right side of the graph. The actual coherence values corresponding to interference data points are listed next to those points for ease of evaluation. Error bars on data points represent two standard deviations.

[Figure III.7: plot of threshold (dB SPL) and coherence versus ITD]

Figure III.8.
Results of Experiment 1: ITD discrimination as a function of coherence for listener W. The x-axis shows all tested ITDs from 20 to 400 μs, and the y-axis is the threshold target level necessary to perform the task (with 79% accuracy) at each. The three curves plot results for interference, non-interference, and absolute threshold conditions. For the interference condition, also note the dashed line that represents the level of the masking noise and the alternate y-axis indicating the corresponding coherences to target levels on the right side of the graph. The actual coherence values corresponding to interference data points are listed next to those points for ease of evaluation. Error bars on data points represent two standard deviations.

[Figure III.8: plot of threshold (dB SPL) and coherence versus ITD]

Figure III.9. Results of Experiment 1: ITD discrimination as a function of coherence for listener Z. The x-axis shows all tested ITDs from 20 to 400 μs, and the y-axis is the threshold target level necessary to perform the task (with 79% accuracy) at each. The three curves plot results for interference, non-interference, and absolute threshold conditions. For the interference condition, also note the dashed line that represents the level of the masking noise and the alternate y-axis indicating the corresponding coherences to target levels on the right side of the graph. The actual coherence values corresponding to interference data points are listed next to those points for ease of evaluation. Error bars on data points represent two standard deviations.
[Figure III.9: plot of threshold (dB SPL) and coherence versus ITD]

Figure III.10. Results of Experiment 1: ITD discrimination as a function of coherence averaged over all six listeners. The x-axis shows all tested ITDs from 20 to 400 μs, and the y-axis is the threshold target level necessary to perform the task (with 79% accuracy) at each. The three curves plot results for interference, non-interference, and absolute threshold conditions. For the interference condition, also note the dashed line that represents the level of the masking noise and the alternate y-axis indicating the corresponding coherences to target levels on the right side of the graph. The actual coherence values corresponding to interference data points are listed next to those points for ease of evaluation.

[Figure III.10: plot of threshold (dB SPL) and coherence versus ITD]

Figure III.11. Spectral diagram of experimental stimulus for Experiment 2. The four plots show stimulus intensity versus frequency. The two columns represent left and right ears. The top row shows the characteristics of the coherent target noise (A), which extends from high-pass cutoff frequency fc to 10 kHz. The dashed line extending to lower frequency indicates that the cutoff can be adjusted by the experimenter. The bottom row represents the incoherent masking noises (B and C) in each ear, which are broadband to 10 kHz. The dashed line extending to higher intensity indicates that the experimental program can vary the masking noise level during runs.
[Figure III.11: four-panel spectral diagram, intensity versus frequency]

Figure III.12. Results of Experiment 2: ITD discrimination as a function of high-pass cutoff. This graph shows the range of tested high-pass cutoff frequencies (seven values from 0 to 4000 Hz) along the x-axis. The y-axis indicates the interaural coherence necessary to correctly lateralize the internal image in 79% of trials. The five curves are the results for various ITDs (20 to 400 μs) averaged over four listeners. The coherences were calculated ignoring incoherent power outside the target band.

[Figure III.12: plot of coherence versus high-pass cutoff (Hz)]

Figure III.13. Results of Experiment 3: ITD discrimination of high-pass-filtered targets and maskers for listener W. This graph shows the range of tested high-pass cutoff frequencies (seven values from 0 to 4000 Hz) along the x-axis. These cutoffs were applied to both the coherent target noise and the incoherent masking noise. The y-axis indicates the interaural coherence necessary to correctly lateralize the internal image in 79% of trials. The square symbols mark thresholds for a 200-μs ITD in Experiment 3, while the pentagons replot 200-μs results from Experiment 2 for comparison.

[Figure III.13: plot of coherence versus high-pass cutoff (Hz)]

Figure III.14. Results of Experiment 3: ITD discrimination of high-pass-filtered targets and maskers for listener X. This graph shows the range of tested high-pass cutoff frequencies (seven values from 0 to 4000 Hz) along the x-axis.
These cutoffs were applied to both the coherent target noise and the incoherent masking noise. The y-axis indicates the interaural coherence necessary to correctly lateralize the internal image in 79% of trials. The square symbols mark thresholds for a 200-μs ITD in Experiment 3, while the pentagons replot 200-μs results from Experiment 2 for comparison.

[Figure III.14: plot of coherence versus high-pass cutoff (Hz)]

Figure III.15. Results of Experiment 3: ITD discrimination of high-pass-filtered targets and maskers for listener Z. This graph shows the range of tested high-pass cutoff frequencies (seven values from 0 to 4000 Hz) along the x-axis. These cutoffs were applied to both the coherent target noise and the incoherent masking noise. The y-axis indicates the interaural coherence necessary to correctly lateralize the internal image in 79% of trials. The square symbols mark thresholds for a 200-μs ITD in Experiment 3, while the pentagons replot 200-μs results from Experiment 2 for comparison.

[Figure III.15: plot of coherence versus high-pass cutoff (Hz)]

Figure III.16. Results of Experiment 3: ITD discrimination of high-pass-filtered targets and maskers averaged over three listeners. This graph shows the range of tested high-pass cutoff frequencies (seven values from 0 to 4000 Hz) along the x-axis. These cutoffs were applied to both the coherent target noise and the incoherent masking noise. The y-axis indicates the interaural coherence necessary to correctly lateralize the internal image in 79% of trials. The square symbols mark thresholds for a 200-μs ITD in Experiment 3, while the pentagons replot 200-μs results from Experiment 2 for comparison.
The dotted line connecting to the last Experiment 2 point indicates that it is the average of only two listeners’ results; the third listener failed at that condition. Error bars on data points represent two standard deviations.

[Figure III.16: plot of coherence versus high-pass cutoff (Hz)]

Figure III.17. Simulating the actions of the auditory periphery in the model. There are three steps listed in this figure. The uppermost shows the generation of the stimulus according to experimental specifications (coherence, cutoff). The stimulus is separated into critical bands (with center frequencies falling in the range listed) for processing. Each band is rectified (center panel) and low-pass filtered (bottom panel, Bernstein and Trahiotis, 1996) before cross-correlation.

[Figure III.17: three panels. Top: generate stimulus within odd critical bands from 1 to 31 (124-7021 Hz), intensity versus frequency. Center: half-wave square-law rectification. Bottom: low-pass filter, amplitude (dB) versus frequency (Hz).]

Figure III.18. Binaural processing in the model. Left and right channels of a critical band are cross-correlated without normalization (top panel) for delays between ±4 ms. The peak falls at the experimental ITD of 200 μs. To eliminate the contribution of larger delays, the cross-correlation values are weighted by Colburn’s P(τ) function (center panel), which weights low delays heavily. The weighted correlation function (bottom panel, shown with un-weighted function for comparison) is used to calculate a centroid (labeled with an “x”) along the delay axis.
[Figure III.18: three panels plotted against interaural delay (μs). Top: cross-correlation. Center: weight, labeled “Apply Colburn’s P(τ) function to weight smaller delays.” Bottom: cross-correlation, labeled “Calculate centroid of the weighted CC function.”]

Figure III.19. Model calculation results. The critical bands tested in the model are found on the x-axis, with corresponding center frequencies along the top of the graph. The y-axis shows the lateral position of the centroid in μs, with 0 being the center of the head and values above shifted off to the right. The points represent twenty model calculations performed for each critical band. The coherence value γ for this example is 0.5.

[Figure III.19: plot of centroid (μs) versus critical band number]

Figure III.20. Model predictions. The range of tested high-pass cutoff frequencies (seven values from 0 to 4000 Hz) is along the x-axis. The y-axis indicates the interaural coherence necessary to correctly lateralize the internal image in 79% of trials. The pentagon symbols mark thresholds for a 200-μs ITD in Experiment 3. The closed triangles show thresholds predicted by the model without inclusion of neural noise. The open circles are thresholds predicted once estimated neural noise is added.
[Figure III.20: plot of coherence versus high-pass cutoff (Hz)]

Chapter IV: Intensity-based lateralization as a function of coherence

Introduction

The goal of the interaural level difference (ILD) experiments described in this chapter was to study how the human auditory system uses ILD information to localize a broadband noise stimulus. Specifically, they were designed to establish how ILD discrimination was affected by interaural coherence, if at all. The usefulness of interaural time difference (ITD), another binaural cue, increases with higher interaural coherence (Stern and Trahiotis, 1995). However, ILD and ITD are disparate cues, as evidenced by the way the superior olive processes them differently (Pickles, 1982) and the variability of time-intensity trading (Stern and Trahiotis, 1995). It is not clear a priori whether coherence should influence ILD discrimination performance.

The above question readily applies to the problem of localization in rooms. While listeners can use both ITD and ILD cues in the presence of room reflections, there is always some interaural decorrelation. In a large lecture hall (reverberation time presumably less than a second), Lindevald and Benade (1986) found that signals to the two ears are uncorrelated for all frequencies above 500 Hz. Binaurally uncorrelated signals provide no useful ITD cues, because they lack common temporal features in both ears. Thus, the ILD cue becomes paramount to sound localization for steady-state signals in highly reflective rooms. Understanding the impact of decorrelation on ILD discrimination is therefore a significant objective.

Models

When considering the question of coherence and ILD, several different models suggest themselves. The first of these models is dubbed the “level-meter” model. This model assumes the existence of independent level meters that measure the power at both ears, and ILD is determined when some central processor calculates the difference between them.
The level meter has no way to detect coherence, which is an aspect of the waveform. Instead, it calculates a power by integrating the signal over time, losing all waveform information as a result. Thus, the level-meter model predicts that ILD discrimination should be unaffected by changes in coherence.

The second model takes a less simplistic view of coherence. The “instantaneous ILD fluctuation” model notes that a binaurally coherent signal, time-shifted so the ITD is zero, will have the same ILD at any point along the signal waveform. This is because the left and right waveform variations are synchronized, and any disparity between them would be due to a level difference. However, any incoherence will cause power fluctuations in the two ears to grow more independent, leading to random variability in ILD from moment to moment. If the listener can track these “instantaneous” ILDs over time, then the fluctuations could lead to uncertainty in his or her judgments, and thus make the task more difficult. In terms of Signal Detection Theory (TSD, see General Appendix B), incoherence would actually increase the standard deviation σ, decreasing the sensitivity index d'. This model predicts that listeners should have lower ILD lateralization thresholds for coherent noise (cross-correlation = 1) than for incoherent (cross-correlation = 0) and anticorrelated (cross-correlation = -1) noises, and that the amount of ILD variability determines those thresholds.

The third model concentrates on the intracranial images produced by stimuli. The auditory system interprets a perfectly coherent noise as a compact image, where “compact” indicates a narrow Auditory Source Width (ASW). The movement of that image (due to a change in binaural cue) is readily identified because listeners can detect a clear lateral position for the noise. As the noise becomes less coherent, that image becomes less compact, increasing the ASW.
At a coherence of 0, it spreads across the entire lateralization axis, “filling” the listener’s head. Instead of identifying the motion of a localized image, a listener would have to follow the image’s “center of mass” to make a lateralization judgment. For anticorrelated noise, as described below, the image is similar to that of incoherent noise, but doesn’t quite fill the whole head. The image model predicts listeners will have lower thresholds for stimuli that produce intracranial images of smaller ASW. Thus, listeners should perform best with coherent noise and worst with incoherent noise, and anticorrelated noise should fall in between.

The relationship between ILD sensitivity and binaural coherence has been previously reported in unpublished work by Grantham and Ahlstrom (1982). They raised a question about the image model, suggesting that ASW may impact a listener’s ability to lateralize. When the stimuli were broadband (bandwidth presumably greater than one octave), listeners had consistently lower ILD thresholds for coherent noise (low ASW) than for incoherent noise (high ASW). Their results for narrowband stimuli (bandwidth of 0.4 octave) weren’t as clear, as two out of three listeners performed better with coherent noise at short durations (30 ms), but exhibited no significant effect due to coherence at longer durations (500 ms). The experiments detailed in this chapter are meant to expand on their work with both similar and newly introduced stimuli.

Methods

A series of experiments explored the ILD question by testing ILD thresholds for various stimuli. The stimuli included are as follows (with the coherence, or maximum cross-correlation, value γ indicated for each):

--Interaurally correlated noise (γ = 1). This stimulus was diotic (except for level), producing a compact image in the listener’s head.

--Interaurally uncorrelated noise (γ = 0). This stimulus was generated by two independent noise sources, one for each ear, and formed a head-filling image.
--Interaurally anticorrelated noise (γ = -1). This stimulus was identical to the correlated noise above, except the signal to the right ear was inverted, causing the listener to hear images located at the left and right extremes that extended towards the center.

The ILD experiment employed a two-interval, two-alternative forced-choice (2I2AFC) task, in which the listener identified changes in lateralization. A reversal of ILD (e.g., from +2 dB to -2 dB) between the two intervals caused the internalized image to move, and listeners responded by indicating the apparent direction of motion.

Runs consisted of trials presented at a level of 60 dB SPL, with two 500-ms intervals per trial. Intervals were turned on by matched voltage-controlled amplifiers, which were in turn directed by a single gating signal with rise/fall times of 30 ms. Gating the left- and right-ear stimuli simultaneously ensured that interaural onset differences would not affect lateralization. The total number of trials depended on the listener’s performance during the run, as explained below. On the first interval of a trial, either the right or left channel was attenuated or amplified, with each possibility having equal a priori probability. At the start of a run, the level shift in one ear was ±2 dB. After a 500-ms delay, the second interval played, with levels for right and left ears reversed. This succession of ILDs would cause the stimulus image to move laterally in the listener’s head. See an example chart of levels by interval in Figure IV.1, which simulates a trial where the image passes from right to left. The x-axis represents time, while the y-axis represents stimulus level in dB. The dashed line indicates the standard level of 60 dB. The figure exhibits two possible trials (a) and (b) that were equally likely to occur during the run, and were expected to present the listener with the same impression: that the image moved from right to left.
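As a rough numerical illustration of the three stimuli and of a single trial, the following Python sketch generates each noise type, shifts the level of one ear in the first interval, and reverses the ear levels in the second. It is a minimal sketch and not the laboratory apparatus: the sample rate `FS`, the use of unfiltered digital white noise, fresh noise in each interval, the 10-ms analysis window, and all function names are assumptions introduced here. It also reports a long-term ILD, as a level meter would read it, and the spread of short-term ILDs, in the spirit of the instantaneous-fluctuation model.

```python
import numpy as np

FS = 48_000          # assumed sample rate in Hz (not specified in the text)
N = FS // 2          # samples in one 500-ms interval

def make_stimulus(kind, rng, n=N):
    """Return (left, right) Gaussian noise for one stimulus type."""
    left = rng.standard_normal(n)
    if kind == "correlated":          # gamma = 1: same waveform at both ears
        right = left.copy()
    elif kind == "uncorrelated":      # gamma = 0: two independent sources
        right = rng.standard_normal(n)
    elif kind == "anticorrelated":    # gamma = -1: right-ear signal inverted
        right = -left
    else:
        raise ValueError(f"unknown stimulus kind: {kind}")
    return left, right

def shift_level(x, shift_db):
    """Scale a waveform by shift_db decibels."""
    return x * 10.0 ** (shift_db / 20.0)

def make_trial(kind, shifted_ear, shift_db, rng):
    """Two intervals: one ear shifted first, then the ear levels reversed."""
    l1, r1 = make_stimulus(kind, rng)
    l2, r2 = make_stimulus(kind, rng)   # fresh noise per interval (assumed)
    if shifted_ear == "right":
        return (l1, shift_level(r1, shift_db)), (shift_level(l2, shift_db), r2)
    return (shift_level(l1, shift_db), r1), (l2, shift_level(r2, shift_db))

def ild_db(left, right):
    """Long-term ILD (right re left) in dB, as a level meter would read it."""
    return 10.0 * np.log10(np.mean(right**2) / np.mean(left**2))

def short_term_ild_sd(left, right, win=480):
    """Standard deviation of ILDs over consecutive windows (10 ms at FS)."""
    n = len(left) // win * win
    p_left = np.mean(left[:n].reshape(-1, win) ** 2, axis=1)
    p_right = np.mean(right[:n].reshape(-1, win) ** 2, axis=1)
    return float(np.std(10.0 * np.log10(p_right / p_left)))

rng = np.random.default_rng(0)
for kind in ("correlated", "uncorrelated", "anticorrelated"):
    first, second = make_trial(kind, "right", 2.0, rng)
    print(kind, round(ild_db(*first), 2), round(ild_db(*second), 2),
          round(short_term_ild_sd(*first), 3))
```

In this sketch the long-term ILD stays close to the imposed shift for every stimulus type, while the short-term spread is essentially zero when the two ears carry identical or inverted waveforms and clearly nonzero for independent noise. Because the measure uses power only, it cannot distinguish inverted from identical waveforms, so it illustrates only the coherent and incoherent cases of the fluctuation model.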
The listener was then prompted to respond by pressing one of two buttons, indicating the direction of stimulus motion. The run would not continue until the listener had made a decision.

The ILD was controlled by a 3-down 1-up staircase (see General Appendix A). When the listener correctly identified the direction of ILD motion three trials in a row, the program reduced the ILD by an increment, making the “movement” smaller and more difficult to detect. However, only one incorrect answer caused the program to increase the ILD by an increment. If the ILD was greater than 1.5 dB (as at the start of the run), the increment was 0.5 dB; that is, the ILD would be increased or decreased by 0.5 dB according to the staircase. If the ILD was between 1.5 and 1 dB, the increment was 0.2 dB, and it was reduced to 0.1 dB when ILDs fell below 1 dB. Since listeners spent the majority of a run with ILDs less than 1 dB, this arrangement allowed them to progress rapidly downward from the starting ILD of ±2 dB, then provided high resolution in the region of interest. The lowest ILD possible in this experimental setup was 0.1 dB.

Each ILD run presented one of three perceptually different stimuli, which produce distinct internal images: coherent noise leads to a compact image, incoherent noise is broad, and anticorrelated noise is somewhere in between. As mentioned in Appendix B, discrimination experiments showed that listeners could distinguish between them more than 75 percent of the time.

The level-meter model predicts no difference in ILD thresholds for the above stimuli, due to its coherence insensitivity. In addition, if the ILD information is based simply on levels in each ear, then the lateralization task can be no more accurate than a monaural difference-limen-intensity (DLI) task. This result was previously suggested by von Bekesy (1930). To provide a thorough test of the model, the experiments included one more stimulus:

--DLI (Difference Limen in Intensity) (γ = 1).
The two ears received diotic noise. The DLI stimulus was used in a task that required listeners to identify the louder of two intervals, rather than an ILD-induced motion. Runs of the DLI task found the smallest detectable difference in interval levels for each listener. Aside from the differences in the stimulus and the task, the DLI experiment followed the same procedure as the lateralization experiments. A level chart for an example DLI run can be found in Figure IV.2, which is of the same format as Figure IV.1.

Runs featuring each of the four stimuli were intermingled during the experiment, ensuring that listeners were not exposed to the same stimulus on two successive runs. Based on prior tests in the lab and the results of Grantham and Ahlstrom (1982), it was expected that any coherence-related effects would be small, and thus the experiments required a high degree of precision from the data. When a run resulted in an average ILD with standard deviation greater than 0.3 dB, it was redone to ensure a smaller variability. Following a few training runs, the final six runs for each listener at each condition (that met the above standard for variability) were used as data, and averaged together. Thus, each listener provided data in 24 runs. Each experiment took on unique modifications to the stimuli, as detailed below.

Listeners performed runs of the ILD experiment in a double-walled Acoustic Systems soundroom, wearing Sennheiser HD-480 headphones to hear the stimuli. The letters R, W, H, K, L, Z, and M will refer to the listeners in this experiment. K and W were females age 21 and 19; R, M, L and Z were males age 19 to 28, and H was a 60-year-old male. Runs lasted between two and four minutes.

Experiment 1: Broadband signal

Results

The first task used thermally generated Gaussian noise that was both white and broadband, then low-pass filtered at 10 kHz (with a rolloff of -48 dB/octave).
Experimenters generated all necessary stimuli using two independent noise generators passing through two identical filters. This experiment involved listeners R, Z, H, K, M, and W.

Refer to Figure IV.3 for the data collected during the broadband experiment. The symbols corresponding to each stimulus type are listed in the legend. Results from the four tasks are grouped by listener, as identified by letter code along the x-axis. The mean threshold ILD in decibels (determined by the staircase targeting 79% correct) is along the y-axis. The length of the error bars is twice the standard deviation (n-1 weight).

Note the shaded area below 0.15 dB on the plot. Any data within this region should be considered unreliable, because the experiment was limited to a minimum ILD of 0.1 dB and a minimum increment of 0.1 dB. Any listener who could discriminate that well would essentially cause the staircase to fail. None of the listeners was consistently limited by that “floor”; H reached it sporadically, but results from those runs showed that it wasn’t holding him back appreciably. Approximately 6.5% of all listener runs in this experiment were discounted due to a standard deviation higher than 0.3 dB.

Discussion

Looking at the broadband data, it is possible to make some general observations. The DLI value is the highest (and thus worst) threshold for every listener, most markedly for R and Z at the left of the figure. R does best at the coherent task, while the others perform best at the anticorrelated task. Incoherent results are consistently worse than for the other two binaural stimuli, but better than the DLI results. The thresholds averaged across listeners for each stimulus are found in Table IV.1. They provide a general evaluation of the relative difficulty of each task.
Table IV.1 - Broadband ILD thresholds according to stimulus type

Stimulus      Anticorrelated   Coherent   Incoherent   DLI
Mean value    0.398            0.451      0.585        0.773

Threshold ILDs in dB from Experiment 1, averaged across listeners for each stimulus type.

In most cases, the large error bars made it difficult to estimate differences in performance. For this reason, the experimenters decided to perform a statistical study through analysis of variance (ANOVA). The ANOVA indicated that there was indeed a significant difference among the four conditions (F(3,15) = 16.7, p<0.01). The results of post-hoc “general linear model univariate pairwise comparisons” for each combination of conditions can be found in Table IV.2. Those comparisons with a p-value that indicates significant difference (i.e. less than 0.05) are marked with an asterisk.

Table IV.2 - Broadband pairwise comparisons p-values

                 Anticorrelated   Coherent   Incoherent   DLI
Anticorrelated                    0.501      0.008*†      0.002*†
Coherent                                     0.002*†      0.012*
Incoherent                                                0.052

The p-values for comparison between factor levels in Experiment 1 (broadband 10 kHz)
* significant difference at the 0.05 level
† significant difference at the 0.05 level after Bonferroni correction

According to the pairwise comparisons, listener results were significantly different for all pairs of stimuli except for coherent/anticorrelated and incoherent/DLI. These relationships between the various stimuli indicate that coherence may, in fact, have an effect on ILD discrimination. Note, also, that the stimuli with images that didn’t fill the listeners’ heads (coherent and anticorrelated) were clearly easier to lateralize than incoherent noise, but neither proved better than the other. This result agrees with Grantham and Ahlstrom’s result for a coherent/incoherent comparison (1982). Incoherent noise did not seem an improvement over the monaural DLI task.

However, the previous analysis might require some evaluation of the experimental goals.
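Before weighing those goals, it may help to see how the two markers in Table IV.2 follow mechanically from the listed p-values. The Python sketch below is only an illustration: the dictionary and function names are introduced here, and it assumes the corrected criterion is the nominal 0.05 level divided by the number of pairwise comparisons in the table (six pairs among four conditions).

```python
# Pairwise p-values as listed in Table IV.2.
P_VALUES = {
    ("Anticorrelated", "Coherent"): 0.501,
    ("Anticorrelated", "Incoherent"): 0.008,
    ("Anticorrelated", "DLI"): 0.002,
    ("Coherent", "Incoherent"): 0.002,
    ("Coherent", "DLI"): 0.012,
    ("Incoherent", "DLI"): 0.052,
}

ALPHA = 0.05                                 # nominal significance level
ALPHA_BONFERRONI = ALPHA / len(P_VALUES)     # 0.05 divided by 6 comparisons

def marker(p):
    """Return the table marker for one p-value."""
    if p < ALPHA_BONFERRONI:
        return "*†"      # significant even after the Bonferroni correction
    if p < ALPHA:
        return "*"       # significant at the uncorrected 0.05 level only
    return ""            # not significant

for (first, second), p in P_VALUES.items():
    print(f"{first:>14} vs {second:<10} p = {p:.3f} {marker(p)}")
```

Since the table reports p-values rounded to three decimals, a value sitting right at the corrected criterion (such as 0.008) is classified here from its rounded value.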
A 95% confidence interval may be sufficient for one comparison, but this experiment was testing for any difference between four sets of values. The null hypothesis stated that all four conditions were equivalent, and any one of the six comparisons could disprove that. It seems that stricter confidence intervals would be in order, and that is provided for by the Bonferroni correction (Dunn, 1961). By dividing the significance level applied to each pairwise comparison by six (the number of comparisons), the correction essentially assigns a new alpha level of 0.05/6 ≈ 0.008. This is a very stringent requirement, and once the Bonferroni correction takes effect, the results change somewhat. In the case of this experiment, applying the correction eliminates the difference between coherent and DLI tasks. P-values that are significant under the Bonferroni correction are marked with a dagger symbol in Table IV.2.

With these results, it is difficult to make a case for the level-meter model. It would seem that the models that deal with ASW and ILD fluctuation are best supported, since the uncorrelated noise was significantly more difficult to lateralize than the more compact stimuli.

Experiment 2: Low-pass filtered signal

Introduction and Method

The purpose of Experiment 1 was to test the effect of coherence, an ITD-based cue, on ILD discrimination. However, in the broadband regime, ITD tends to play the strongest part at lower frequencies. A well-established aspect of binaural hearing is that ITD coherence cues provide the most useful information below about 1500 Hz (Moore, 1989), even though the auditory system is sensitive to envelope coherence at higher frequencies (see Chapter II) where waveform cues are unusable (Henning, 1974). On the other hand, human ILD sensitivity is generally independent of frequency (Yost, 1981).
Because the stimulus in Experiment 1 was low-pass filtered at 10 kHz, most of its power was located in frequencies above the limit where ITD discrimination (and thus, coherence detection) was strongest. Thus, using the same noise as Experiment 1 and eliminating frequencies above 1 kHz by lowering the cutoff, Experiment 2 ensured that the stimulus fell entirely within ITD’s most effective region. The BrickWall low-pass filter used had a −115 dB/octave rolloff. If the interaural coherence were to have an effect on the process of ILD discrimination, such a change in stimulus would afford it the best chance. Also, the switch to narrowband noise may eliminate the observed effect of coherence on ILD threshold, as reported in Grantham and Ahlstrom (1982). The listeners involved were R, Z, H, L, M and W.

Results and Discussion

Figure IV.4 plots data from the low-pass (1 kHz cutoff) experiment. The between-subject variation is much more pronounced for this task, which also proved more difficult overall than the broadband case (as shown by the consistently higher thresholds in Table IV.3 below compared to Table IV.1). The four listeners on the left seem to follow a pattern of low thresholds for coherent and anticorrelated noise, then worse thresholds for incoherent noise and DLI. However, M and W on the right are completely different. Overall, listeners seemed to do best with a coherent stimulus, and all but one had worse performance with an incoherent stimulus. This is a minor change from the broadband experiment, where anticorrelated noise produced the lowest threshold. Approximately 6.8% of all listener runs were discarded due to a standard deviation higher than 0.3 dB.

Table IV.3 - Narrowband ILD thresholds according to stimulus type

Stimulus      Anticorrelated   Coherent   Incoherent   DLI
Mean value    0.733            0.679      0.835        1.021

Threshold ILDs in dB from Experiment 2, averaged across listeners for each stimulus type.
The ANOVA finds a marginally significant difference among the four conditions (F(3,15) = 3.35, p = 0.045). Pairwise comparisons show only one significant difference: between coherent and incoherent noises. The resultant p-values can be found in Table IV.4. Note that use of the Bonferroni correction (again, requiring a level of 0.05/6 ≈ 0.008) eliminates the only pairwise difference identified in this experiment.

Table IV.4 - Narrowband pairwise comparisons p-values

                 Anticorrelated   Coherent   Incoherent   DLI
Anticorrelated                    0.294      0.199        0.163
Coherent                                     0.017*       0.070
Incoherent                                                0.239

The p-values for comparison between factor levels in Experiment 2 (low-pass, 1 kHz).
* significant difference at the 0.05 level
† significant difference at the 0.05 level after Bonferroni correction

This experiment provides support for the level-meter model, which predicts no significant differences between any two conditions. The results also agree with the conclusions of Grantham and Ahlstrom (1982), as suggested earlier.

Experiment 3: Low-pass filtered signal with randomized standard level

Introduction and Method

While Experiments 1 and 2 provided useful insight, there existed a possibility that the listeners were not completing the task as instructed. Referring to Figure IV.1, it becomes clear that the ILD “movement” can be identified monaurally. If the listener concentrated solely on the left ear, for instance, he or she could make a loudness comparison between intervals to make a decision. Granted, such a strategy would have a disadvantage when compared to binaural lateralization, because monaural level discrimination is subject to a greater variance than ILD discrimination. Thus, the monaural strategy would be less accurate. Nevertheless, it is important to establish what strategy the listeners chose.
The final experiment was designed for comparison with Experiment 2, using the low-pass 1 kHz noise as defined above, but also introducing randomization of standard level between intervals. This strategy is similar to that of Grantham (1984) and Koehnke et al. (1986), who were also trying to prevent monaural discrimination in their respective ILD sensitivity experiments. The randomization depended on an experimenter-set range value of 5, causing the standard level in each interval to vary among integer values within 5 dB of the 60 dB experimental norm. The distribution of selected levels was rectangular, meaning each possible choice had equal probability.

This variation in standard level does not affect the relative ILD shifts. In the example trials of Figure IV.5 (again, formatted as Figure IV.1), the levels are those of Figure IV.1 with a +2-decibel shift on the first interval, and a −1-decibel shift on the second. If the listener attempted to do the task with only information from the left ear, he or she would conclude that the image had moved from left to right. However, using binaural ILD information, it is clear that the image moved right to left. The introduction of a random standard has confounded the possible monaural cue.

If listeners did not use monaural cues for the task in Experiment 2, then the results of Experiment 3 should correspond closely. If they still attempted to concentrate on a single ear, their thresholds would increase dramatically. Green (1988) estimated that a level randomization of X dB would necessarily require a threshold level of

$L = X\left[1 - \sqrt{2\,(1 - P_C)}\right]$

with $P_C$ representing the proportion of correct responses in a two-alternative task. Given the selected X of 5 dB, and the $P_C$ of 79% targeted by the staircase, Green would predict a threshold of 1.8 dB. Note that there was no change in the DLI task from the previous experiment, i.e., level randomization was not applied to DLI runs because it would have interfered with the task.
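As a quick numerical check on the 1.8 dB figure quoted above, the sketch below evaluates the roving-level bound. It assumes that Green's expression has the form L = X[1 − sqrt(2(1 − P_C))], which is the form that reproduces the quoted value and which follows if a single-ear decision is based on the difference of two independent, rectangularly distributed levels. The function name is illustrative and does not come from the original work.

```python
import math


def green_roving_threshold(rove_range_db, p_correct):
    """Level change (dB) a single-ear comparison would need to reach p_correct.

    Assumed form: L = X * (1 - sqrt(2 * (1 - P_C))), with X the randomization
    range in dB and P_C the proportion correct in a two-alternative task.
    """
    if not 0.5 <= p_correct <= 1.0:
        raise ValueError("p_correct must lie between 0.5 and 1.0")
    return rove_range_db * (1.0 - math.sqrt(2.0 * (1.0 - p_correct)))


# Values taken from the text: X = 5 dB and a staircase target of 79% correct.
threshold_db = green_roving_threshold(5.0, 0.79)
print(f"Predicted monaural threshold: {threshold_db:.2f} dB")  # about 1.76 dB, i.e. roughly 1.8 dB
```

Under this assumed form, chance performance (P_C = 0.5) requires no level change at all, and perfect performance requires a change equal to the full randomization range.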
Listeners for this experiment were R, Z, H, L, M and W, the same as for Experiment 2.

Results and Discussion

Data are summarized in Figure IV.6. Note how consistent the relative positions of coherent, anticorrelated and incoherent data were for the left four listeners. M and W once again do not fit the trend. They did, however, match all listeners but H by performing worst in the DLI case.

Comparing the results for the other (non-DLI) stimuli (as found in Table IV.5) to the results without randomization from Experiment 2 (in Table IV.3), the pattern is similar. Thresholds for individual listeners both increased and decreased in the switch to random standard level. In addition, the thresholds do not approach the value of 1.8 dB predicted above. The evidence would suggest that listeners were indeed using binaural cues in Experiments 2 and 3.

One curious result: although the DLI task was unchanged, some listeners’ results were not consistent with the prior values. Listener R, in fact, achieved much higher thresholds that were too variable to be included in our results; thus his closed circle is off-plot. Approximately 9.2% of all listener runs were discarded due to a standard deviation higher than 0.3 dB.

Table IV.5 - Random-level ILD thresholds according to stimulus type

Stimulus      Anticorrelated   Coherent   Incoherent   DLI
Mean value    0.766            0.756      0.880        1.152

Threshold ILDs in dB from Experiment 3, averaged across listeners for each stimulus type.

The ANOVA indicates a significant difference among the four thresholds (F(3,15) = 6.53, p < 0.01). The p-values for pairwise comparisons in Table IV.6 show that coherent and anticorrelated results were significantly different from DLI, and coherent data were significantly different from incoherent. These pairs are similar to those identified in the broadband experiment. As with the Experiment 2 results, applying the Bonferroni correction eliminates all significant differences among thresholds.
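The effect of the Bonferroni correction on all three experiments can be tabulated with a short script. This is only a bookkeeping sketch: the p-values are transcribed from Tables IV.2, IV.4 and IV.6, while the dictionary layout, the abbreviated condition labels and the helper name are assumptions made for illustration.

```python
ALPHA = 0.05
N_COMPARISONS = 6  # four conditions give six pairwise comparisons

# Pairwise p-values transcribed from Tables IV.2, IV.4 and IV.6.
P_VALUES = {
    "Experiment 1": {
        ("anticorrelated", "coherent"): 0.501,
        ("anticorrelated", "incoherent"): 0.008,
        ("anticorrelated", "DLI"): 0.002,
        ("coherent", "incoherent"): 0.002,
        ("coherent", "DLI"): 0.012,
        ("incoherent", "DLI"): 0.052,
    },
    "Experiment 2": {
        ("anticorrelated", "coherent"): 0.294,
        ("anticorrelated", "incoherent"): 0.199,
        ("anticorrelated", "DLI"): 0.163,
        ("coherent", "incoherent"): 0.017,
        ("coherent", "DLI"): 0.070,
        ("incoherent", "DLI"): 0.239,
    },
    "Experiment 3": {
        ("anticorrelated", "coherent"): 0.843,
        ("anticorrelated", "incoherent"): 0.271,
        ("anticorrelated", "DLI"): 0.041,
        ("coherent", "incoherent"): 0.009,
        ("coherent", "DLI"): 0.029,
        ("incoherent", "DLI"): 0.235,
    },
}


def significant_pairs(p_values, alpha):
    """Return the condition pairs whose p-value does not exceed alpha."""
    return sorted(pair for pair, p in p_values.items() if p <= alpha)


corrected_alpha = ALPHA / N_COMPARISONS  # 0.05 / 6, about 0.008
for experiment, table in P_VALUES.items():
    plain = significant_pairs(table, ALPHA)
    strict = significant_pairs(table, corrected_alpha)
    print(f"{experiment}: {len(plain)} significant uncorrected, {len(strict)} after Bonferroni")
```

With these inputs, three pairs survive the correction in Experiment 1 and none survive in Experiments 2 and 3, in line with the daggers in the tables.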
Table IV.6 - Random-level pairwise comparisons p-values

                 Anticorrelated   Coherent   Incoherent   DLI
Anticorrelated                    0.843      0.271        0.041*
Coherent                                     0.009*       0.029*
Incoherent                                                0.235

The p-values for comparison between factor levels in Experiment 3 (low-pass, randomized level).
* significant difference at the 0.05 level
† significant difference at the 0.05 level after Bonferroni correction

Summary/Discussion

The inclusion of DLI as a test of the level-meter model

The level-meter model assumed that ILD detection was based solely on independent level measurements in each ear. If this were the case, interaural cross-correlation should have no effect on ILD thresholds. The model predicts equivalent thresholds for stimuli of all three correlations, as well as in the DLI condition.

The reasons for including the radically different DLI task in our comparison are as follows. In the theory of signal detection (TSD), measurements are drawn from a Gaussian probability density function (PDF). See the leftmost two plots of Figure IV.7 for example intensity measurements in the left and right ears. Upon receiving a stimulus, the probability that the auditory system will judge its intensity to be a particular value is determined by a Gaussian curve. The most probable measured value in the left ear is marked $\mu_L$ (the location of the probability peak along the intensity axis), and $\sigma_L$ is the intensity uncertainty caused by internal and external noise. Similar variables are noted on the right-ear graph. It is important to mention here that there is no reason to assume that the intensity uncertainty is different between left and right ears:

$\sigma = \sigma_L = \sigma_R$

Note that $\mu_L$ is not equal to $\mu_R$, indicating an ILD. In order to get the probability density for $\mathrm{ILD} = L_R - L_L$, convolve the left- and right-ear curves to establish a new Gaussian of mean position $\Delta = \mu_R - \mu_L$ and standard deviation determined by that of the original measurements, $\sqrt{\sigma^2 + \sigma^2} = \sigma\sqrt{2}$ (Hartmann, 1997).
The ILD decision variable $d'$ is defined by these two variables that describe the Gaussian (Green and Swets, 1966) thus:

$d' = \dfrac{\Delta}{\sigma\sqrt{2}}$

The magnitude of the decision variable is directly related to the probability of a correct judgment. Increasing the mean distance from zero, $\Delta$, or reducing the uncertainty $\sigma$ will improve performance.

The experimental task consisted of two intervals, with equal and opposite ILDs (note the center two plots of Figure IV.7). To model the interval comparison, convolve both ILD probability densities (with means $\Delta_1$ and $\Delta_2 = -\Delta_1$, respectively) to establish the density for one trial (see the rightmost plot of Figure IV.7). The Gaussian’s peak position is defined by $\Delta_1 - (-\Delta_1) = 2\Delta$, dropping the subscript for simplicity, and the standard deviation becomes $\sqrt{2\sigma^2 + 2\sigma^2} = 2\sigma$. The decision variable for one trial is:

$d' = \dfrac{2\Delta}{2\sigma} = \dfrac{\Delta}{\sigma}$

The $d'$ for comparing the levels of two intervals in a monaural DLI task is $\Delta/(\sigma\sqrt{2})$, for the same reasons as explained for individual ILD measurements above. But when the same task is done binaurally, with a diotic noise, the listener has more information with which to make a judgment. A paper by Jesteadt and Weir (1977) shows that the $d'$ for the diotic case is higher than that of the monaural case by a factor of $\sqrt{2}$, since the listener has twice as many stimulus channels. The $d'$ improves to $\Delta/\sigma$, the same as above for the ILD discrimination task. Therefore, an additional consequence of the level-meter model is that the DLI task, using a diotic stimulus, should have thresholds equivalent to those for all the ILD stimuli.

Experimental inconsistency with the level-meter model

The thresholds of Experiments 1-3 all seem roughly comparable. Within each experiment, results from the four conditions did not differ by more than 0.5 dB. To a first approximation, the level-meter model performed well. However, the presence of systematic and statistical differences argues against the level-meter model.
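The d' bookkeeping of the level-meter derivation can be checked with a small Monte Carlo sketch. It assumes, as the derivation does, that each ear's level measurement is an independent Gaussian with a common standard deviation; the 60 dB base level, the trial count, the seed and all function names are illustrative choices rather than details from the original experiments.

```python
import math
import random
import statistics


def simulate_two_interval_ild_dprime(delta_db, sigma_db, n_trials=50_000, seed=1):
    """Estimate d' for a two-interval ILD trial with ILDs of +delta and -delta."""
    rng = random.Random(seed)
    base_db = 60.0  # nominal standard level (illustrative)
    decisions = []
    for _ in range(n_trials):
        # Interval 1: right ear raised by delta, so the mean ILD (R - L) is +delta.
        ild_1 = rng.gauss(base_db + delta_db, sigma_db) - rng.gauss(base_db, sigma_db)
        # Interval 2: right ear lowered by delta, so the mean ILD is -delta.
        ild_2 = rng.gauss(base_db - delta_db, sigma_db) - rng.gauss(base_db, sigma_db)
        decisions.append(ild_1 - ild_2)
    # d' is the mean of the decision variable divided by its standard deviation.
    return statistics.fmean(decisions) / statistics.stdev(decisions)


def predicted_dprimes(delta_db, sigma_db):
    """Closed-form d' values stated in the level-meter derivation."""
    monaural_dli = delta_db / (sigma_db * math.sqrt(2.0))
    return {
        "single ILD measurement": delta_db / (sigma_db * math.sqrt(2.0)),
        "two-interval ILD trial": (2.0 * delta_db) / (2.0 * sigma_db),
        "monaural DLI": monaural_dli,
        "diotic DLI": math.sqrt(2.0) * monaural_dli,
    }


predictions = predicted_dprimes(delta_db=0.5, sigma_db=1.0)
estimate = simulate_two_interval_ild_dprime(delta_db=0.5, sigma_db=1.0)
print(f"simulated d' = {estimate:.3f}, predicted = {predictions['two-interval ILD trial']:.3f}")
```

The simulated value should fall close to delta/sigma, which is also the closed-form value for the diotic DLI task; that equality is the reason the model predicts matching thresholds for the two tasks.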
While its predictions are nearly correct, the disparity between DLI and correlated thresholds is a glaring inconsistency. DLI thresholds tended to run higher than those for all ILD conditions, as evidenced in Tables IV.1, IV.3, and IV.5. The difference could be due to several factors missing from the model: additional neural processing applied to ILDs, the diverse natures of level and lateral discrimination, or perhaps other initial assumptions that were erroneous. Additionally, smaller (but still interesting) differences in threshold were found between the various correlations. At any rate, it would seem that the level-meter model could use some adjustment.

The evolution from level-meter to loudness-meter

The level-meter model assumed that listeners only used power measurements to establish ILD, and because power is time-averaged, correlation makes no difference. However, the fluctuations introduced by anti- or uncorrelated noise are only truly eliminated if the averaging time is infinite. Work by Grantham and Ahlstrom (1982) revealed an indication of this effect. They found equivalent ILD thresholds for both correlated and uncorrelated noises when the stimulus duration was 500 ms, but when they shortened it to 30 ms, the uncorrelated noise resulted in a clearly higher threshold. Their results suggest that listeners can indeed suppress the variability of uncorrelated interaural measurements given sufficient averaging time. This leads to the consideration that thresholds for uncorrelated noise may be limited by the auditory system’s inherent integration time, if the stimulus duration is long enough not to be a limiting factor (as in our experiments). Such information can account for the slight variation among our three experimental ILD thresholds, and harkens back to the “instantaneous ILD” model, which accurately predicted relative thresholds for coherent and incoherent noises.
It is possible to correct the predictions of the level-meter model by including half-wave rectification, compression, and a realistic integration time for detection, all of which influence the perception of loudness. This effectively changes the model to a “loudness-meter.” Work by Hartmann and Constan (2002) on the loudness-meter resulted in a quantitative model that used the sensitivity index d' and the signal variance of each correlation condition to predict ILD thresholds. The results are included in Figure IV.8. This figure revisits data from Experiments 2 and 3, allowing straightforward comparison of the model and all stimuli. Model-based calculations are indicated on the threshold scale by horizontal lines for each experimental condition. The dotted line represents predicted threshold levels if the model assumes a 200-ms integration time. The solid and dashed lines represent 300-ms and 400-ms integration times, respectively. The closest agreement with Experiment 2 occurs for 300 ms, a value that agrees with Plomp and Bouman (1959) for similar low-frequency stimuli. The loudness-meter model also predicts that thresholds should decrease with increasing bandwidth, to 0.45 dB for a broadband correlated stimulus. Again, this value is in close agreement with our empirical results (see Table IV.1).

Conclusions

The level-meter model provided a close approximation to experimental results. The ILD thresholds across stimuli were quite close, and did not vary by more than 0.5 dB. However, there was some subtle variation according to interaural correlation, indicating that the model could use some adjustment. The introduction of rectification, compression and integration time provided an interpretation of the signal as seen by the auditory system. These modifications transformed the level meter into a loudness meter, which was able to account for the differences in threshold.
Appendix A: “The strange case of Listener C”

Listener K, a subject in Experiment 1, was unavailable for Experiment 2. This led to the search for a replacement, testing and rejecting three more listeners (males C and X, female J) whose 1-kHz results consistently exceeded the standard deviation limit of 0.3 dB. This problem was unexpected for listener C, as he had produced some of the lowest thresholds at the 10-kHz condition. C had little difficulty with the 1-kHz condition, continuing to lateralize at low ILDs for all stimuli except correlated noise. The particular combination of correlated noise and narrow bandwidth proved more difficult, and C’s results varied wildly from run to run.

Interviews revealed that listener C perceived what he termed a “hole” along the lateralization axis, representing an absence of noise. The hole’s position and movements were always opposite those of the correlated image. Occasionally, listener C would become distracted by the hole, following its movements opposing the experimental stimulus. The distraction was intermittent; often, C would complete the majority of a run before the hole captured his attention. Thereafter, he would answer incorrectly on every trial, even at large ILDs. Experimenters reduced the stimulus duration, increased the stimulus volume, and introduced an ITD, but failed to eliminate the hole. Once the low-pass cutoff was increased to 2 kHz, the hole appeared less frequently, and it vanished altogether when the cutoff exceeded 3 kHz. No other listener experienced this phenomenon, and we cannot currently explain its apparent existence.
Appendix B: Noise-type discrimination project

Introduction

As an aside to the ILD project, which measured listeners’ ILD thresholds for N0, Nπ, and Nu (the coherent, anticorrelated, and incoherent stimuli, respectively), an informal experiment attempted to verify the basic assumption prompting the project: that listeners could distinguish between the three kinds of noises, most likely on the basis of coherence. These stimuli must affect the binaural processing centers in very different ways, and there was every reason to believe that listeners could easily discriminate among them. The work of Pollack and Trittipoe (1959) provides evidence that listeners can do so, but indicates that the ability to discriminate may be very dependent on signal level, duration and frequencies.

The letters H, X, J, C, R, L, and Z will refer to listeners involved in this experiment. J was a female, age 42; R, C, X, L and Z were males, age 19 to 28; and H was a 59-year-old male.

Experiment 1

The listener was required to identify the less coherent (according to ASW) of two intervals in a 2IFC task, while the two intervals presented a round-robin selection of coherent, anticorrelated and incoherent noise. Thus, on one trial the stimuli may have been coherent noise and anticorrelated noise, while the next trial included anticorrelated noise and incoherent noise. These noises were white, thermal, and Gaussian, ranging from 20 Hz to either 1 kHz or 10 kHz, where they were low-pass filtered by Frequency Devices 901Fs with a −48 dB/octave rolloff. The experiment was run at both frequency ranges because the results would be significant to both the broadband and narrowband ILD experiments. Stimulus characteristics remained unchanged throughout the run, and the two stimuli for each trial were selected manually by the experimenter. Both intervals lasted 500 ms, with a 500 ms gap between.
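The three noise types compared in this appendix differ only in how the left- and right-ear signals are related. The sketch below builds one pair of each type from Gaussian noise and computes the normalized zero-lag interaural correlation. It is a simplified illustration, not the experimental signal chain: the noise here is unfiltered, digital and pseudo-random, and the labels and function names are assumptions.

```python
import math
import random


def make_noise_pair(kind, n_samples, rng):
    """Return (left, right) sample lists for N0, Npi or Nu Gaussian noise."""
    left = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    if kind == "N0":      # coherent: identical signals at the two ears
        right = list(left)
    elif kind == "Npi":   # anticorrelated: right ear is the inverted left ear
        right = [-sample for sample in left]
    elif kind == "Nu":    # incoherent: an independent second noise source
        right = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    else:
        raise ValueError(f"unknown noise type: {kind}")
    return left, right


def interaural_correlation(left, right):
    """Normalized cross-correlation of the two ear signals at zero lag."""
    cross = sum(l * r for l, r in zip(left, right))
    energy = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return cross / energy


rng = random.Random(0)
for kind in ("N0", "Npi", "Nu"):
    left, right = make_noise_pair(kind, 20_000, rng)
    print(kind, round(interaural_correlation(left, right), 3))
```

The coherent and anticorrelated pairs give correlations of +1 and −1, while the independent pair gives a value near zero that shrinks as the number of samples grows.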
After hearing a trial, the listener recorded which interval seemed “broader” next to the corresponding trial number on a list. The run went on for 18 trials, including six of each stimulus pair. Listeners performed 3 runs at each frequency range. The subjects were H, J, and C.

The results are found in Table IV.7 below, which contains each listener’s discrimination results for each stimulus pair. The ratios listed in each cell show the number of correct responses versus the number of trials for that particular pair. Listeners were expected to have no trouble with the task, rating incoherent (Nu) noise as “broader” than anticorrelated (Nπ) and coherent (N0), and anticorrelated noise as “broader” than coherent. The listeners met these expectations with virtually perfect scores for all combinations except those comparing coherent and anticorrelated noise, for which they were generally below the 75% correct detection threshold. This seemed counterintuitive, since informal experiments suggest that it is easy to distinguish between N0 and Nπ stimuli.

Table IV.7 - Informal noise-type discrimination

1 kHz     N0 vs Nu   Nπ vs Nu   N0 vs Nπ
H         18/18      17/18      16/18
J         18/18      18/18      13/18
C         18/18      17/18      4/18

10 kHz    N0 vs Nu   Nπ vs Nu   N0 vs Nπ
H         18/18      18/18      13/18
J         18/18      18/18      7/18
C         18/18      18/18      7/18

Results from the informal noise-type discrimination runs. Trials for each of the three stimulus pairs were intermingled during the run.

Experiment 2

The anomalous results of Experiment 1 prompted a more formal investigation with computer-controlled runs and automated response collection, in which the listener was asked to compare just two of the three stimuli. During a run of 40 trials, the listener would indicate which interval had a greater ASW by pressing one of two buttons on a response box. The left button corresponded to the first interval, and the right to the second interval. Additional listeners joined those that participated in Experiment 1.
The subjects were H, X, J, C, R, L and Z. Stimulus duration and frequency distribution were the same as in the 10 kHz case described above, and the results are placed in Table IV.8 below.

Again, all listeners performed excellently at N0/Nu and Nπ/Nu discrimination. In the case of the N0/Nπ stimulus pair, H, X and C missed one or two of the first few trials, and then answered correctly for the remainder. Subsequent runs resulted in perfect scores (100% correct). Listeners R and L performed perfectly throughout.

Listener J’s overall score of 75% does not fully represent the interesting progression of her runs. She exhibited a longer learning process than other listeners, beginning with scores near 50% and gradually improving each run until she was consistently scoring 95% correct. However, when the set of N0/Nπ runs was interrupted by an N0/Nu run, she seemed to lose some of her previous accuracy on the next N0/Nπ run. J would then regain more accuracy with each repeated run. Listener Z, while lacking the amount of data that J accumulated, also exhibited a learning process, though he did not achieve as high an accuracy. He did improve from 50% correct to about 75% correct in his final runs.

Table IV.8 - Formal noise-type discrimination

Listener   N0 vs Nu   Nπ vs Nu   N0 vs Nπ
H          40/40      40/40      158/160
X          40/40      40/40      119/120
J          80/80      119/120    450/600
C          40/40      40/40      77/80
R          40/40      40/40      40/40
L          40/40      40/40      40/40
Z          39/40      38/40      87/120

Results for the computer-controlled discrimination runs. Each run featured only one noise pair.

J’s and Z’s apparent uncertainty raised a possibility that the presence of incoherent noise in the rotation of Experiment 2 served to confuse listeners. On a relatively easy trial with coherent and incoherent noises, the listener would be distinguishing between a compact image and a head-filling noise. On the next N0/Nπ trial, the distinction would not be nearly so clear, and the listener may have difficulty dealing with the range of coherence values.
Similar “range effects” have been previously identified for tasks where the listener was asked to discriminate the more intense of two intervals (Purks et al., 1980), and for ITD/ILD localization in large and small ranges (Koehnke and Durlach, 1989). Both articles showed that listeners could correctly identify very small changes in intensity or localization, respectively, as long as the ranges of intensities or differences were small. When presented with larger ranges, the listeners’ just noticeable differences grew accordingly.

Experiment 3

To test whether the mixing of stimuli had indeed confused our listeners, Experiment 3 was based on the same format with one change: while one interval was set to be coherent, the other was randomly chosen as anticorrelated or incoherent. The modification allowed the formal computer-controlled runs to mix up the stimuli as done in the first experiment, and Experiment 3 could study what effect, if any, was introduced by the coherent-incoherent trials. In addition, it allowed experimenters to observe confusion due to “range effects” on a trial-to-trial basis, instead of between runs.

The results are summarized below, in Table IV.9. Perfect performance by every listener for N0/Nu trials is expected, and therefore uninteresting. Discussion will concentrate solely on results for N0/Nπ trials. Listeners C, L and X were unaffected by the stimulus switch, continuing to discriminate coherent noise from anticorrelated as well as they did in Experiment 2. Listener J was able to answer correctly on 64% of the trials, which was a considerable loss of accuracy from her previous 75% score in Experiment 2. Her results were paralleled by listener Z, who only managed 57% correct when he had neared 73% previously. Listeners H and R averaged only 80% correct for the coherent-anticorrelated trials, compared to near-perfect performance (99% and 100%, respectively) when incoherent noise was excluded in Experiment 2.
Table IV.9 - Noise-switched discrimination

Listener   N0 vs Nu   N0 vs Nπ
H          104/104    114/136
X          45/45      34/35
J          106/106    86/134
C          57/57      63/63
R          59/59      48/61
L          49/49      71/71
Z          83/83      44/77

Results for the noise-switched discrimination runs of Experiment 3. Each trial had an equal chance of comparing coherent-incoherent noise or coherent-anticorrelated noise.

The reduction in performance exhibited by the majority of listeners indicates a strong confusion effect. Indeed, the inclusion of a wide range of correlations on a trial-to-trial basis caused more difficulty than the run-to-run confusion identified in Experiment 2.

Conclusions

While this study began as a simple test, intended to show that listeners could easily distinguish between the stimuli used for ILD, it produced some interesting results. Listeners did eventually prove that they could distinguish between coherent and anticorrelated noises. See Table IV.10 for a summary of relevant results. Only in three cases did a listener perform worse than the 75% threshold. Those that demonstrated difficulty in discriminating N0/Nπ stimulus pairs (specifically in Experiment 2) also improved with training.

Table IV.10 - Summary of noise-type experiments

Condition            H       X       J       C       R       L       Z
Exp. 2, N0 vs Nu     100.0   100.0   100.0   100.0   100.0   100.0   97.5
Exp. 2, Nπ vs Nu     100.0   100.0   99.2    100.0   100.0   100.0   95.0
Exp. 2, N0 vs Nπ     98.8    99.2    75.0    96.3    100.0   100.0   72.5
Exp. 3, N0 vs Nu     100.0   100.0   100.0   100.0   100.0   100.0   100.0
Exp. 3, N0 vs Nπ     83.8    97.1    64.2    100.0   78.7    100.0   57.1

Percent correct in each experimental case, by listener.

However, several listeners showed evidence that reflected the importance of range and previous stimulus on coherence discrimination. The declines in accuracy from Experiment 2 to 3 were significant (t(6) = 2.471, p = 0.048), a sign that binaural systems may be confused by the variable ranges of stimulus coherence. The context-coding mode of sensory memory (Durlach and Braida, 1969) could explain the “range effect”.
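The t statistic reported above is consistent with a paired comparison of the coherent-versus-anticorrelated percentages from Experiments 2 and 3 in Table IV.10. The sketch below recomputes it under that assumption; the function name is illustrative, and the critical value is the standard two-tailed entry for 6 degrees of freedom from a t table, not a number taken from the original text.

```python
import math
import statistics

LISTENERS = ["H", "X", "J", "C", "R", "L", "Z"]
# Percent correct, coherent vs anticorrelated trials, from Table IV.10.
EXP2_PERCENT = [98.8, 99.2, 75.0, 96.3, 100.0, 100.0, 72.5]
EXP3_PERCENT = [83.8, 97.1, 64.2, 100.0, 78.7, 100.0, 57.1]


def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for a paired-samples t-test."""
    differences = [b - a for b, a in zip(before, after)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / standard_error, n - 1


t_value, dof = paired_t_statistic(EXP2_PERCENT, EXP3_PERCENT)
T_CRITICAL_05 = 2.447  # two-tailed 0.05 critical value for 6 degrees of freedom

print(f"t({dof}) = {t_value:.3f}")
print("significant at 0.05:", t_value > T_CRITICAL_05)
```

The mean decline is 8.7 percentage points, and the statistic only just exceeds the critical value, which matches a p-value slightly below 0.05.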
Context-coding refers to a listener’s ability to compare the aspects of a stimulus with the overall context of stimuli given in an experiment. The relevant example from the noise-type discrimination study is the comparison of coherence values, or “diffusiveness” in the head. Such a comparison would suffer from internal noise, caused by imprecise memory of the context. According to Durlach and Braida, “...the amount of noise depends on the width of the context, larger widths leading to greater noise.” Thus, the “range effect” leads to confusion on N0/Nπ stimulus pairs when they are mixed with N0/Nu pairs.

The most significant question that needs answering is: what implication do these results have for the ILD experiment? The changing amount of internal noise should not impact coherence-related ILD thresholds, which involve only one stimulus type per run. The original goal was to show that the three stimuli are binaurally distinct due to the differing auditory source widths of their intracranial images, as asserted in the Methods section earlier in this chapter. The listeners bore that prediction out, for the most part.

References

Békésy, G. von. “Zur Theorie des Hörens: Über das Richtungshören bei einer Zeitdifferenz oder Lautstärkeungleichheit der beidseitigen Schalleinwirkungen,” Phys. Z. 31, 824-838, 857-868 (1930); cited by Blauert (1983), page 162.

Dunn, O.J. “Multiple comparisons among means,” J. Amer. Stat. Assn. 56, 52-64 (1961).

Durlach, N.I. and Braida, L.D. “Intensity Perception. I. Preliminary Theory of Intensity Resolution,” J. Acoust. Soc. Am. 46, 372-383 (1969).

Grantham, D.W. “Interaural Intensity Discrimination: Insensitivity at 1000 Hz,” J. Acoust. Soc. Am. 75, 1191-1194 (1984).

Grantham, D.W. and Ahlstrom, J.B. “Interaural intensity discrimination of noise as a function of center frequency, duration, and interaural correlation,” J. Acoust. Soc. Am. 71, S86 (1982).

Green, D.M.
Profile Analysis: Auditory Intensity Discrimination, Oxford Psychology Series No. 13, Oxford, New York (1988), p. 20.

Green, D.M. and Swets, J.A. Signal Detection Theory and Psychophysics, J. Wiley, New York (1966).

Hartmann, W.M. Signals, Sound, and Sensation, American Institute of Physics, New York, 1997. p. 384.

Hartmann, W.M. and Constan, Z.A. “Interaural level differences and the level-meter model,” J. Acoust. Soc. Am., manuscript accepted (2002).

Henning, G.B. “Detectability of interaural delay in high-frequency complex waveforms,” J. Acoust. Soc. Am. 55, 84-90 (1974).

Jesteadt, W. and Weir, C. “Comparison of monaural and binaural discrimination of intensity and frequency,” J. Acoust. Soc. Am. 61, 1599-1603 (1977).

Koehnke, J. and Durlach, N.I. “Range effects in the identification of lateral position,” J. Acoust. Soc. Am. 86, 1176-1178 (1989).

Koehnke, J., Colburn, H.S. and Durlach, N.I. “Performance in several binaural-interaction experiments,” J. Acoust. Soc. Am. 79, 1558-1562 (1986).

Lindevald, I.M. and Benade, A.H. “Two-ear correlations in the statistical sound fields of rooms,” J. Acoust. Soc. Am. 80, 661-664 (1986).

Moore, Brian C.J. An Introduction to the Psychology of Hearing. Academic Press, New York, 1989. p. 196.

Pickles, J.O. An Introduction to the Physiology of Hearing. Academic Press, New York, 1982. pp. 173-177.

Pollack, I. and Trittipoe, W.J. “Binaural Listening and Interaural Noise Cross Correlation,” J. Acoust. Soc. Am. 31, 1250-1252 (1959).

Purks, S.R., Callahan, D.J., Braida, L.D. and Durlach, N.I. “Intensity Perception. X. Effect of Preceding Stimulus on Identification Performance,” J. Acoust. Soc. Am. 67, 634-637 (1980).

Stern, R.M. and Trahiotis, C. “Models of Binaural Interaction,” in Hearing, ed. B. Moore. Academic Press, New York, 1995. pp. 347-386.

Yost, W.A. “Lateral position of sinusoids presented with interaural intensive and temporal differences,” J. Acoust. Soc. Am. 70, 397-409 (1981).

Figure IV.1. Depiction of a two-interval ILD trial in Experiment 1.
The dashed lines represent a level of 60 dB for conditions (a) and (b), which are equally likely. The levels of left and right ears appear under “L” and “R” for each interval. Both (a) and (b) diagram a right-left trial according to ILDs, but (a) establishes the ILD by intensifying one ear above 60 dB while (b) attenuates the other.

[Figure IV.1: image not recoverable from the source.]

Figure IV.2. Depiction of a two-interval DLI trial in Experiment 1. The dashed lines represent a level of 60 dB for conditions (a) and (b), which are equally likely. The levels of left and right ears appear under “L” and “R” for each interval. Both (a) and (b) diagram a trial where the second interval is more intense, but (a) establishes a difference by intensifying the second interval above 60 dB while (b) attenuates the first interval.

[Figure IV.2: image not recoverable from the source.]

Figure IV.3. Results of Experiment 1: Broadband signal. The x-axis lists the six listeners that participated, and each features ILD discrimination thresholds (denoted along the y-axis) for broadband (20 Hz - 10 kHz) correlated, anticorrelated and uncorrelated noises, as well as DLI thresholds. The shaded bar below 0.15 dB represents an experimental limitation that invalidates results in that region. Error bars on data points represent two standard deviations.

[Figure IV.3: plot not recoverable from the source. Y-axis: threshold ILD (dB); legend: correlated, anticorrelated, uncorrelated, DLI.]

Figure IV.4. Results of Experiment 2: Low-pass filtered signal.
The x-axis lists the six listeners that participated, and each features ILD discrimination thresholds (denoted along the y-axis) for narrowband (20-1000 Hz) correlated, anticorrelated and uncorrelated noises, as well as DLI thresholds. The shaded bar below 0.15 dB represents an experimental limitation that invalidates results in that region. Error bars on data points represent two standard deviations.

[Figure IV.4: plot not recoverable from the source. Y-axis: threshold ILD (dB); legend: correlated, anticorrelated, uncorrelated, DLI.]

Figure IV.5. Depiction of a two-interval ILD trial in Experiment 3. The dashed lines represent a level of 60 dB for conditions (a) and (b), which are equally likely. The levels of left and right ears appear under “L” and “R” for each interval. Both (a) and (b) diagram a right-left trial according to ILDs, but (a) establishes the ILD by intensifying one ear above standard level while (b) attenuates the other. Intervals feature a randomized standard level, which lies within 5 dB of 60. This figure is identical to Figure IV.1 aside from a 62-dB standard for the first interval and a 59-dB standard for the second.

[Figure IV.5: image not recoverable from the source.]

Figure IV.6.
Results of Experiment 3: Low-pass filtered signal with randomized standard level. The x-axis lists the six listeners that participated, and each features ILD discrimination thresholds (denoted along the y-axis) for narrowband (20-1000 Hz) correlated, anticorrelated and uncorrelated noises, as well as DLI thresholds. The shaded bar below 0.15 dB represents an experimental limitation that invalidates results in that region. Error bars on data points represent two standard deviations.

[Figure IV.6: plot not recoverable from the source. Y-axis: threshold ILD (dB); legend: correlated, anticorrelated, uncorrelated, DLI.]

Figure IV.7. ILD discrimination according to signal detection theory (TSD). All five plots show detection as a probability density curve. The leftmost two represent the intensity measurements of a single interval in left and right ears. The signal is more intense in the right ear, and thus the right-ear mean (the most probable measured intensity) is greater along the intensity axis than the left-ear mean. The standard deviations of both curves (marked with dashed horizontal lines and σ symbols) are assumed to be equal. The top-center plot results from the difference between the leftmost curves, leading to a most probable ILD Δ1 for the first interval.
The bottom-center plot is the result of the second interval, with ILD reversed. The rightmost plot shows the probability of judging the ILD shift from right to left (the sum of all probabilities to the right of zero).

[Figure IV.7: plots not recoverable from the source. The axes appear to be probability versus intensity (arbitrary units).]

Figure IV.8. Loudness-meter model predictions compared to results from Exps. 2 and 3. The histograms show average threshold ILDs for all four stimulus types in Experiments 2 and 3. The dotted, solid and dashed lines represent predicted thresholds for integration times of 200, 300 and 400 ms, respectively. For the correlated condition, all three times result in the same prediction.

[Figure IV.8: plot not recoverable from the source. Bars for correlated, anticorrelated, uncorrelated and DLI conditions; model lines for 200, 300 and 400 ms.]

Chapter V: Discrimination of head-related decorrelation

Introduction

Consider the real-world case of a sound arriving at a listener’s ears. Refer to Figure V.1, borrowed from Kuhn (1987), for a simple diagram of this situation.
This is a simplified model of the head, with ears placed at two points opposite each other on a sphere. Looking down on the head and an incident plane wave, it is clear that unless the listener is facing directly toward the source, one ear will always be caught in the “head shadow”. The shadowed ear is not cut off from the signal, however; the diffraction around the head, as well as reflections from the shoulders and torso, provide an indirect passage for sound. In the interest of simplicity for all further analyses, this discussion will focus on the diffracted sound with the system including just a spherical head, and study the simplified setup of Figure V.1 in great detail.

The resulting stimulus at the shadowed ear has a particular set of phase and amplitude characteristics due to diffraction, which has been treated theoretically in the following equation for pressure induced by a plane wave on a spherical surface found in Kuhn (1977). It becomes useful when approximating the human head as a sphere with the ears at two opposing points (as in Figure V.1):

$$\frac{p_i + p_s}{p_0} = \left(\frac{1}{ka}\right)^{2} \sum_{n=0}^{n_{max}} \frac{i^{\,n+1}\,(2n+1)\,P_n(\cos\theta)}{j'_n(ka) - i\,y'_n(ka)} \tag{1}$$

The left term is the sum of incident and scattered pressures, normalized by the free-field pressure. Argument $k$ is the wave number for the relevant frequency (equal to $2\pi$ times the frequency divided by the speed of sound), $a$ is the radius of the head (approximately 8 cm), $n$ is the order of each term in the sum, which is truncated at $n_{max}$ (for purposes of this analysis, $n_{max} = 12$ was sufficient), and $\theta$ is the azimuthal angle of a point on the head (measured from the direction of the incident plane wave). Referring again to Figure V.1, angle $\theta$ is defined as $90^\circ - \theta_{inc}$ for the right ear, and $90^\circ + \theta_{inc}$ for the left. Functions $P_n$, $j'_n$ and $y'_n$ are the nth-order Legendre polynomials and the first derivatives of the spherical Bessel and spherical Neumann functions, respectively.
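As a rough illustration of how Equation (1) might be evaluated numerically, the following Python sketch sums the series to order 12 and converts the interaural phase difference into a time difference. It is not the spherical-head program used in this work: the function names, the exact 8-cm radius, and the upward-recurrence evaluation of the spherical Bessel and Neumann functions are all assumptions made for the example.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, speed of sound in air
HEAD_RADIUS = 0.08      # m, "approximately 8 cm" (assumed exact here)
N_MAX = 12              # truncation order quoted in the text


def bessel_derivatives(n_max, x):
    """Return lists j'_n(x) and y'_n(x), n = 0..n_max, via upward recurrence."""
    j = [math.sin(x) / x, math.sin(x) / x**2 - math.cos(x) / x]
    y = [-math.cos(x) / x, -math.cos(x) / x**2 - math.sin(x) / x]
    for n in range(1, n_max):
        j.append((2 * n + 1) / x * j[n] - j[n - 1])
        y.append((2 * n + 1) / x * y[n] - y[n - 1])
    # f'_0 = -f_1 and f'_n = f_{n-1} - (n + 1) / x * f_n for n >= 1
    dj = [-j[1]] + [j[n - 1] - (n + 1) / x * j[n] for n in range(1, n_max + 1)]
    dy = [-y[1]] + [y[n - 1] - (n + 1) / x * y[n] for n in range(1, n_max + 1)]
    return dj, dy


def legendre(n_max, u):
    """Return the Legendre polynomials P_0(u)..P_n_max(u)."""
    p = [1.0, u]
    for n in range(1, n_max):
        p.append(((2 * n + 1) * u * p[n] - n * p[n - 1]) / (n + 1))
    return p


def surface_pressure(freq_hz, theta_deg):
    """Normalized pressure on the sphere, following the form of Equation (1)."""
    ka = 2.0 * math.pi * freq_hz * HEAD_RADIUS / SPEED_OF_SOUND
    dj, dy = bessel_derivatives(N_MAX, ka)
    p = legendre(N_MAX, math.cos(math.radians(theta_deg)))
    total = sum(
        (1j) ** (n + 1) * (2 * n + 1) * p[n] / (dj[n] - 1j * dy[n])
        for n in range(N_MAX + 1)
    )
    return total / ka**2


def itd_seconds(freq_hz, theta_inc_deg):
    """Magnitude of the interaural phase difference divided by 2*pi*f."""
    right = surface_pressure(freq_hz, 90.0 - theta_inc_deg)
    left = surface_pressure(freq_hz, 90.0 + theta_inc_deg)
    phase_difference = cmath.phase(right * left.conjugate())  # wrapped to +/- pi
    return abs(phase_difference) / (2.0 * math.pi * freq_hz)


if __name__ == "__main__":
    for f in (100.0, 500.0, 1000.0):
        print(f, "Hz:", round(itd_seconds(f, 45.0) * 1e6, 1), "microseconds")
```

Because the phase difference is wrapped to ±π, this sketch is only meaningful at frequencies where the interaural phase shift stays below half a period.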
When evaluated at the left and right ears, this equation leads to theoretical interaural phase shifts due to diffraction around the head, which, when divided by their respective angular frequencies, yield interaural time differences (ITDs). Those calculations are summarized in Figure V.2, which plots ITD versus frequency for incident angles of 10, 20, 30, 45, 60 and 90 degrees. The open circles on the left represent the low-frequency ITD limit for each angle. This limit is determined by reworking Equation (1) with the assumption that $ka \ll 1$ (low frequencies have small $k$) (Kuhn, 1977):

$$\frac{p_i + p_s}{p_0} \cong 1 \pm i\left(\tfrac{3}{2}\,ka\right)\sin\theta_{inc} \qquad (+\ \text{for near ear},\ -\ \text{for far ear}) \tag{2}$$

Since this value is complex, the pressure’s phase at each ear can be found by taking the arctangent of the imaginary part divided by the real part, which for small $ka$ is approximately $\pm\tfrac{3}{2}\,ka\sin\theta_{inc}$. Thus, the ITD in the low-frequency limit can be approximated by the phase difference between near and far (shadowed) ears, $3\,ka\sin\theta_{inc}$, divided by the angular frequency $kc$:

$$ITD \cong \frac{3a}{c}\sin\theta_{inc} \tag{3}$$

where $c$ is the speed of sound in air (343 m/s). Proceeding with other features of Figure V.2, the dashed line delineates phase shifts of half a wavelength for each frequency, at which point it is impossible to establish which ear is leading or lagging. The black squares on the right represent a theoretical high-frequency limit, which has an approximate value given by:

$$ITD \cong \frac{2a}{c}\sin\theta_{inc} \tag{4}$$

Thus, it is clear that the high-frequency ITD is simply 2/3 that of the low-frequency ITD, and the squares are positioned according to that rule. The open triangles to the extreme right are predictions of the Woodworth formula

$$ITD \cong \frac{a}{c}\left[\theta_{inc} + \sin\theta_{inc}\right] \tag{5}$$

another approximation that ignores reflection, but accounts for opacity of the head. It is considered to be a close approximation for the ITD of high-frequency signals (Kuhn, 1977). The plots for various angles terminate at several different frequencies, with low angles including data beyond 5 kHz and high angles only extending to 1.2 kHz. This is due to computational limitations.
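The three closed-form approximations above are easy to compare numerically. The sketch below evaluates the low-frequency limit, the high-frequency limit and the Woodworth formula for the incident angles plotted in Figure V.2. The function names are illustrative, and the head radius is taken as exactly 8 cm, which is an assumption based on the approximate value given earlier.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s
HEAD_RADIUS = 0.08      # m, assumed value ("approximately 8 cm")


def itd_low_frequency(theta_inc_deg):
    """Low-frequency limit: (3a/c) sin(theta_inc)."""
    return 3.0 * HEAD_RADIUS / SPEED_OF_SOUND * math.sin(math.radians(theta_inc_deg))


def itd_high_frequency(theta_inc_deg):
    """High-frequency limit: (2a/c) sin(theta_inc)."""
    return 2.0 * HEAD_RADIUS / SPEED_OF_SOUND * math.sin(math.radians(theta_inc_deg))


def itd_woodworth(theta_inc_deg):
    """Woodworth formula: (a/c) [theta_inc + sin(theta_inc)], angle in radians."""
    theta = math.radians(theta_inc_deg)
    return HEAD_RADIUS / SPEED_OF_SOUND * (theta + math.sin(theta))


if __name__ == "__main__":
    for angle in (10, 20, 30, 45, 60, 90):
        values = (itd_low_frequency(angle), itd_high_frequency(angle), itd_woodworth(angle))
        print(angle, [round(v * 1e6, 1) for v in values])  # microseconds
```

With a different assumed radius the absolute values change, but the 2/3 ratio between the high- and low-frequency limits does not.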
The current study

With data from Kuhn’s spherical-surface pressure equation, one can generate noises for left and right ears that have the correct distribution of ITDs (according to our calculation), and thus simulate dispersion of sound about the head with headphones. If the head-dispersed noise is treated as a slightly incoherent noise, it can be used in a coherence discrimination experiment (much like in Chapter II) that requires listeners to compare two intervals with the task of distinguishing the head-shifted stimuli from perfectly coherent noise (with uniform ITD over all frequencies). Theoretically, the listener could find it difficult to identify which was more incoherent, since he or she would be familiar with the particular incoherence introduced by the head.

The proposed experiment is much like Kulkarni’s sensitivity experiment involving head-related transfer functions (HRTFs) (Kulkarni et al., 1999). In that paper, listeners were asked to distinguish empirically-measured HRTFs from those that were reconstructed according to a model. The experiments reported here asked listeners to compare “head-shifted” stimuli modeled on Kuhn’s theoretical sphere calculations with a constant-ITD coherent noise. Frequency-independent amplitudes are matched for both intervals.

Note that Equation (1) can also yield interaural level differences (ILDs) in addition to ITDs. However, the following experiments focused attention on the phase dispersion by eliminating all other cues, including level differences. Previous studies have also found that these head-related ILDs have little if any influence on the perceived accuracy of a headphone-reproduced HRTF (Kulkarni et al., 1999; Kistler and Wightman, 1992). Also, the model calculations allow isolation of head dispersion from pinnae cues and reflections off the shoulders and torso, which would necessarily be included in any measured HRTF.
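To make the comparison concrete, here is a minimal Python sketch of the two kinds of stimulus as lists of spectral components. Both have unit amplitude at every frequency and differ only in interaural phase. The straight-line ITD curve, the choice of which ear lags, and all function names are hypothetical and stand in for the values produced by the spherical-head calculation.

```python
import cmath
import math
import random


def noise_spectra(freqs_hz, itd_of_f, rng):
    """Return (left, right) unit-amplitude components; the right ear lags by itd_of_f(f)."""
    left, right = [], []
    for f in freqs_hz:
        start_phase = rng.uniform(0.0, 2.0 * math.pi)  # random phase per component
        l = cmath.rect(1.0, start_phase)
        r = l * cmath.exp(-2j * math.pi * f * itd_of_f(f))  # interaural phase only
        left.append(l)
        right.append(r)
    return left, right


def constant_itd(itd_s):
    """Coherent reference: the same delay at every frequency."""
    return lambda f: itd_s


def hypothetical_head_shift(f_hz):
    """Illustrative only: ITD falling linearly from 550 to 500 microseconds over 600-800 Hz."""
    return 550e-6 - 50e-6 * (f_hz - 600.0) / 200.0


def recovered_itds(freqs_hz, left, right):
    """ITD implied by each component's interaural phase (valid while 2*pi*f*ITD < pi)."""
    return [-cmath.phase(r / l) / (2.0 * math.pi * f)
            for f, l, r in zip(freqs_hz, left, right)]


if __name__ == "__main__":
    rng = random.Random(0)
    freqs = list(range(600, 801, 2))
    left, right = noise_spectra(freqs, hypothetical_head_shift, rng)
    itds = recovered_itds(freqs, left, right)
    print(round(itds[0] * 1e6, 1), round(itds[-1] * 1e6, 1))
```

A time-domain waveform would follow from summing one sinusoid per component, a step omitted here.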
Thus, the discrimination task is based solely on the difference in phase shifts. The constant-ITD (frequency-independent delay) stimulus is the representation of perfect coherence under lab conditions, using headphones. Such a stimulus approximates what the auditory system would detect if the listener’s head were transparent to sound. This chapter is intended to answer the question: are listeners sensitive to the incoherence introduced by head dispersion?

Experiment 1: Discrimination at two incident angles

Method

Experiment 1 featured two incident angles for the head-dispersed noise: 30 and 45 degrees. Both have an interesting ITD characteristic (refer to Figure V.2). Among the various angles presented, the 30-degree plot exhibits the steepest change in delays according to frequency (in the region of 800 Hz), while the 45-degree calculations exhibit the most extreme ITD shift within its range (from 563 µs to 411 µs). These two stimuli should be the most distinguishable from uniform ITD, and thus have the best chance of identification by listeners.

The experiment tested listeners at three different bandwidths, identified as broadband, narrowband, and narrow-narrowband. The purpose of each was to provide the listener with a different perspective on the head-shifted interval, and establish if one of the bandwidths made it easier to identify. Since the head-shifted noise was produced digitally, the band-pass filtering was simple to achieve and absolute. The broadband (BB) case included all head-shifts as calculated by our spherical-head program, for a range of frequencies from 20 Hz to 3 kHz for both angles. Note that the 45-degree plot on Figure V.2 only includes data up to 2 kHz; this was remedied by assuming that the behavior after the last known ITD value was asymptotic, and filling the region from 2-3 kHz with that value.
These upper limitations were likely of no consequence, since the literature agrees that ITDs above 1.5-2 kHz have little effect on the perceived image (Hartmann, 1996; Kulkarni et al., 1999; Wightman and Kistler, 1992; Brungart et al., 1999). Thus, the experimenters felt confident in performing this experiment without head-shift data above 3 kHz.

Narrowband (NB) limits were defined by the region where ITDs went from a maximum to a minimum value, presenting listeners with only the band of greatest change. For the 30-degree data, that band was 400 Hz to 1200 Hz, while the 45-degree equivalent was 400-1800 Hz. Finally, narrow-narrowband (NNB) stimuli were limited to the steepest part of the ITD change, the region of greatest slope. These bands were 800-1000 Hz for 30-degree data, and 600-800 Hz for 45-degree data.

To check that the head-dispersed interval did indeed contain the appropriate ITD characteristics, as calculated according to frequency, it was tested with a digital delay line and an HP spectrum analyzer, both of which were independent of the Tucker-Davis equipment used to generate the stimulus. The input to the analyzer was generated by delaying the left-ear signal by X µs, inverting it and adding it to the right-ear signal. The resultant spectrum, which was otherwise of equal amplitude, exhibited sharp dips at frequencies where the head-shift ITD was X µs in the right ear. This allowed the identification of several expected delays, and verified that the head-shifted stimulus was produced correctly.

To present a rigorous test of the perceptual similarity (or dissimilarity) of head-shifted vs. coherent stimuli, one must eliminate all non-coherence cues that could interfere with the planned task. The bandwidth and amplitude characteristics of both stimuli were matched. Also, one would naturally choose to eliminate position cues from the task by selecting an ITD for the coherent interval that placed it in an equivalent lateral position with the head-shifted interval.
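The delay-invert-add check described earlier in this section can be sketched in a few lines of Python. For unit-amplitude components, the residual at each frequency has magnitude 2|sin(πf(ITD(f) − X))|, which vanishes where the head-shift ITD equals the probe delay X. The ITD curve below is a hypothetical straight line spanning the 45-degree extremes quoted above (563 µs to 411 µs); it is not the calculated curve, and the function names are illustrative.

```python
import cmath
import math


def hypothetical_itd(f_hz):
    """Illustrative straight line: 563 microseconds at 400 Hz to 411 at 1800 Hz."""
    return 563e-6 - (563e-6 - 411e-6) * (f_hz - 400.0) / 1400.0


def residual(f_hz, probe_delay_s):
    """Magnitude of (right ear) minus (left ear delayed by the probe delay)."""
    right = cmath.exp(-2j * math.pi * f_hz * hypothetical_itd(f_hz))  # lags by ITD(f)
    left_delayed = cmath.exp(-2j * math.pi * f_hz * probe_delay_s)
    return abs(right - left_delayed)


def null_frequency(probe_delay_s, freqs_hz):
    """Frequency at which the residual spectrum has its deepest dip."""
    return min(freqs_hz, key=lambda f: residual(f, probe_delay_s))


if __name__ == "__main__":
    band = range(400, 1801)  # 1-Hz steps across the 45-degree NB band
    print(null_frequency(520e-6, band), "Hz")
```

Stepping the probe delay through several values and reading off the dip frequencies traces out the ITD-versus-frequency curve of the stimulus.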
Experiment 1 actually used a range of ITDs, so that on each trial, the coherent noise had a random delay chosen from a set of 10 experimenter-defined values. Thus, the coherent noise would not be easy to identify simply because the listener recognized its lateral location after a few trials. For example, one head-dispersed interval was generated using the 30-degree data from 20-3000 Hz (the upper limit dictated by calculation limitations, as noted above), with equal amplitudes across the band. The experimental program provided a coherent-noise stimulus of identical bandwidth and amplitude, with an ITD selected from the set of 260, 280, 300, 320, 340, 360, 380, 400, 420, and 440 microseconds. Note that 260, 420 and 440 µs are all outside the array of ITDs included in the 30-degree plot. These conditions were included as a test, to establish how well the listeners could use a lateral cue.

Both constant-ITD and head-shifted intervals were generated digitally by a Tucker-Davis Technologies Array Processor. The processor constructed both in frequency space, setting the upper and lower band limits and filling between them with equal-amplitude random-phase noise. For a head-shifted interval, the processor would then introduce the pre-calculated interaural phases. For a constant-ITD interval, it imposed one of 10 evenly-spaced ITD values, ranging from just below to just above the array of head-shift values for the corresponding incident angle and bandwidth. Below are the ITDs selected for the different conditions:

30-degree, BB/NB: 260, 280, 300, 320, 340, 360, 380, 400, 420, 440 µs
30-degree, NNB: 290, 298, 306, 314, 322, 330, 338, 346, 354, 362 µs
45-degree, BB/NB: 400, 420, 440, 460, 480, 500, 520, 540, 560, 580 µs
45-degree, NNB: 505, 510, 515, 520, 525, 530, 535, 540, 545, 550 µs

The four listeners were each identified by a single letter.
• T, male age 24, normal hearing, no previous listening experience
• W, male age 60, common middle-age high-frequency hearing loss, extensive listening experience
• X, male age 26, normal hearing, previous experience in listening studies
• Z, male age 27, normal hearing, previous experience in listening studies

During runs, listeners sat in a double-walled Acoustic Systems soundroom listening with Sennheiser HD-480 II headphones. The experimental runs consisted of 100 two-interval forced-choice (2IFC) trials (ten for each coherent ITD value). On each trial, the program would present the listener with two 500-ms intervals in random order. One was a constant-ITD stimulus, and the other included head-shifts. Both intervals had rise/fall times of approximately 30 ms, and had simultaneous onsets/offsets in both ears. The lack of onset cues made it necessary for listeners to identify dispersion in the steady-state sound, if possible. Also, this decision was justified by the work of Tobias and Schubert (1959), who found that onset ITDs are superseded by steady-state ITDs when localizing noises that exceed 125 ms in duration. After presenting both intervals, the program prompted for a response, and the listener pushed one of two buttons to answer which interval was head-shifted.

Listeners were instructed to identify the head-shifted stimulus by its incoherence, probably using the Auditory Source Width (ASW) of its image as was done in Chapter II’s experiments. The ASW refers to diffuseness, or “broadness”, of the intracranial image. Coherent stimuli are very compact, producing a small ASW, while incoherent stimuli result in a large ASW. Following several training runs, during which the program provided feedback indicating the correct answer after every trial, listeners participated in four runs at each combination of incident angle (30 and 45 degrees) and bandwidth (BB, NB, NNB).

Results

The experimental results are found in Figures V.3-10.
The data are averaged over runs, showing percent of trials answered correctly (counted as selecting the head-shift interval) versus the ten ITDs applied to the coherent-noise stimulus. The bars on each point represent two standard errors of the mean. Standard error is defined as the standard deviation normalized by the square root of the number of data points minus one (in this case, we have four data points). Each graph includes three plots, one for each bandwidth condition. The horizontal bars labeled “BB/NB range” and “NNB range” delineate the set of ITDs included in head-shifted stimuli of those bandwidths. The dashed vertical line corresponds to the low-frequency ITD limit for sound dispersed about the head, which was represented by the open circles in Figure V.2. The dashed horizontal line indicates the 50% correct level (guessing), which is the performance expected if listeners could not distinguish between constant-ITD and head-shifted intervals. Figures V.3-6 are 30-degree results for listeners T, W, X, and Z, respectively. Figures V.7-10 are 45-degree results for listeners in the same order.

Taking all these complex data into account, there are two features of note. Listeners W and Z could indeed distinguish head-shifted intervals from constant-ITD, achieving very high success rates for ITDs that were close to or beyond the extremes for that particular head-shifted stimulus. However, when the constant ITD fell nearer the center of the ITD range, performance reduced to guessing. Listeners T and X were not as successful at the task, and their results hovered near 50 percent. They did meet or exceed 75 percent for a few ITDs. X did so when the ITDs were 400 µs in the 30-degree NB condition, 400 µs in the 45-degree NB condition, and 400-440 µs in the 45-degree BB condition. T did so for ITDs of 260, 300 and 320 µs in the 30-degree BB condition, as well as 260, 420 and 440 µs in the 30-degree NB condition.
Note that all these ITDs are near the limits for their corresponding head-shifted stimuli.

Discussion

The results of Experiment 1 suggest that a lateral cue was probably in use. That is, some listeners may not have been performing the task using coherence discrimination, but rather learned to associate the head-shifted interval with a well-defined and absolute lateral location. The roving coherent-interval ITD only prevented the use of cues based on relative lateralization. The range of ITDs was selected to include delays found within the head-shifted stimulus, with two or three lying outside that range. However, it seems that listeners W and Z tended to identify the head-shift image with a particular delay. If the ITD interval image was close to that delay, W and Z could do no better than guessing. However, when the ITD interval was significantly far (20-40 µs) from that location, their rate of correct responses increased dramatically, reaching 100% once the delay had shifted 100 µs or more (typically outside the range of delays found in the head-shift). Interviews with the listeners confirm that while they had been asked to discriminate based on “width” of the image, their training with the feedback led to lateral discrimination.

Listeners X and T reported that they also concentrated on the lateral cue, but as their results show, they were not as successful at discriminating the image shifts. It is possible that they were not as astute as W and Z at discriminating lateral movement. They did, however, manage to distinguish the head-shifted interval from some constant ITDs, specifically those that were near the outer edges of the head-shift range.

Experiment 1 did show that the auditory system has difficulty discriminating head-related dispersion from a constant ITD. However, the experiment also yielded different information than originally expected, because some listeners were able to follow an alternative cue to dispersion: image movement.
Listeners W and Z demonstrated an inability to distinguish a head-shifted image from a constant-ITD band when the two were at the same lateral position. If the listeners had identified an additional cue, such as the image’s ASW, then the loss of lateral position might have reduced their hit rate to 80%. Because performance reduced to guessing, Experiment 1 provides strong evidence of listeners’ insensitivity to head dispersion. Another interesting result was the apparent lateral location of the head-shifted stimuli. Note that they tended to be well-removed from the low-frequency limit, the delay that characterizes the stimulus without dispersion. While the listeners were unable to successfully distinguish the head-shifted intervals, they did see some effect from the pattern of dispersed phase delays.

Experiment 2: Randomized angle

Introduction

The above results showed that listeners could not identify the head-shifted interval when its lateral position was near that of the constant-ITD interval. Unfortunately, that coincidence of location only occurred in 10-20% of trials, and listeners may have missed a subtle but useful cue while they were concentrating on the lateralization cue. Thus, the experimenters decided to design a new experiment that would truly eliminate the usefulness of lateral position, leaving listeners with only interaural correlation as a means for discrimination.

Method

The follow-up experiment used six different stimuli: a set of three head-shifted bands associated with various angles of incidence, and three corresponding constant-ITD bands. Each trial would select the two intervals randomly, one from each group so there would be a comparison between head-shifted and ITD stimuli. In this manner, the lateral positions of the two would be randomized, and provide no consistent cues for the listener.
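The trial structure just described can be sketched as follows. This is only an illustration of drawing one stimulus from each group and randomizing the interval order; the stimulus labels and the function name are placeholders, not the identifiers used by the experimental program.

```python
import random

# Placeholder labels for the six stimuli (three per group).
HEAD_SHIFTED = ["head-shift angle 1", "head-shift angle 2", "head-shift angle 3"]
CONSTANT_ITD = ["ITD 1", "ITD 2", "ITD 3"]


def make_trial(rng):
    """Return the two intervals of one trial as (kind, stimulus) pairs in random order."""
    intervals = [
        ("head-shifted", rng.choice(HEAD_SHIFTED)),
        ("constant-ITD", rng.choice(CONSTANT_ITD)),
    ]
    rng.shuffle(intervals)  # either kind may come first
    return intervals


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(make_trial(rng))
```

Because each group is sampled independently, all nine angle and ITD pairings can occur, which is what removes a consistent lateral-position cue.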
Aside from the roaming image positions, Experiment 2 was identical to Experiment 1, with the same task: identify the head-shifted interval, possibly by detecting the ASW of the image due to incoherence.

The experimental design includes three angles for head-shifted stimuli, chosen for their potential for success and on the basis of experience from Experiment 1. On Figure V.2, it is clear that 45-degree data exhibit the largest overall shift in ITD, producing the most decorrelation and the best chance for our listeners to distinguish from constant-ITD. Also, previous experimenters have found that Kuhn’s model best approximates real HRTFs for a 45-degree angle of incidence (Wightman and Kistler, 1986). According to results from Experiment 1, when listeners attempted to identify the 45-degree head-shift stimulus, the ITD of greatest confusion (where they were reduced to guessing; see Figures V.7-10) was 520 µs. This particular delay occurred in the head-shift data at a frequency of 727 Hz. It was decided, due to prior study in lateral discrimination and the steep changes in accuracy as ITD varied in Experiment 1, that 30 µs was a detectable shift in lateralization. Thus, to establish two other angles of incidence that were distinct from 45 degrees, the spherical-head program was used to see which angles had an ITD of 490 and 550 µs at 727 Hz, effectively bracketing the 45-degree image. Calculations showed that 42 degrees has a 490 µs delay at 727 Hz, and 49 degrees is delayed 550 µs at 727 Hz. Thus, the angles of incidence selected were 42, 45, and 49 degrees. The BB, NB and NNB limits were the same as for the 45-degree case of Experiment 1.

Preliminary Experiment: “Equivalence of position”

To ensure the constant-ITD stimuli matched up perceptually with the three head angles, preventing any further identification of absolute position, listeners performed the 2IFC task with a slight change to the decision-making process.
Instead of identifying the most incoherent interval, the subjects were asked to listen for lateral movement, and respond according to the direction that the image seemed to move between intervals (what was termed the left-right task). Assuming that the three head-shift images matched up laterally with the corresponding ITDs, the percentage of responses in which listeners judge the ITD interval to be left of the head-shift interval would be as shown in Table V.1 below. The three columns include the three possible constant-ITD intervals, while the three rows are the three possible head-shifted intervals. Each cell then represents a trial formed by one combination of intervals.

Table V.1 - Ideal lateralization results

                      ITD 1    ITD 2    ITD 3
Head-shift angle 1     50%     ~25%      0%
Head-shift angle 2    ~75%      50%     ~25%
Head-shift angle 3    100%     ~75%      50%

To interpret Table V.1, start with the diagonal. Since the head-shift and ITD images should be at the same lateral position, then on a trial where both seem to be lateralized at the same place, one would be equally likely to hear the ITD image to the left or right of the head-shifted image. Thus, it would sound further left about half the time. At the lower left corner is the result for all trials that include the leftmost ITD image and rightmost head-shift image. This combination should provide a clear lateral motion. The ITD lateral position should clearly be left of the head-shift image, and listeners should judge that way on 100 percent of trials. Moving on to the table’s upper right corner, it is composed of the rightmost ITD image and leftmost head-shift image. It is clear that, due to symmetries, the listener should judge the ITD interval to be further right for every one of these trials. The other cells of the table must necessarily have some
If these predictions hold, and the experimental stimuli are sufficiently randomized, then the overall results should amount to guessing (50%). With this theory in mind, the experiment was calibrated for each listener by running the above task. Calibration runs continued, with adjustments to the various ITD values, until the experimenter recorded two runs whose results corresponded reasonably well to the pattern expected when lateral cues are eliminated (Table V.1).

Listeners T, X, W, and Z required various ITDs to achieve the goal of comparable lateralization, and the lateralization-task percentages for the resulting ITD values can be found in Table V.2.

Table V.2. Experiment 2 lateralization data. This table comprises twelve tables, arranged in three columns (representing three bandwidth conditions) and four rows (one for each listener). The lateralization data are used as a tool to calibrate the ITDs for perfectly-coherent intervals (the three column headers in each table) so the images match lateral position with the three head-shifted stimuli (angles in degrees listed as the three row headers in each table). The overall performance at each ITD and angle is listed at the end of each column and row, respectively. The overall performance of a listener at one condition is listed below the individual tables.

[Table V.2 data not recoverable from the source.]

Table V.2 actually comprises twelve tables, arranged in four rows (according to listener) and three columns, one for each bandwidth condition: broadband (BB, 20-3000 Hz), narrowband (NB, 400-1800 Hz), and narrow-narrowband (NNB, 600-800 Hz). Each table is labeled “lat %” because it contains the percentage of “ITD interval to the left” responses (hereafter called “ITD-left judgments”) for all combinations of trial stimuli. The individual tables have this format:

[Example table: ITD values (in μs) across the first row, head-shift angles (42, 45, 49 degrees) down the left column; cell values not recoverable from the source.]

Trials consisted of a head-shifted band selected from the left column and a coherent band with one of the three ITD shifts in the first row (in μs). The numbers 42, 45, and 49 are the incidence angles in degrees represented by the head-shift stimulus. The ITD stimulus was presented randomly in the first or second interval. The listener’s percentage of ITD-left responses for each combination of angle and ITD is indicated in the table. The final row and column are the percent ITD-left judgments for trials totaled over a particular ITD or angle, respectively. They are not averages of the percentages found in the table, because each condition did not necessarily contain the same number of trials. The final result found under each table is the “Overall” ITD-left judgments. This value comes from the total number of ITD-left responses (in all nine conditions) divided by 198 trials. All final calibrations were the best approximation established by the experimenter, though one might note that some are better than others.
Listener Z meets the lateralization criterion quite neatly across the BB, NB, and NNB conditions. Listener W does well also, though his NB results are slightly different from the ideal. Listeners T and X have the most trouble with the left-right task. While their results show a general trend towards the theoretical goal, they seem to find less difference between the three head-shift images than the other two listeners, especially when limited to NNB frequencies. Note that T’s results aren’t any closer to the ideal of Table V.1 than X’s. In order to achieve the listed data for listener T, it was necessary for the experimenter to increase the differences between constant ITDs to 35 μs in the BB case, and 45 μs for the other two bandwidths. Apparently, listener T was not as sensitive to ITD shift as listener X, much less W and Z. It is difficult to predict what effect T’s and X’s lower ITD sensitivity would have on the coherence-discrimination results, since those results are not necessarily tied to lateral position. The goal was simply to ensure that the listeners couldn’t use lateral position as a cue, and it seems unlikely that T or X would be able to do so.

There is some additional support for the calibrated delays, provided by Experiment 1. Returning to Figures V.7-10, recall the constant-ITD delays that caused the most confusion (reduced performance to guessing) with 45-degree head-shifted stimuli. Listeners did worst at approximately 520 μs in the BB case, 510 μs in the NB case, and 540 μs in the NNB case. These delays coincide well with the calibrated values found in Experiment 2. One can be confident that the lateralization task has established dependable “effective ITDs” for the head-shifted stimuli.

Experimental task, Results and Discussion

Once the appropriate ITDs were established, listeners T, X, W, and Z participated in four runs of 99 trials apiece for three bandwidth conditions (BB, NB, NNB), for a total of twelve runs.
The experimenters gave them the task from Experiment 1: in a randomly-ordered trial, discriminate the head-shifted interval from the constant-ITD interval by any means possible. With the lateralization cue likely eliminated after calibration, it was expected they would depend solely on the stimulus dispersion. The actual experimental results can be found in Table V.3, which is laid out in the same manner as Table V.2. The “Overall” percent correct is the rate of correct responses for all 396 trials. For a 2IFC task, the “detection threshold” is 75 percent correct; above that level, the experimenter can conclude that the listeners are not choosing randomly, but have a clear method of identifying the target interval.

Table V.3. Experiment 2 discrimination data. This table comprises twelve tables, arranged in three columns (representing three bandwidth conditions) and four rows (one for each listener). The percentages indicate the fraction of trials on which listeners correctly discriminated one of three head-shifted stimuli (angles in degrees listed as the row headers in each table) from one of three perfectly-coherent intervals with a particular ITD (the column headers in each table). The overall performance at each ITD and angle is listed at the end of each column and row, respectively. The overall performance of a listener at a condition is listed below the individual tables.

[Table V.3 data not recoverable from the source.]

One can sum up the Experiment 2 data quite generally by checking the overall percents correct, and noting that no listener reaches the 75-percent threshold at any bandwidth. The highest overall percent correct is 53. On closer examination, none of the individual head-shift rows or ITD columns shows performance over threshold, although they do range from 30 to 62 percent. Three individual head-shift/ITD combinations do indicate the possibility of non-random selection. They are located in the NNB condition for listener W, who answered correctly on 77 percent of trials for one combination, and on 24 and 25 percent of trials for two other combinations. However, the isolated nature of these results, combined with the lack of consistent performance over the entire NNB condition, supports the conclusion that listener W couldn’t reliably identify the head-shift interval.
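As a rough plausibility check that is not part of the original analysis, one can ask how likely the best overall score (53 percent of 396 trials) would be under pure guessing, and how unlikely a score at the 75-percent threshold would be. The sketch below uses an exact binomial tail; treating the 396 trials as independent coin flips is an assumption, and the helper name `upper_tail` is illustrative.

```python
from math import comb


def upper_tail(k, n, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


n_trials = 396                         # four runs of 99 trials per condition
k_best = round(0.53 * n_trials)        # highest overall score reported (53 percent)
k_threshold = round(0.75 * n_trials)   # 75-percent "detection threshold"

p_best = upper_tail(k_best, n_trials)
p_threshold = upper_tail(k_threshold, n_trials)

print(f"P(score >= {k_best}/{n_trials} by guessing)      = {p_best:.3g}")
print(f"P(score >= {k_threshold}/{n_trials} by guessing) = {p_threshold:.3g}")
```

Under these assumptions, a score of 53 percent is unremarkable for a guessing listener, whereas reaching the 75-percent threshold by chance would be vanishingly unlikely. Individual cells contain far fewer trials, so isolated cell values can stray much further from 50 percent without implying a usable cue.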
Generally, Experiment 2 leads to the conclusion that listeners are unable to distinguish between a head-shifted stimulus and one with a constant ITD over all frequencies. These findings are consistent with those of Hartmann and Wittenberg (1996) and Kulkarni et al. (1999).

Experiment 3: “Wrong” phase shifts

Introduction

The original goal of the experiment was to see if head-shifted stimuli were indistinguishable from constant-ITD stimuli. Both Experiments 1 and 2 showed insensitivity to the dispersion introduced by the spherical-head model. This inability to discriminate head-shifted stimuli from constant-ITD stimuli could result from the small range of ITDs included in the head-shift profile, resulting in a slight incoherence that falls below discrimination threshold. Alternatively, the spherical-head phase shifts may have some “plausibility.” That is, the apparent similarity between constant-ITD and head-shift intervals might be due to listeners’ experience with real-world head shifting of perfectly discrete sources. The presence of a head ensures that a broadband sound cannot be completely coherent. Thus, the auditory system might associate a slightly-incoherent sound (one that fits the head-shift profiles of Figure V.2) with a coherent image representing the source’s distinct nature.

Method

To test whether the shape of the head-shift delays mattered in Experiment 2, or if any similar range of ITDs would also be indistinguishable from a constant-ITD stimulus, Experiment 3 implemented an “inverted” head-shift curve and tested it in the broadband case. The stimuli were designed by starting with the values for 45-degree incidence as found in Figure V.2, and simply reversing the trend: where the original delays decrease, the inverted delays increase. Refer to Figure V.11, which is a simplified version of Figure V.2 including only the 45-degree shifts (solid line) and the new “wrong 45” inverted curve (dashed line).
This produces a stimulus with the same ITD range as the head-shifts, but in an entirely different (and unnatural) pattern. The same process was applied to the 42- and 49-degree stimuli as well, to produce three perceptually-separate “wrong head” images. Aside from changes in the stimuli, the new experiment ran exactly the same as Experiment 2, beginning with calibration runs to match up constant ITDs to the “wrong head” images. As one might expect, these ITDs were higher than those for the corresponding head-shift patterns. Compare the ITD values in the first row of Table V.4 with those in Table V.2. The same four listeners performed four runs of 99 trials to form the data set. Only the broadband case (20-3000 Hz) was used in this test. See Table V.4 for lateralization results (the format is identical to Table V.2 except that only broadband data were collected, and the expected results are in Table V.1).

Table V.4 - Lateralization data (BB condition, one table per listener)

[Table V.4 data not recoverable from the source.]

Results and Discussion

Experiment 3 results are found in Table V.5.

Table V.5 - Discrimination data (BB condition, one table per listener)

[Table V.5 data not recoverable from the source.]

Table V.5 follows the same format as Table V.3, due to the similarity with Experiment 2. Again, the overall percent correct shows little deviation from guessing. Only listener W showed performance that was appreciably different from 50 percent (21 percent) at one of the nine conditions. Because an unnatural pattern with the same ITD range was also indistinguishable from a constant ITD, the results of the “wrong head” experiment suggest that listeners cannot distinguish between constant-ITD and head-shifted stimuli because the distribution of diffraction-induced phase shifts is not broad enough to exceed the just-noticeable difference (jnd) for coherence.
Head-shift coherence, and delayed coherence discrimination

Introduction

Data from the spherical-head experiments made it clear that listeners could not distinguish between combinations of constant-ITD and head-shifted stimuli with equivalent lateral positions. Intrigued, the experimenters searched for an explanation of this result. Kulkarni et al. (1999) theorized that listeners established an “overall empirical ITD” for the complex stimulus using delays from the lower frequencies (below 2 kHz). Note that this ITD was set at a delay corresponding to the interaural cross-correlation peak. If both the “overall” ITD and the magnitude spectrum matched up in both intervals, then Kulkarni et al. argued that the intervals should be perceptually equivalent. The “Equivalence of position” experiment can establish the effective ITD for head-shifted images, allowing a test of this theory. That test is detailed in the section “Analysis of peak cross-correlation lag values,” using the results of “Coherence calculations” below.

Another possibility was that the combination of head-related phase shifts did not introduce enough incoherence to distinguish itself from the perfectly-coherent single-delay noise. The shifts were, in fact, limited to a narrow bandwidth, and bands above and below the sloped region (e.g., on the 30-degree plot of Figure V.2, frequencies from 400 to 1200 Hz) were perfectly coherent. This line of reasoning is explored in “Discussion-Lateral position” and the following sections.

Coherence calculations

To address both of the above explanations, the experimenters made recordings of the various stimuli featured in all three spherical-head experiments. A program running on the Tucker-Davis Array Processor used one-second clips of these recordings to calculate cross-correlations (CCs) over lag τ from −2 to +2 ms.
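The calculation described above can be sketched in a few lines. The following is a minimal illustration, not the Array Processor program itself: it computes a normalized cross-correlation over lags from −2 to +2 ms and reports the peak value (the coherence, maxCC) and the lag at which it occurs. The sampling rate, the synthetic test noise, and the function name `peak_cross_correlation` are all assumptions made for the example.

```python
import numpy as np


def peak_cross_correlation(left, right, fs, max_lag_s=2e-3):
    """Return (maxCC, lag in seconds) of the normalized cross-correlation.

    cc(lag) = sum_n left[n] * right[n + lag] / sqrt(E_left * E_right),
    searched over lags from -max_lag_s to +max_lag_s. A positive lag means
    the right channel is delayed relative to the left channel.
    """
    left = left - left.mean()
    right = right - right.mean()
    norm = np.sqrt(np.sum(left**2) * np.sum(right**2))
    max_lag = int(round(max_lag_s * fs))
    lags = np.arange(-max_lag, max_lag + 1)
    cc = np.empty(lags.size)
    for i, lag in enumerate(lags):
        if lag >= 0:
            cc[i] = np.dot(left[:left.size - lag], right[lag:])
        else:
            cc[i] = np.dot(left[-lag:], right[:right.size + lag])
    cc /= norm
    best = int(np.argmax(cc))
    return cc[best], lags[best] / fs


# Synthetic check: one second of noise with the right channel delayed by
# 26 samples, i.e. 520 microseconds at an assumed 50-kHz sampling rate.
fs = 50_000
delay = 26
rng = np.random.default_rng(0)
source = rng.standard_normal(fs + delay)
left = source[delay:]      # leading channel
right = source[:fs]        # same noise, delayed by `delay` samples

max_cc, lag_s = peak_cross_correlation(left, right, fs)
print(f"maxCC = {max_cc:.3f}, lag = {lag_s * 1e6:.0f} us")
```

For a purely delayed copy the peak sits at the imposed delay with a value close to one; it falls slightly short of one only because the finite clip loses a few samples of overlap at nonzero lag.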
Table V.6 indicates the conditions and bandwidths for which these calculations were performed, marked either by a “Y” or by the figure number of a plot representative of the various calculations.

Table V.6 - Stimulus cross-correlations (cross-correlations performed on experimental stimuli)

Condition                  BB             NB             NNB
30-degree head shift       Y              Y              Y
42-degree head shift       Y              Y              Y
45-degree head shift       Figure V.12    Figure V.13    Figure V.14
49-degree head shift       Y              Y              Y
42-degree wrong head       Y              N              N
45-degree wrong head       Figure V.18    N              N
49-degree wrong head       Y              N              N
520-μs constant ITD        Figure V.15    Figure V.16    Figure V.17

The figures show cross-correlation value versus lag. One interesting feature of note: comparing the 45-degree BB Figure V.12 with the “wrong-head” 45-degree Figure V.18, the two curves appear similar, but reversed along the lag axis. This is due to the nature of the “wrong-head” stimulus, which uses an inverted 45-degree ITD curve. The relevant data are listed in the title at the top of each figure; see Figure V.12 for an example. The title shows the peak CC value that defines the stimulus coherence (maxCC=0.963) as well as the interaural lag of that peak (μs=420). A summary of data from all CC plots is in Table V.7.
The left column lists the stimulus condition, the middle contains the maximum cross-correlation obtained for that condition, and the right column indicates the interaural time lag where that maximum was located:

Table V.7 - CC peak magnitudes and lags

Stimulus                          Max CC    Lag (μs)
30-degree head shift, BB          0.982     300
30-degree head shift, NB          0.988     340
30-degree head shift, NNB         0.998     340
42-degree head shift, BB          0.971     400
42-degree head shift, NB          0.971     420
42-degree head shift, NNB         0.999     500
45-degree head shift, BB          0.963     420
45-degree head shift, NB          0.972     460
45-degree head shift, NNB         0.999     520
49-degree head shift, BB          0.961     460
49-degree head shift, NB          0.975     500
49-degree head shift, NNB         0.999     560
520-μs constant ITD, BB           1.000     520
520-μs constant ITD, NB           1.000     520
520-μs constant ITD, NNB          1.000     520
“Wrong” 42-degree head shift      0.968     620
“Wrong” 45-degree head shift      0.964     660
“Wrong” 49-degree head shift      0.960     700

Analysis of peak cross-correlation lag values

A first analysis of the data in Table V.7 must ensure that the lags are reasonable. Since the head-shifted stimuli contain mostly delays between the low- and high-frequency limits defined in Equations 3 and 4, one expects an effective ITD within that range. Assuming 8.75 cm for head radius a, and 343 m/s for speed of sound c, calculations found:

Table V.8 - ITD limits by stimulus (μs)

Stimulus                          Low-f limit    High-f limit
30-degree head shift, BB          382            255
42-degree head shift, BB          512            341
45-degree head shift, BB          541            361
49-degree head shift, BB          578            385
520-μs constant ITD, BB           520            520
“Wrong” 42-degree head shift      512            635
“Wrong” 45-degree head shift      541            671
“Wrong” 49-degree head shift      578            710

The peak cross-correlation lag for each stimulus in Table V.7 lies between the delays for the low- and high-frequency limits in Table V.8. Within the head-shift and “wrong-head” groups, the peaks occurred in approximately the same relative position between limits.
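The head-shift rows of Table V.8 can be reproduced with a short calculation. Equations 3 and 4 are not repeated in this section, so the sketch below assumes they take the limiting forms of Kuhn's spherical-head model, a low-frequency ITD of 3(a/c) sin θ and a high-frequency ITD of 2(a/c) sin θ. The function name is illustrative, and the “wrong head” high-frequency limits are not derived here.

```python
import math

HEAD_RADIUS_M = 0.0875    # a = 8.75 cm, as stated in the text
SPEED_OF_SOUND = 343.0    # c in m/s, as stated in the text


def itd_limits_us(angle_deg, a=HEAD_RADIUS_M, c=SPEED_OF_SOUND):
    """Low- and high-frequency ITD limits in microseconds.

    Assumes the limiting forms 3(a/c)sin(theta) and 2(a/c)sin(theta);
    these are an assumption about Equations 3 and 4, which are not shown here.
    """
    s = math.sin(math.radians(angle_deg))
    low = 3.0 * a / c * s * 1e6
    high = 2.0 * a / c * s * 1e6
    return low, high


# Head-shift rows of Table V.8: angle -> (low-f limit, high-f limit) in us
table_v8 = {30: (382, 255), 42: (512, 341), 45: (541, 361), 49: (578, 385)}

for angle, (tab_low, tab_high) in table_v8.items():
    low, high = itd_limits_us(angle)
    print(f"{angle} deg: low-f {low:6.1f} us (table {tab_low}), "
          f"high-f {high:6.1f} us (table {tab_high})")
```

With these assumed forms the computed limits agree with the tabulated head-shift values to within about 1 μs, which suggests the table was generated from the same expressions.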
In addition, the bandwidths selected for the NB and NNB cases did tend to favor higher ITDs, which explains the peak’s migration along the lag axis as each head-shift stimulus was filtered into narrower bands. Thus, the lags corresponding to peak cross-correlations appear to meet expectations.

Discussion-Lateral position

The cross-correlation data were used to check the experimental results against the “overall empirical ITD” concept of Kulkarni et al. Table V.9 is a comparison of the “overall” ITD predicted by the cross-correlation peak lag, and the calibrated ITDs (averaged over listeners) for constant-ITD stimuli established in the “Equivalence of position” experiment.

Table V.9 - Lag/ITD comparison (μs)

Stimulus                          Peak CC    Calibrated
42-degree head shift, BB          400        484
42-degree head shift, NB          420        479
42-degree head shift, NNB         500        495
45-degree head shift, BB          420        515
45-degree head shift, NB          460        514
45-degree head shift, NNB         520        529
49-degree head shift, BB          460        546
49-degree head shift, NB          500        545
49-degree head shift, NNB         560        563
“Wrong” 42-degree head shift      620        523
“Wrong” 45-degree head shift      660        558
“Wrong” 49-degree head shift      700        593

Let the 45-degree BB stimulus be an example. Taking the lag from Table V.7, the “overall empirical ITD” concept predicts a 420-μs delay for the listener’s internal image. However, the corresponding constant-ITD stimulus established in Experiment 2, calibrated to match position with the 45-degree broadband signal, had an average delay of 515 μs. This difference in delay certainly exceeds the ITD just-noticeable difference of around 20 μs at a 500-μs delay (Colburn and Durlach, 1978). The “Max CC” lags and calibrated delays do match up more closely as the bandwidth decreases, which may be related to the narrowband stimuli’s higher coherences. The 45-degree NNB stimulus exhibits peak CC at 520 μs, while the matching constant ITD was set to 529 μs.
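The comparison in Table V.9 can be made explicit by subtracting the peak CC lag from the calibrated ITD for each stimulus and checking the difference against the ITD jnd of roughly 20 μs quoted above. The sketch below only restates the tabulated numbers; the variable names are illustrative.

```python
ITD_JND_US = 20  # approximate ITD jnd near a 500-us delay, as cited in the text

# (stimulus, peak CC lag in us, calibrated ITD in us), copied from Table V.9
table_v9 = [
    ("42-degree BB", 400, 484),
    ("42-degree NB", 420, 479),
    ("42-degree NNB", 500, 495),
    ("45-degree BB", 420, 515),
    ("45-degree NB", 460, 514),
    ("45-degree NNB", 520, 529),
    ("49-degree BB", 460, 546),
    ("49-degree NB", 500, 545),
    ("49-degree NNB", 560, 563),
    ("Wrong 42-degree", 620, 523),
    ("Wrong 45-degree", 660, 558),
    ("Wrong 49-degree", 700, 593),
]

# Stimuli whose calibrated ITD is within one jnd of the peak CC lag
within_jnd = []
for name, peak_lag, calibrated in table_v9:
    difference = calibrated - peak_lag
    verdict = "within jnd" if abs(difference) <= ITD_JND_US else "exceeds jnd"
    if verdict == "within jnd":
        within_jnd.append(name)
    print(f"{name:16s} calibrated - peak = {difference:+4d} us  ({verdict})")

print(within_jnd)  # ['42-degree NNB', '45-degree NNB', '49-degree NNB']
```

Only the three NNB stimuli fall within one jnd; every BB, NB, and “wrong head” stimulus differs from its calibrated ITD by 45 μs or more.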
In the BB and NB conditions, the constant ITDs were higher than the corresponding peak CC lags, tending to approach the low-frequency limits of Table V.8. This may indicate that listeners were giving more weight to ITDs at the lower frequencies, which is expected in light of results reported in Chapters II and III.

If one were to interpret the data of Table V.9 according to “overall empirical ITD,” assuming that the lateral position of the stimulus corresponded to the cross-correlation peak, the listeners should still have been able to distinguish head-shifted from constant-ITD intervals using lateral position, at least in the broadband case. The experimenters have some confidence that the lateral cue was indeed eliminated, due to the extensive calibration. Therefore, the results do not appear to agree with the concept of “overall ITD” as proposed by Kulkarni et al.

Discussion-Correlation

It is left to examine the cross-correlation data for cues on the basis of coherence. First, the result that NNB correlations were highest among head-shifted stimuli seemed surprising, since the NNB condition was designed to have the highest dispersion of ITDs across frequency and was thus expected to have the lowest cross-correlation peak. On further reflection, this result is predictable because in the extreme NNB limit (a band so small that only a sine tone is left) the maxCC must go to one. Also, Trahiotis et al. (2001) showed that for a broadband distribution of ITDs, passing the stimulus through a narrowband filter led to better ITD discrimination, which was likely due to an increase in correlation.

On each trial, the auditory system made a coherence comparison between two stimuli. The constant-ITD stimulus was always 100% coherent (maxCC = 1). On the other hand, the various head-shift stimuli all had coherences less than one, allowing for a possible cue. Granted, the narrow-narrowband cases showed peaks that should be indistinguishable from one (coherences ≥ 0.998).
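The narrowing-band argument can be illustrated numerically. For noise with a flat power spectrum between f_min and f_max whose interaural delay is a function τ(f) of frequency, the normalized cross-correlation at lag t is the band average of cos(2πf(t − τ(f))). The sketch below evaluates that average for the three bandwidths. The ITD profile is a hypothetical stand-in (constant at 541 μs below 400 Hz, constant at 361 μs above 1800 Hz, linear in log frequency in between), not Kuhn's actual curve from Figure V.2, and the flat spectrum is also an assumption, so the numbers are indicative only.

```python
import numpy as np


def itd_curve_us(f_hz, low_us=541.0, high_us=361.0, f_lo=400.0, f_hi=1800.0):
    """Hypothetical smooth ITD profile (an assumption, not Kuhn's curve):
    low_us below f_lo, high_us above f_hi, linear in log-frequency between."""
    frac = np.clip(np.log(f_hz / f_lo) / np.log(f_hi / f_lo), 0.0, 1.0)
    return low_us + (high_us - low_us) * frac


def max_cc(f_min, f_max, n_freqs=1000, lag_step_us=2.0):
    """Peak of the normalized cross-correlation for flat-spectrum noise in
    [f_min, f_max] Hz with frequency-dependent ITD. Returns (maxCC, lag in us)."""
    freqs = np.linspace(f_min, f_max, n_freqs)
    itd_s = itd_curve_us(freqs) * 1e-6
    lags_us = np.arange(-2000.0, 2000.0 + lag_step_us, lag_step_us)
    lags_s = lags_us * 1e-6
    # cc[k] = band average of cos(2*pi*f*(lag_k - itd(f)))
    phase = 2.0 * np.pi * freqs[None, :] * (lags_s[:, None] - itd_s[None, :])
    cc = np.cos(phase).mean(axis=1)
    best = int(np.argmax(cc))
    return cc[best], lags_us[best]


bb_cc, bb_lag = max_cc(20.0, 3000.0)      # broadband
nb_cc, nb_lag = max_cc(400.0, 1800.0)     # narrowband
nnb_cc, nnb_lag = max_cc(600.0, 800.0)    # narrow-narrowband

for label, cc, lag in (("BB", bb_cc, bb_lag), ("NB", nb_cc, nb_lag),
                       ("NNB", nnb_cc, nnb_lag)):
    print(f"{label:3s} maxCC = {cc:.3f} at lag = {lag:.0f} us")
```

Even with this simplified profile, the NNB band yields a peak very close to one while the wider bands fall noticeably lower, in line with the trend of Table V.7: a smoothly varying ITD produces little decorrelation within a narrow band.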
But previous experiments (see Chapter II) have shown that listeners are extremely sensitive to decorrelation. The pool of listeners exhibited just-noticeable differences (jnds) as low as one percent, especially with respect to a coherence of one and a broadband stimulus. Gabriel and Colburn (1981) also found jnds of such magnitude. Thus, it would seem clear that at least the “wrong head” stimuli should be perceptually different from a constant ITD, based on decorrelations approaching four percent. There was no immediate reason to believe that the BB and NB head-shift cases couldn’t be distinguished as well. According to current understanding of coherence sensitivity, listeners should have succeeded in discriminating head-shift intervals from constant ITD. It is possible that listeners’ coherence jnds were actually larger due to the imposed ITD in the vicinity of 500 μs, which led to a test of off-center coherence sensitivity in a final experiment.

Experiment 4: Off-axis coherence discrimination

Introduction and Method

While coherence discrimination is an easy task for images centered in the head, the auditory system loses resolution when the image’s interaural delay puts it off-center (Pollack and Trittipoe, 1959a&b). This concept is lent credence by the modeled high density of cross-correlation detectors at small delays, which becomes progressively sparser as the delay increases (Stern and Trahiotis, 1995). The “detector” neurons are crucial to ITD resolution, and thus to coherence discrimination. Indeed, Colburn and Durlach (1978) state that once a coherent noise is delayed more than one ms (about the maximum delay one could expect for a real-world sound incident on the head), it “begins to lose its compactness”. In the region where lag τ is between 10 and 30 ms, the coherent noise is indistinguishable from an incoherent noise. Experiment 4 repeated the study of Chapter II, testing decorrelation jnds for a broadband stimulus.
The only change was the addition of a delay line to the left channel, so listeners found the intracranial image offset to the right. Three listeners (W and Z, who were previously mentioned and were subjects in Chapter II, and listener A, a 20-year-old male with normal hearing) participated in three experimental runs at each of four delays: 0, 500, 1000, and 4000 μs. The 500-μs condition is particularly interesting, because it applies to Experiments 2 and 3, wherein most ITDs were set near 500 μs.

Results and Discussion

The results of these runs are found in Figure V.19, which shows that thresholds for listeners W and Z in the zero-delay case (equivalent in every way to the BB case of the Chapter II task) matched well with their prior results. Then, as the interaural delay was increased, all listeners required a greater decorrelation to succeed at the task. The results lend some support to the explanation that decorrelation of an off-center image is more difficult to detect. They also make sense in light of the described “loss of compactness” for longer delays, and the well-known fact that coherence jnds increase rapidly as the reference correlation is reduced from one (Pollack and Trittipoe, 1959b). As initially postulated, coherence jnds are degraded at 500 μs. However, the results show that listeners could still distinguish a coherence of 0.96-0.98 from 1. The lowest coherence among all the head-shift stimuli is 0.96, for the “wrong-head” 49-degree case. Based on the sensitivity data of Experiment 4, then, nothing precludes our listeners from making a coherence discrimination. The question remains: why did listeners fail to discriminate head-shift stimuli?

The Nature of Incoherence

The most likely explanation for the listeners’ inability to distinguish head-related dispersions from no dispersion (constant ITD) is the unusual makeup of the head-shift stimuli.
Compare them to the slightly-incoherent signals used for the Chapter II experiment, in which listeners could identify small decorrelations. Those signals began with coherent noise, then had statistically independent noise of identical bandwidth added to one ear, reducing the overall coherence throughout the entire signal spectrum. By contrast, the broadband head-shift spectra were mostly coherent outside the “transition regions” of 400-1200 Hz for the 30-degree stimulus, and 400-1800 Hz for the 45-degree stimulus. Bands above and below these regions were set with relatively constant interaural delays. Knowing that models of the auditory system separate the signal into critical bands for analysis (Hartmann, 1997), all head-shift bands outside the transition regions would exhibit correlation peaks near one.

Of course, the NB and NNB conditions do not contain regions of constant ITD. But the character of ITDs that are a smooth function of frequency (i.e., are not discontinuous and not random) is to produce very little decorrelation, as demonstrated by the NNB cases that had consistently high coherences. The cross-correlations for individual critical bands likely resemble that for the NNB case. Thus, while the overall coherence for BB stimuli is low, each critical band probably has a coherence nearer one, which would preclude discrimination on that basis.

Summary and Conclusion

After presenting listeners with a range of theoretical head-shift stimuli (covering four different angles, three bandwidths, and one inversion), this study concludes that listeners cannot distinguish a signal with head-related dispersion from one that is perfectly coherent. They were insensitive to the decorrelation introduced by the presence of a spherical head. It was possible for listeners to discriminate based on lateral position, as Kulkarni et al. stated.
Surprisingly, experiments found that the head-shift stimuli had coherences sufficiently different from one that they exceeded the known coherence jnds. The experimenters explored reasons for the discovered insensitivity to incoherence, hypothesizing that coherence discrimination may be more difficult when the image is off-center. This hypothesis was proved correct, specifically at a delay of 500 μs, which is representative of the ITDs involved in our experiments. However, the coherence jnds at that delay were still too small to explain the head-shift experiment results.

A final postulation was that the problem dwelled within coherence measurement itself. The broadband coherences of head-shift stimuli were low, but cross-correlations within each critical band were much closer to one. Smoothly-varying ITDs do not produce as much decorrelation as the addition of statistically-independent noise, as demonstrated by the NNB condition coherence values. Thus, it is clear that head-related dispersion does not introduce enough incoherence for listeners to distinguish it from a constant ITD. The analysis shows that listeners cannot identify head-related dispersion without the use of lateral position cues, simply because the change in coherence is too small for them to detect.

Appendix A

While the results of Experiment 4 did not reveal why listeners were unable to discriminate head-shift stimuli from constant ITD, they did raise a concern of their own. The experimenters found it unintuitive that listeners could apparently judge decorrelation even at extreme ITDs. As the ITD increases, there should be a delay at which the image is perceived as incoherent, thus making the task impossible. Listeners A and Z continued their runs to higher delays: Z could still reliably discriminate coherence at 12-ms delays, while A did so up to 8 ms. While these upper limits show that the discrimination task can be disrupted, they appear well beyond the expected boundary.
Interviews with the listeners revealed that as the delay increased, they began to detect a pitch embedded in the noise. As the decorrelation cue gradually failed them, they used the pitch as a cue to complete the task. This phenomenon may be caused by dichotic repetition pitch, or DRP. According to Bilsen (1995), the DRP is detectable for correlated white noise with an ITD between 3 and 20 ms, which happens to be in the range of delays plotted in Figure V.18. Bilsen reports that many (but not all) listeners are able to identify a pitch in the noise, with a period determined by the delay time. DRP could explain the listeners’ ability to perform the coherence discrimination task at high delays: the presence of a pitch would mark the correlated stimulus, and its absence the decorrelated one.

References

Bilsen, F.A. “What do dichotic pitch phenomena tell us about binaural hearing?” “Advances in Hearing Research,” Proceedings of the 10th International Symposium on Hearing, ed. G.A. Manley et al. (1995).
Brungart, D.S., Durlach, N.I., Rabinowitz, W.M. “Auditory localization of nearby sources II: Localization of a broadband source,” J. Acoust. Soc. Am. 106, 1956-1968 (1999).
Durlach, N.I. and Colburn, H.S. “Binaural Phenomena,” Handbook of Perception vol. 4, ed. E. Carterette, pp. 365-466 (1978).
Gabriel, K.J. and Colburn, H.S. “Interaural correlation discrimination: I. Bandwidth and level dependence,” J. Acoust. Soc. Am. 69, 1394-1401 (1981).
Hartmann, W.M. “Auditory Filters,” Signals, Sound and Sensation, AIP Press (1997).
Hartmann, W.M. and Wittenberg, A. “On the externalization of sound images,” J. Acoust. Soc. Am. 99, 3678-3688 (1996).
Kistler, D.J. and Wightman, F.L. “A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction,” J. Acoust. Soc. Am. 91, 1637-1647 (1992).
Kuhn, G.F. “Model for the interaural time differences in the azimuthal plane,” J. Acoust. Soc. Am. 62, 157-167 (1977).
Kuhn, G.F.
“Physical Acoustics and Measurements Pertaining to Directional Hearing,” Directional Hearing, ed. W.A. Yost and G. Gourevitch, pp. 3-25 (1987).
Kulkarni, A., Isabelle, S.K. and Colburn, H.S. “Sensitivity of human subjects to head-related transfer-function phase spectra,” J. Acoust. Soc. Am. 105, 2821-2840 (1999).
Pollack, I. and Trittipoe, W.J. “Binaural listening and interaural noise cross correlation,” J. Acoust. Soc. Am. 31, 1250-1252 (1959). (a)
Pollack, I. and Trittipoe, W.J. “Interaural noise correlations: Examination of variables,” J. Acoust. Soc. Am. 31, 1616-1618 (1959). (b)
Stern, R.M. and Trahiotis, C. “Models of Binaural Interaction,” Hearing, ed. B. Moore, pp. 347-386 (1995).
Tobias, J.V. and Schubert, E.R. “Effective onset duration of auditory stimuli,” J. Acoust. Soc. Am. 31, 1595-1605 (1959).
Trahiotis, C., Bernstein, L.R. and Akeroyd, M.A. “Manipulating the straightness and curvature of patterns of interaural cross correlations affects listeners’ sensitivity to changes in interaural delay,” J. Acoust. Soc. Am. 109, 321-330 (2001).
Wightman, F.L. and Kistler, D.J. “Headphone simulation of free-field listening,” J. Acoust. Soc. Am. 85, 858-867 (1989).
Wightman, F.L. and Kistler, D.J. “The dominant role of low-frequency interaural time differences in sound localization,” J. Acoust. Soc. Am. 91, 1648-1661 (1992).

Figure V.1. Depiction of a human head and an incoming sound wave. The wave is assumed to be planar and incident at azimuthal angle θ from the forward direction. The “directly irradiated ear” receives the sound without interference, while the “shadowed ear” is separated from the sound by the head. The radius of the head is labeled “a”. (Kuhn, 1987)
[Figure V.1: diagram of the head and the incident sound wave.]

Figure V.2. Head-related dispersion according to Kuhn’s spherical head model. This graph plots the ITD (y-axis) as a function of log frequency for sound incident on a spherical head from various angles.
The dashed line indicates ITDs that equal one-half period for a particular frequency. The open circles on the left are the resultant ITDs from a low-frequency approximation. The closed squares on the right are a high-frequency approximation, which is simply two-thirds the ITD values of the open circles. The open triangles further right are the predictions of the Woodworth formula.
[Figure V.2: plot of interaural time difference (μs) versus frequency.]

Figure V.3. Results of Experiment 1: Discrimination at 30 degrees. This graph plots the percentage of trials in which listener T correctly identified the head-shifted stimulus versus the ITD of the perfectly-coherent interval (ten values from 260 to 440 μs). The three curves represent results from broadband (BB), narrowband (NB), and narrow-narrowband (NNB) cases. The dashed horizontal line at 50 percent correct indicates guessing level. The thin horizontal line marks the range of ITDs included in the BB and NB head-shifted stimuli, while the thick line indicates those ITDs included in the NNB head-shifted stimulus. The dashed vertical line marks the low-frequency limit ITD for a sound incident at 30 degrees. Error bars on data points represent two standard errors.
[Figure V.3: plot of percent correct versus ITD of the coherent interval (μs), listener T.]

Figure V.4. Results of Experiment 1: Discrimination at 30 degrees.
This graph plots the percentage of trials in which listener W correctly identified the head-shifted stimulus versus the ITD of the perfectly-coherent interval (ten values from 260 to 440 μs). The three curves represent results from broadband (BB), narrowband (NB), and narrow-narrowband (NNB). The dashed horizontal line at 50 percent correct indicates guessing level. The thin horizontal line marks the range of ITDs included in the BB and NB head-shifted stimuli, while the thick line indicates those ITDs included in the NNB head-shifted stimulus. The dashed vertical line marks the low-frequency limit ITD for a sound incident at 30 degrees. Error bars on data points represent two standard errors.
[Figure V.4: plot of percent correct versus ITD of the coherent interval (μs), listener W.]

Figure V.5. Results of Experiment 1: Discrimination at 30 degrees. This graph plots the percentage of trials in which listener X correctly identified the head-shifted stimulus versus the ITD of the perfectly-coherent interval (ten values from 260 to 440 μs). The three curves represent results from broadband (BB), narrowband (NB), and narrow-narrowband (NNB). The dashed horizontal line at 50 percent correct indicates guessing level. The thin horizontal line marks the range of ITDs included in the BB and NB head-shifted stimuli, while the thick line indicates those ITDs included in the NNB head-shifted stimulus. The dashed vertical line marks the low-frequency limit ITD for a sound incident at 30 degrees. Error bars on data points represent two standard errors.
[Figure V.5: plot of percent correct versus ITD of the coherent interval (μs), listener X.]

Figure V.6. Results of Experiment 1: Discrimination at 30 degrees.
This graph plots the percentage of trials in which listener Z correctly identified the head-shifted stimulus versus the ITD of the perfectly-coherent interval (ten values from 260 to 440 μs). The three curves represent results from broadband (BB), narrowband (NB), and narrow-narrowband (NNB). The dashed horizontal line at 50 percent correct indicates guessing level. The thin horizontal line marks the range of ITDs included in the BB and NB head-shifted stimuli, while the thick line indicates those ITDs included in the NNB head-shifted stimulus. The dashed vertical line marks the low-frequency limit ITD for a sound incident at 30 degrees. Error bars on data points represent two standard errors.
[Figure V.6: plot of percent correct versus ITD of the coherent interval (μs), listener Z.]

Figure V.7. Results of Experiment 1: Discrimination at 45 degrees. This graph plots the percentage of trials in which listener T correctly identified the head-shifted stimulus versus the ITD of the perfectly-coherent interval (ten values from 400 to 580 μs). The three curves represent results from broadband (BB), narrowband (NB), and narrow-narrowband (NNB). The dashed horizontal line at 50 percent correct indicates guessing level. The thin horizontal line marks the range of ITDs included in the BB and NB head-shifted stimuli, while the thick line indicates those ITDs included in the NNB head-shifted stimulus. The dashed vertical line marks the low-frequency limit ITD for a sound incident at 45 degrees. Error bars on data points represent two standard errors.
[Figure V.7: plot of percent correct versus ITD of the coherent interval (μs), listener T.]

Figure V.8. Results of Experiment 1: Discrimination at 45 degrees.
This graph plots the percentage of trials in which listener W correctly identified the head-shifted stimulus versus the ITD of the perfectly-coherent interval (ten values from 400 to 580 μs). The three curves represent results from broadband (BB), narrowband (NB), and narrow-narrowband (NNB). The dashed horizontal line at 50 percent correct indicates guessing level. The thin horizontal line marks the range of ITDs included in the BB and NB head-shifted stimuli, while the thick line indicates those ITDs included in the NNB head-shifted stimulus. The dashed vertical line marks the low-frequency limit ITD for a sound incident at 45 degrees. Error bars on data points represent two standard errors.
[Figure V.8: plot of percent correct versus ITD of the coherent interval (μs), listener W.]

Figure V.9. Results of Experiment 1: Discrimination at 45 degrees. This graph plots the percentage of trials in which listener X correctly identified the head-shifted stimulus versus the ITD of the perfectly-coherent interval (ten values from 400 to 580 μs). The three curves represent results from broadband (BB), narrowband (NB), and narrow-narrowband (NNB). The dashed horizontal line at 50 percent correct indicates guessing level. The thin horizontal line marks the range of ITDs included in the BB and NB head-shifted stimuli, while the thick line indicates those ITDs included in the NNB head-shifted stimulus. The dashed vertical line marks the low-frequency limit ITD for a sound incident at 45 degrees. Error bars on data points represent two standard errors.
[Figure V.9: plot of percent correct versus ITD of the coherent interval (μs), listener X.]

Figure V.10. Results of Experiment 1: Discrimination at 45 degrees. This graph plots the percentage of trials in which listener Z correctly identified the head-shifted stimulus versus the ITD of the perfectly-coherent interval (ten values from 400 to 580 μs). The three curves represent results from broadband (BB), narrowband (NB), and narrow-narrowband (NNB). The dashed horizontal line at 50 percent correct indicates guessing level. The thin horizontal line marks the range of ITDs included in the BB and NB head-shifted stimuli, while the thick line indicates those ITDs included in the NNB head-shifted stimulus. The dashed vertical line marks the low-frequency limit ITD for a sound incident at 45 degrees. Error bars on data points represent two standard errors.
[Figure V.10: plot of percent correct versus ITD of the coherent interval (μs), listener Z.]

Figure V.11. Head-related dispersion according to Kuhn’s spherical head model, and the inverse. This graph plots the ITD (y-axis) as a function of log frequency for sound incident on a spherical head from a 45-degree angle (solid line) and also includes frequency-dependent ITDs that have an opposite shift (“wrong 45” dashed line). The open circle on the left is the resultant ITD from a low-frequency approximation. The closed square on the right is a high-frequency approximation, which is simply two-thirds the ITD value of the open circle. The open triangle further right is the prediction of the Woodworth formula.
[Figure V.11: plot of interaural time difference (μs) versus frequency.]

Figure V.12. Cross-correlation of broadband model head-dispersed noise at a 45-degree angle. The cross-correlation of the experimental 45-degree BB stimulus is plotted versus internal delay Tau. The values “maxCC” and “us” at top indicate the peak value of the correlation function (i.e., coherence) and the location of that peak along the delay axis, respectively.
[Figure V.12: plot of cross-correlation versus internal delay Tau (μs).]

Figure V.13. Cross-correlation of narrowband model head-dispersed noise at a 45-degree angle. The cross-correlation of the experimental 45-degree NB stimulus is plotted versus internal delay Tau. The values “maxCC” and “us” at top indicate the peak value of the correlation function (i.e., coherence) and the location of that peak along the delay axis, respectively.
[Figure V.13: plot of cross-correlation versus internal delay Tau (μs).]

Figure V.14. Cross-correlation of narrow-narrowband model head-dispersed noise at a 45-degree angle. The cross-correlation of the experimental 45-degree NNB stimulus is plotted versus internal delay Tau. The values “maxCC” and “us” at top indicate the peak value of the correlation function (i.e., coherence) and the location of that peak along the delay axis, respectively.
[Figure V.14: plot of cross-correlation versus internal delay Tau (μs).]

Figure V.15. Cross-correlation of broadband coherent noise at 520 μs ITD. The cross-correlation of the experimental 520-μs-ITD BB coherent stimulus is plotted versus internal delay Tau. The values “maxCC” and “us” at top indicate the peak value of the correlation function (i.e., coherence) and the location of that peak along the delay axis, respectively.
[Figure V.15: plot of cross-correlation versus internal delay Tau (μs).]

Figure V.16. Cross-correlation of narrowband coherent noise at 520 μs ITD. The cross-correlation of the experimental 520-μs-ITD NB coherent stimulus is plotted versus internal delay Tau. The values “maxCC” and “us” at top indicate the peak value of the correlation function (i.e., coherence) and the location of that peak along the delay axis, respectively.
[Figure V.16: plot of cross-correlation versus internal delay Tau (μs).]

[Figure V.19: plot with vertical axis labeled “Coherence threshold”.]

GENERAL APPENDICES

GENERAL APPENDIX A
The Staircase Method

Unlike the method of constant stimuli, where listeners make several comparisons between two unchanging signals, the adaptive staircase method of Levitt (1971) changes the stimulus from trial to trial according to the listener’s responses. This method allows experimenters to make meaningful comparisons of various conditions with fewer data sets. The adaptive strategy is quite simple. Following a two-interval forced-choice trial (e.g.,
a level comparison where the listener is asked to identify the louder interval), the experimental program checks whether the response was correct. If not, the program alters one aspect of the stimulus, the “staircase variable”, to make the task easier (increasing the level difference between intervals). This change is called moving “up the staircase”. If the response was correct, the program does not change the staircase variable unless it was the third correct response in a row. In that case, the program makes the task more difficult (decreasing the level difference), termed moving “down the staircase”. The staircase variable and the amount it changes (the “step size”) depend on the nature of the experiment. This method of stimulus variation tends to target a difficulty level for which the listener is equally likely to move up or down the staircase. Thus we can find the probability of one correct answer at this level:

$$P(\text{up}) = P(\text{down}) = \tfrac{1}{2}$$

$$P(\text{down}) = P(\text{correct})^{3}$$

$$P(\text{correct}) = \left(\tfrac{1}{2}\right)^{1/3} = 0.7937$$

In this manner, the program identifies the listener’s 79.4% correct level for a particular stimulus condition. Making an effective measurement of that level requires a reasonable understanding of the staircase behavior. For instance, the listener may require some time before settling into a region of consistent performance. Even then, the staircase may vary widely. To accurately estimate the level of the staircase variable leading to 79% correct performance, the program keeps track of “turnarounds”, or values of the staircase variable where the listener changes staircase direction (from moving up to moving down, and vice versa). At each turnaround, the value of the staircase variable (in our example, the level difference) is recorded, and the run continues until an experimenter-set quantity of turnarounds is completed. This requirement means that runs have no preset number of trials.
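The three-down, one-up rule and the turnaround bookkeeping described above can be sketched as a short simulation. This is an illustrative sketch only, not the experimental program used for this work. The function name `run_staircase`, its parameters, the floor at zero, and the simulated `hypothetical_listener` (a cumulative-normal psychometric function) are all assumptions introduced for the example.

```python
import random
from statistics import NormalDist, mean, stdev

TARGET_PC = 0.5 ** (1.0 / 3.0)   # equilibrium of the 3-down 1-up rule, about 0.7937

def run_staircase(p_correct_at, start, step, n_turnarounds=24, n_discard=4, rng=None):
    """Simulate one run. p_correct_at(x) is the probability of a correct
    response at staircase value x (larger x = easier task).
    Returns (mean, n-1 standard deviation, all turnaround values)."""
    rng = rng or random.Random()
    x, run_of_correct, last_direction = start, 0, 0
    turnarounds = []
    while len(turnarounds) < n_turnarounds:
        if rng.random() < p_correct_at(x):
            run_of_correct += 1
            if run_of_correct < 3:
                continue                 # no change until the third correct in a row
            run_of_correct, direction = 0, -1   # move down: harder
        else:
            run_of_correct, direction = 0, +1   # move up: easier
        if last_direction != 0 and direction != last_direction:
            turnarounds.append(x)        # record the value where direction reverses
        last_direction = direction
        x = max(0.0, x + direction * step)   # assumed floor: no negative differences
    kept = turnarounds[n_discard:]       # discard early turnarounds
    return mean(kept), stdev(kept), turnarounds

def hypothetical_listener(x, sigma=1.0):
    """Assumed psychometric function for a two-interval task."""
    return NormalDist().cdf(x / (sigma * 2 ** 0.5))

estimate, spread, _ = run_staircase(hypothetical_listener, start=3.0, step=0.2,
                                    rng=random.Random(0))
print(f"target P(correct) = {TARGET_PC:.4f}")
print(f"estimated threshold = {estimate:.2f} +/- {spread:.2f}")
```

Because the run ends after a fixed number of turnarounds rather than a fixed number of trials, its length varies from run to run, as noted above.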
The first few turnarounds are discarded to allow for early variability, and the average of the remainder establishes the 79% correct level at that condition. The (n-1) standard deviation of those turnaround values measures uncertainty in the result. By varying the stimulus during the run, the staircase method allows experimenters to effectively test multiple conditions at once. The method of constant stimuli would require runs at each value of the staircase variable to estimate the listener’s 79% correct level. Even then, the estimate would likely be less accurate than the average taken in a staircase run, which is directly comparable with results from other staircase runs featuring different experimental conditions.

GENERAL APPENDIX B
Signal Detection Theory

The practice of psychoacoustics has been greatly influenced by signal detection theory (TSD). The generality of TSD’s approach, combined with a foundation in probability and statistics, provides a necessary framework for understanding the implications of how the auditory system responds to stimuli. TSD also allows researchers to compare the results of various experimental tasks on common mathematical ground. TSD is based on signal-to-noise ratios, where the noise can either be added by the experimenter or be inherent to the system. To apply TSD to psychoacoustics, we refer to the work of Green and Swets (1966). They define a decision variable d' as a signal-to-noise ratio: the magnitude of the auditory system’s response to a signal divided by the standard deviation introduced by the response to included noise. Both responses are measured along some internal coordinate that is a quantitative representation of the task at hand. See the top half of Figure A.1, borrowed from Hartmann (1997). This graph shows the response of the auditory system in a two-interval forced-choice (2IFC) task, where the listener is asked to identify the interval that includes the signal.
The decision coordinate is $r$, which could represent interaural coherence (as in an MLD experiment) or a more basic quantity like a coincidence counter output. The leftmost probability density curve ($f_N$) represents how the auditory system is stimulated by the noise-only interval, with a peak position we define as zero and a standard deviation (due to external or internal noise) of $\sigma$. The rightmost probability density curve ($f_{SN}$) represents the signal+noise interval, which has shifted the peak along $r$ by an amount $\mu$. We assume that the introduction of the signal has no effect on the size of the uncertainty, so the standard deviation is unchanged. The listener will choose the interval with the greater $r$-value, which is determined by samples from each probability function. If $\Delta r = r_{SN} - r_N$ is greater than zero, then the listener will select the correct interval. The probability of this occurrence is determined by the difference (essentially a convolution) of the two probability density curves, establishing a third ($f_D = f_{SN} - f_N$). The new curve is shown in the bottom half of Figure A.1, also from Hartmann (1997). The peak probability for $\Delta r$ is located at $\mu$, which is expected because the most likely values for $r_{SN}$ and $r_N$ are $\mu$ and zero, respectively. Standard deviation $\sigma_D$ is determined by the nature of convolved distributions, such that $\sigma_D = \sqrt{\sigma^2 + \sigma^2} = \sigma\sqrt{2}$ (Hartmann, 1997). The area of the shaded region represents the probability of a positive $\Delta r$, which would lead to a correct response. The d-prime for such a task is

$$d' = \frac{\mu}{\sigma\sqrt{2}}$$

To understand how TSD relates to experiments in the lab, we can associate d-prime with a corresponding percentage of correct judgments (PC). For the above example, the PC equals the cumulative normal (the shaded area) for all $\Delta r > 0$. If $\mu$ increases (raising d-prime), the area in the positive region will increase, leading to a higher PC.
Conversely, if the standard deviation increases (lowering d-prime), then more of the curve will extend below zero, decreasing PC. For example, the staircase method described in these appendices targets a PC of 79.4%, corresponding to a d-prime of 1.16. When considering the 2IFC task shown in Figure A.1, a d-prime of one leads to a PC of 76%. However, in a task where the listener is given three alternatives (3IFC), that same d-prime yields a PC of 63%. Thus, if one were interested in comparing detection results from 2IFC and 3IFC tasks, one would have to consider those conditions where performance was 76 and 63 percent correct, respectively. This method extends to other tasks as well, giving the experimenter a powerful tool in data analysis and evaluation.

References

Green, D.M. and Swets, J.A. Signal Detection Theory and Psychophysics, J. Wiley, New York (1966).
Hartmann, W.M. Signals, Sound and Sensation, AIP Press, Woodbury, New York, p. 542 (1997).
Levitt, H. “Transformed up-down methods in psychoacoustics,” J. Acoust. Soc. Am. 49, 467-477 (1971).

Figure A.1. Graphical representation of signal detection theory (TSD). (Hartmann, 1997)
[Figure A.1: probability density curves labeled f_N and f_SN.]
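The percent-correct values quoted in General Appendix B (76% and 63% for a d-prime of one, and a d-prime of 1.16 for 79.4%) can be checked numerically. The sketch below rests on stated assumptions rather than on anything taken from the analysis software. It assumes equal-variance Gaussian distributions and the per-interval convention d' = μ/σ, under which the 2IFC percent correct is the cumulative normal evaluated at d'/√2. It also assumes that in an m-interval task the listener picks the interval with the largest sample. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()   # standard normal distribution (mean 0, sigma 1)

def pc_2ifc(d_prime):
    """Proportion correct in 2IFC, taking d' = mu / sigma per interval."""
    return _STD.cdf(d_prime / sqrt(2))

def d_prime_from_pc_2ifc(pc):
    """Invert pc_2ifc: the d' that gives proportion correct pc."""
    return sqrt(2) * _STD.inv_cdf(pc)

def pc_mifc(d_prime, m, lo=-8.0, hi=8.0, n=4000):
    """Proportion correct in an m-interval task: probability that the
    signal-interval sample exceeds all m-1 noise-interval samples,
    i.e. the integral of pdf(x - d') * cdf(x)**(m-1), by the trapezoid rule."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * _STD.pdf(x - d_prime) * _STD.cdf(x) ** (m - 1)
    return total * h

print(f"2IFC, d' = 1:    PC = {pc_2ifc(1.0):.3f}")        # text quotes 76%
print(f"3IFC, d' = 1:    PC = {pc_mifc(1.0, 3):.3f}")     # text quotes 63%
print(f"2IFC, PC = 79.4%: d' = {d_prime_from_pc_2ifc(0.7937):.2f}")  # text quotes 1.16
```

Setting `m=2` in `pc_mifc` should reproduce `pc_2ifc`, which gives a quick consistency check on the numerical integration.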