This is to certify that the dissertation entitled

Speech-on-Speech Masking in a Front-Back Dimension and Analysis of Binaural Parameters in Rooms Using MLS Methods

presented by Neil L. Aaronson has been accepted towards fulfillment of the requirements for the Ph.D. degree in Physics.

SPEECH-ON-SPEECH MASKING IN A FRONT-BACK DIMENSION AND ANALYSIS OF BINAURAL PARAMETERS IN ROOMS USING MLS METHODS

By

Neil L. Aaronson

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Physics

2008

ABSTRACT

This dissertation deals with questions important to the problem of human sound source localization in rooms, starting with perceptual studies and moving on to physical measurements made in rooms.
In Chapter 1, a perceptual study is performed relevant to a specific phenomenon: the effect of speech reflections occurring in the front-back dimension and the ability of humans to segregate that from unreflected speech. Distracters were presented from the same source as the target speech, a loudspeaker directly in front of the listener, and also from a loudspeaker directly behind the listener, delayed relative to the front loudspeaker. Steps were taken to minimize the contributions of binaural difference cues. For all delays within ±32 ms, a release from informational masking of about 2 dB occurred. This suggested that human listeners are able to segregate speech sources based on spatial cues, even with minimal binaural cues.

In moving on to physical measurements in rooms, a method was sought for simultaneous measurement of room characteristics such as impulse response (IR) and reverberation time (RT60), and binaural parameters such as interaural time difference (ITD), interaural level difference (ILD), and the interaural cross-correlation function and coherence. Chapter 2 involves investigations into the usefulness of maximum length sequences (MLS) for these purposes. Comparisons to random telegraph noise (RTN) show that MLS performs better in the measurement of stationary and room transfer functions, IR, and RT60 by an order of magnitude in RMS percent error, even after Wiener filtering and exponential time-domain filtering have improved the accuracy of RTN measurements.

Measurements were taken in real rooms in an effort to understand how the reverberant characteristics of rooms affect binaural parameters important to sound source localization. Chapter 3 deals with interaural coherence, a parameter important for localization and perception of auditory source width. MLS were used to measure waveform and envelope coherences in two rooms for various source distances and 0° azimuth through a head-and-torso simulator (KEMAR).
A relationship is sought that relates these two types of coherence, since envelope coherence, while an important quantity, is generally less accessible than waveform coherence. A power law relationship is shown to exist between the two that works well within and across bands, for any source distance, and is robust to reverberant conditions of the room.

Measurements of ITD, ILD, and coherence in rooms give insight into the way rooms affect these parameters, and in turn, the ability of listeners to localize sounds in rooms. Such measurements, along with room properties, are made and analyzed using MLS methods in Chapter 4. It was found that the pinnae cause incoherence for sound sources incident between 30° and 90°. In human listeners, this does not seem to adversely affect performance in lateralization experiments.

The cause of poor coherence in rooms was studied as part of Chapter 4 as well. It was found that rooms affect coherence by introducing variance into the ITD spectra within the bands in which it is measured. A mathematical model to predict the interaural coherence within a band given the standard deviation of the ITD spectrum and the center frequency of the band gives an exponential relationship. This is found to work well in predicting measured coherence given ITD spectrum variance. The pinnae seem to affect the ITD spectrum in a similar way at incident sound angles for which coherence is poor in an anechoic environment.

© 2008 Neil L. Aaronson
All Rights Reserved

This work is dedicated to my loving parents, who never stopped supporting and believing in me.

ACKNOWLEDGMENTS

It is a pleasure to acknowledge the many people who have aided and supported me in the course of my graduate studies. I would like to especially thank my adviser, Dr. William M. Hartmann, who has put up with me far longer than anyone but a blood relative should have to. He has gone to epic lengths to support my studies and shape me from a student into a scientist.
His patience, calm, and sense of humor have been remarkable, especially that time I made fun of him in front of everyone at the Binaural Bash. I have found in him not just an adviser or a teacher, but a role model.

I would like to thank Dr. Brad Rakerd for his gracious support and advice in all aspects of my studies and professional development, especially in generously allowing me to use his lab pretty much as I please. I also thank Dr. Barbara Shinn-Cunningham, Dr. Richard Freyman, and Dr. Leslie Bernstein for their guidance and expertise in our many discussions. I would like to thank Alex Azima for guiding me in my development as a teaching professor.

I would be remiss not to thank my undergraduate mentors, the physics faculty at the College of New Jersey. I especially thank Dr. Ronald Gleeson, Dr. Raymond Pfeiffer, and Dr. Thulsi Wickramasinghe. They are responsible not just for my early development as a scientist, but for convincing me to become a physicist in the first place. Through them I gained a love and fascination for physics that made me want to go on to graduate school.

Students such as myself are often confounded by the intricacies of academic bureaucracy. Too often forgotten are those who guide us through the details we usually forget about later when recounting the trek toward academic accomplishment. I owe special thanks to Mrs. Carole Calu and Mrs. Debbie Simmons for being absolutely amazing secretaries.

Many of my friends and family know that moving from New Jersey, where I had lived for the first 22.8 years of my life, to Michigan for graduate school was a difficult thing for me to do. I would like to thank those who reminded me that I was missed back home, especially my parents, Rae, Mom-Mom, Joe, Marty, Ed, Rob, Jen, Steve, Jamie, T.J., Ian, Lisa, Kevin, and Kristin.
Similarly, I must thank the dear friends I have made while in Michigan who have managed to convince me that the Midwest really isn't so bad, even though it's hard to find good Italian food: Kim, Erin, Carol, Luke, Catherine, Ana, Johanna, Dan, the other Dan, Shannon, Sean, Chris, Josh, and the enigmatic Steve.

Finally, I would like to thank my dissertation committee members not previously mentioned for their helpful advice and genuine interest in my education: Dr. Kirsten Tollefson, Dr. Norman Birge, Dr. S.D. Mahanti, and Dr. Wayne Repko.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
0 An Introduction to Spatial Hearing and Speech Segregation
1 Release From Informational Masking in the Front-Back Dimension
  1.1 Introduction
  1.2 Experiment 1: Front-Back Presentation with Speech Distracters
    1.2.1 Listeners
    1.2.2 Anechoic room and experimental layout
    1.2.3 Listener's chair
    1.2.4 Loudspeaker alignment
    1.2.5 Stimuli
    1.2.6 The Task
    1.2.7 Front-Only baselines
    1.2.8 Front-Back ADD experiment
    1.2.9 Results and Analysis
  1.3 Experiment 2: Front-Back Presentation with Speech-Shaped Noise Maskers
    1.3.1 Experimental setup
    1.3.2 Stimuli
    1.3.3 Results and analysis
  1.4 Experiment 3: Front-Front Presentation with Speech Distracters
    1.4.1 Experimental design
    1.4.2 Results and discussion
    1.4.3 Delay-and-add filtering
  1.5 Experiment 4: Spectral Structure - A Follow-Up to Experiments 1 and 3
    1.5.1 Experimental design
    1.5.2 Results and analysis
  1.6 Summary
2 The MLS Method and Applications to Binaural Measurements in Rooms
  2.1 Introduction
  2.2 Capabilities and Limitations
  2.3 Fundamental Equations and Concepts
    2.3.1 Interaural Level Difference Measurements
    2.3.2 Coherence and Interaural Time Difference Measurements
    2.3.3 Impulse Response and Transfer Function Measurements
    2.3.4 Reverberation Time Measurements
  2.4 Verification of the value of the MLS method
  2.5 Experiment 5: Comparison to Random Telegraph Noise
    2.5.1 Methods
    2.5.2 Measurements using MLS and RTN
    2.5.3 Error calculation
    2.5.4 Results
  2.6 Experiment 6: Error Reduction Techniques for RTN Measurements
    2.6.1 Wiener filtering
    2.6.2 Exponential windowing
    2.6.3 Conclusions
  2.7 Experiment 7: MLS and RTN measurements in a real room
    2.7.1 Methods
    2.7.2 Results
    2.7.3 Conclusions
3 Experiment 8: Measurements of Envelope and Waveform Interaural Coherences With KEMAR
  3.1 Computer Simulation of Noises and Coherences
  3.2 Methods
  3.3 Analysis of Results Within Bands
  3.4 Analysis of Data Combined Across Bands
    3.4.1 Trimming Data Based on Centrality of Waveform Coherences
  3.5 Conclusions
4 Measurement of Binaural Properties in Rooms as a Function of Azimuth
  4.1 Experiment 9: Measurements of Binaural Properties as a Function of Horizontal Angle in Anechoic Conditions
    4.1.1 Methods
    4.1.2 ITD Results
    4.1.3 ILD Results
    4.1.4 Coherence Results
    4.1.5 Incoherence due to Pseudorandom Dispersion
    4.1.6 Comparison of KEMAR Ears
  4.2 Further Experiments on the Coherence at 45°
    4.2.1 Variations on Methods of Coherence Measurement in KEMAR
    4.2.2 Localizability of Noise Bands at 45° by Human Listeners
    4.2.3 Measurements in KEMAR and Human Listeners with Probe Microphones
    4.2.4 Localization of noises with coherence as measured in human listeners
  4.3 Experiment 10: Measurements of Binaural Parameters and Room Characteristics in a Highly Reverberant Environment
    4.3.1 Methods
    4.3.2 Coherence Results
    4.3.3 Reverberation Times
    4.3.4 ITD Results
    4.3.5 ILD Results
  4.4 Experiment 11: Measurements of Binaural Parameters and Room Characteristics in a Normal Room
    4.4.1 Methods
    4.4.2 Coherence Results
    4.4.3 Reverberation Times
    4.4.4 ITD Results
    4.4.5 ILD Results
  4.5 Summary and Conclusions
APPENDICES
A Test for Interaural Differences
B Separated-Source Presentation with Speech Distracters
C Wiener filtering
D Guide to Acronyms
BIBLIOGRAPHY

LIST OF TABLES

1.1 Experiment 4: Percentage of correct responses (plus or minus one standard deviation) for each listener by condition. (a) Front-Only baseline condition. (b) and (c) Simulated delay-and-add filtering with a 2 ms delay. (d) Front-Front condition with 2 ms delay, with data taken from Experiment 3. Averages and standard deviations for each listener are calculated across three runs. Averages and standard deviations across listeners are shown in the bottom row.

2.1 Average percent RMS error, plus and minus one standard deviation, for single measurements of the TF at each order j (corresponding to the length of the signal). The header label of each column indicates the type of signal used (MLS or RTN). Quantities under the header [illegible] correspond to RTN measurements made including a posteriori Wiener filtering of the IR, and quantities under the header [illegible] correspond to RTN measurements made with exponential windowing in time of the IR.

2.2 Percent errors calculated from the average TF calculated across nine trials using various methods and signals. The header label of each column indicates the type of signal used (MLS or RTN). Quantities under the header [illegible] correspond to RTN measurements made including a posteriori Wiener filtering of the IR, and quantities under the header [illegible] correspond to RTN measurements made with exponential windowing in time of the IR.

3.1 Values of the power parameter n from a fit of the coherence data within each 1/3-octave band to Equation 3.5 for randomly generated Gaussian noises. The bounds about each n value are 95% confidence intervals.
As expected, the RMS error decreases as the center frequency, and thus the width of the noise band, increases. The errors indicate a general trend toward a better fit to Equation 3.5 with increasing frequency.

3.2 Values of the power parameter n with 95% confidence intervals for the data in each 1/3-octave band measured in room 10B at distances of 0.5 and 1.0 m. These values were found from the best fit of the data within that band to Equation 3.5, and the RMS differences between the data and the best fitting line are given.

3.3 Values of the power parameter n with 95% confidence intervals for the data in each 1/3-octave band measured in room 10B at distances of 3.0 and 5.0 m. These values were found from the best fit of the data within that band to Equation 3.5, and the RMS differences between the data and the best fitting line are given.

3.4 Values of the power parameter n with 95% confidence intervals for the data in each 1/3-octave band measured in a reverberant room at distances of 0.5 and 1.0 m. These values were found from the best fit of the data within that band to Equation 3.5, and the RMS differences between the data and the best fitting line are given.

3.5 Values of the power parameter n with 95% confidence intervals for the data in each 1/3-octave band measured in a reverberant room at distances of 3.0 and 5.0 m. These values were found from the best fit of the data within that band to Equation 3.5, and the RMS differences between the data and the best fitting line are given.

3.6 Summary of average values of the power parameter n calculated across fits to Equation 3.5 within 1/3-octave bands in both room 10B and the reverberant room at each distance. Values of n within each room that are not significantly different from each other are grouped by brackets.
3.7 Values of the power parameter n and the RMS error for the best fit of Equation 3.5 to envelope versus waveform coherence data combined, at each position, across all 1/3-octave bands.

3.8 95% confidence intervals about the mean of the power parameter n for different source distances in room 10B and the reverberant room. The parameter n was found by nonlinear regression of Equation 3.5 to data combined across 1/3-octave bands after points with waveform coherences γ_w < ψ or γ_w > (1 - ψ) were removed. Thus, these values of n and the related RMS errors are determined only by data with abscissae near 0.5. Increasing values of ψ remove increasingly many points near the boundary values of γ_w. In the case of a 1.0 m source distance, most data points are grouped on the high end of the best-fit curve (see Figure 3.14), and a cutoff of ψ = 0.25 left too few points in the data set to perform a regression.

4.1 Coherence and ITD of 500-Hz noise bands recorded through KEMAR at incident angles of ±45° in an anechoic environment.

4.2 Just-noticeable differences (JNDs) for coherent and incoherent noises moving to the left or to the right. JNDs for sounds moving to the left are on the left side of the table and JNDs for sounds moving to the right are on the right side of the table. One degree on the table corresponds to an ITD of 5.1 μs. Incoherent sounds at 45° had an interaural coherence of 0.7303 and those at -45° had a coherence of 0.6865. These JNDs tend to be significantly larger for the incoherent noises than for coherent noises, indicating that listeners had greater difficulty localizing incoherent noises under otherwise identical conditions.
4.3 Interaural coherences in the 500-Hz band from signals measured with probe microphones in KEMAR and human listeners E, N, and Z.

4.4 Just-noticeable differences (JNDs) for coherent and incoherent noises moving to the left or to the right. JNDs for sounds moving to the left are on the left side of the table and JNDs for sounds moving to the right are on the right side of the table. Incoherent sounds at 45° and -45° had coherences for each listener identical to that measured in that listener with probe microphones. The specific coherences for each listener can be found in Table 4.3. There is no significant difference in JNDs for the coherent and incoherent noises.

B.1 Separated-source Experiment: Percentage of correct responses (plus or minus one standard deviation) for each listener by condition. (a) Front-Only baseline condition. (b) Front-Only condition with +4 dB target level relative to a single distracter. (c) Target in front and distracters in back. Averages and standard deviations for each listener are calculated across five runs. Averages and standard deviations across listeners are shown in the bottom row.

LIST OF FIGURES

1.1 An overview of the extent to which release from informational and energetic masking was found to occur in the horizontal plane across a wide range of delays by Rakerd et al. [76]. The release from informational masking is approximately 10 dB in magnitude throughout the range of delays from -32 ms to +32 ms. The magnitude of the release from energetic masking is only about one-third as strong (approximately 4 dB) and is limited to a range of delays from -2 ms to +2 ms.

1.2 Arrangement of speakers relative to a listener for Experiment 1.
The delay between speakers is represented by τ, positive if the distracters from the back loudspeaker lead those from the front and negative if the distracters from the front loudspeaker lead those from the back.

1.3 A schematic of the full set of CRM sentences that were possible in this study. The listener is instructed to listen for "Laker." The target talker always speaks this call sign.

1.4 Results of Experiment 1, the Front-Back experiment with speech distracters, for each of the four listeners and the average across listeners as a plot of percent correct versus delay, τ.

1.5 Average amplitude spectra of four different female voices. Each panel represents the average spectrum for one talker's voice, averaged over five utterances. The vertical axis has arbitrary amplitude units, the same for all four.

1.6 Individual and average results of Experiment 2, the Front-Back experiment with noise distracters, similar in form to Figure 1.4, as a plot of percent correct versus delay, τ. For listener N, who showed no variation within three runs in performance of the boosted Front-Only condition, a dashed straight line represents the average percent correct.

1.7 Arrangement of speakers relative to a listener for Experiment 3. The delay between speakers is represented by τ.

1.8 Individual and average results of Experiment 3, the Front-Front experiment with speech distracters, similar in form to Figure 1.4, as a plot of percent correct versus delay, τ.

1.9 Solid lines show the theoretical amplitude response of an ideal delay-and-add filter with delay τ = 2 ms. Dips occur at 250 Hz and every additional 500 Hz thereafter. Peaks occur at every integer multiple of 500 Hz.
Superimposed in dashed lines is the average amplitude spectrum of one of the four female CRM voices, which is taken from the upper left panel of Figure 1.5.

1.10 The top graph shows the amplitude response of a filter imitating the first dip in a delay-and-add filter with delay τ = 2 ms. This is a FIR filter of order 336. The bottom graph shows the amplitude response of a filter imitating the second dip in a delay-and-add filter with delay τ = 2 ms. This is a FIR filter of order 392.

2.1 A linear-feedback shift register with five bits and taps at bits 1, 2, and 4. The bits a_i can take only binary values. An XOR gate performs modulo 2 addition of the output bit, a_0, and a_1, the result of which becomes one input for the next XOR gate, which "taps" a_2 for its other input. This is then fed as an input to the XOR gate which taps a_4. The last XOR gate in the sequence feeds its output back to the first bit, a_4. The resulting output sequence, s[n], will be a MLS of order 5.

2.2 Cross-correlations, in some 1/3-octave bands, between signals measured in the left and right ears of a KEMAR in an anechoic environment. The KEMAR was facing a direction 45° from the direction of the incident sound. A vertical dashed line indicates the location of the peak cross-correlation (the waveform coherence) in each panel, which occurs at the ITD. The value of the ITD in each 1/3-octave band shown is indicated by the number in the lower-right corner of each panel.

2.3 A simple depiction of the MLS measurement method. A MLS, x, is played through a loudspeaker in a room with transfer function h. The signal recorded by the receiver is y. The recorded signal y is related to the MLS x via the transfer function by y = h * x.
2.4 The first 400 ms of the integrated impulse decay curve (IIDC) for a small, roughly square room (of dimensions 5.7 m by 4.5 m by 3.5 m high) is shown as a solid line. A dashed line shows a linear trend line fit to the first decay mode of the IIDC. The fit line reaches a level of -60 dB at a time of 323 ms, thus indicating the 60-dB reverberation time, RT60, of the room.

2.5 The impulse response of the electronic 1/3-octave band equalizer as measured by a MLS of order 19.

2.6 The transfer function of the electronic 1/3-octave band equalizer as measured by a MLS of order 19. Numbers 1 through 6 count the number of valleys and shoulders over the audible range.

2.7 The accepted TF, plotted as a dashed line, and the TF as measured by a MLS of order 9, plotted as a solid line. There is almost no difference between the two at most frequencies, and thus the order-9 MLS measurement obscures the dashed line corresponding to the accepted TF.

2.8 The accepted TF, plotted as a dashed line, and the average TF measured by nine RTN of orders as indicated in each panel, plotted as a solid line. The jaggedness of the TF as measured by the RTN tends to obscure the dashed line.

2.9 The accepted impulse response, plotted as a continuous line, and the exponential window in time, plotted with a dashed line. The exponential window shown here has a rate parameter of b = 0.05 and starts its decay at time τ_e = 1.5 ms.

2.10 The TF of a small, roughly square office of dimensions 5.7 m by 4.5 m by 3.5 m high. In each case, the TF was calculated from the average IR calculated over five measurements. The top panel shows the TF as measured by MLS. The middle panel shows the TF as measured by RTN. The final panel shows the TF calculated when the average IR measured by RTN was filtered by a Wiener filter before
the TF was calculated.

2.11 Measurements of the 60-dB reverberation time in the same space as that referred to in Figure 2.10. Open circles represent times calculated from MLS measurements, exes mark times calculated from RTN measurements, and open diamonds mark times calculated from Wiener-filtered RTN measurements.

3.1 A frequency histogram showing the distribution of the amplitude of samples in an order-18 MLS after the signal was passed through a gammatone filter with center frequency 500 Hz and a 1/3-octave bandwidth. The histogram is enveloped by a Gaussian distribution with zero mean, standard deviation 0.229, and scaled by a factor of 0.067, shown as a dashed solid line. This attests to the Gaussian nature of the filtered MLS.

3.2 Plots of envelope coherence versus waveform coherence for simulated Gaussian noise pairs in six different 1/3-octave bands. Each band contains 1001 coherence pairs, each of which is a single data point. A best fitting line to Equation 3.5 is shown as a thick solid line in each plot, though it may be obscured by the data points. The value of the power parameter n for each set of points is shown in each panel.

3.3 RMS Error as a function of center frequency. The trend line shows a decrease in RMS Error proportional to 1/f^0.35.

3.4 Plots of waveform coherence versus envelope coherence as measured through KEMAR in room 10B at a distance of 0.5 m to the loudspeaker. Each panel shows the data measured in a particular 1/3-octave band with center frequency indicated in the top left corner of the panel. A best fit line is shown in each panel, which fits the data within the indicated band to Equation 3.5. The best fit value of the power parameter n for the data in each band is shown in the upper left of each panel.
Note that the horizontal and vertical scales differ in each panel.

3.5 Plots of waveform coherence versus envelope coherence as measured through KEMAR in room 10B at a distance of 1.0 m to the loudspeaker, in a manner similar to that of Figure 3.4.

3.6 Plots of waveform coherence versus envelope coherence as measured through KEMAR in room 10B at a distance of 3.0 m to the loudspeaker, in a manner similar to that of Figure 3.4.

3.7 Plots of waveform coherence versus envelope coherence as measured through KEMAR in room 10B at a distance of 5.0 m to the loudspeaker, in a manner similar to that of Figure 3.4.

3.8 Plots of waveform coherence versus envelope coherence as measured through KEMAR in a reverberant room at a distance of 0.5 m to the loudspeaker, in a manner similar to that of Figure 3.4.

3.9 Plots of waveform coherence versus envelope coherence as measured through KEMAR in a reverberant room at a distance of 1.0 m to the loudspeaker, in a manner similar to that of Figure 3.4.

3.10 Plots of waveform coherence versus envelope coherence as measured through KEMAR in a reverberant room at a distance of 3.0 m to the loudspeaker, in a manner similar to that of Figure 3.4.

3.11 Plots of waveform coherence versus envelope coherence as measured through KEMAR in a reverberant room at a distance of 5.0 m to the loudspeaker, in a manner similar to that of Figure 3.5.

3.12 RMS Error as a function of 1/3-octave band center frequency for coherences measured in the reverberant room at a source distance of 3.0 m.

3.13 A plot of values of n measured in room 10B (above) and the reverberant room (below) as a function of 1/3-octave band center frequency. Different lines are plotted for different source distances. There are no clear trends evident across either frequency or distance to the source.
3.14 Combined envelope versus waveform coherence data across all 1/3-octave bands measured in room 10B and the reverberant room at each source distance. Trend lines show the best fit of the data in each panel to Equation 3.5, and the value of the fitting parameter n is shown in the upper-left corner of each panel.

3.15 Plots of Equation 3.5 with three different values of the power parameter, n. There is very little difference between the curves for these values of n.

3.16 Plots of the combined data at a source distance of 5.0 m in room 10B and in the reverberant room. Each panel shows the data remaining after points with waveform coherence γ_w < ψ and those with γ_w > (1 - ψ) have been removed. The best fit of Equation 3.5 to the remaining data is shown as a solid line and the power parameter n is given in each panel.

3.17 The 95% Bound curve (closed circles) gives the absolute error in envelope coherence for which 95% of the simulated coherences deviate by that amount or less from the coherence predicted by Equation 3.6. The Mean curve (open circles) gives the mean absolute error relative to Equation 3.6 for simulated envelope coherences. Both curves are third-order polynomials fitted to the absolute errors of the binned data.

4.1 A schematic view of the alignment of the KEMAR head relative to the fixed loudspeaker and some of the angles θ, indicating the direction of the KEMAR nose. When the KEMAR is rotated in place and the loudspeaker remains stationary, the angles of incidence are positive for clockwise rotations and negative for counterclockwise rotations.

4.2 A depiction of the locations of the KEMAR and loudspeaker relative to the surroundings of the anechoic room in Positions 1 and 2.
In each position, the relative separation of the KEMAR and loud— speaker remains the same, though their positions within the room change ..................................... 126 4.3 ITD in milliseconds as a function of angle in 1 / 3-octave bands, plot- ted in the same format as Figure 4.11. ITDs for the remaining higher frequency bands are shown in Figure 4.4 ................. 128 4.4 ITD in milliseconds (ms) as a function of angle in 1 / 3-octave bands, plotted in the same format as Figure 4.12. Closed symbols represent original data before corrections for period errors were made. ITDs for lower frequency bands are shown in Figure 4.3. .......... 129 4.5 Head radii a as a function of frequency, calculated by fitting Kuhn’s low-frequency approximation (Equation 4.3) and the Woodworth formula (Equation 4.5) to ITD data in 1 / 3-octave bands. The results of fitting the Woodworth equation are shown by closed triangles and the results of the low-frequency approximation are represented by open circles. The low-frequency approximation predicts a head radius of about 8.0 cm in the low-frequency bands, and this is in close agreement with the predictions of the Woodworth formula in high-frequency bands. ........................... 130 xix 4.6 4.7 4.8 4.9 4.10 4.1 ._| The linear parameter b as a function of frequency, calculated by fit- ting Equation 4.6 to ITD data for frequencies at and above 2500 Hz. This parameter describes the relative contribution of the linear term 0 in Woodworth’s equation (Equation 4.5) when the head radius is fixed to be 7.25 cm. ............................. 132 ITD data from the 8000 Hz 1/3-octave band is plotted across fre- quency as open circles. A fit of Equation 4.5 to this data, shown by the solid line, is made when the fitting parameter (the head ra- dius) is a = 8.2 cm. The dashed line shows a fit to this data of Equa- tion 4.6, in which the head radius is set to a = 7.25 cm, and the fitting parameter is found to be b = 1.22. 
.................... 133 ILD in decibels (dB) as a function of angle in 1 / 3-octave bands, plot- ted in the same format as Figure 4.11. ILDs for the remaining higher frequency bands are shown in Figure 4.9. In each panel, the vertical scale changes, as shown by axes labels on the left and right sides of the figure. .................................. 135 ILD in decibels (dB) as a function of angle in 1 / 3-octave bands, plot- ted in the same format as Figure 4.12. ILDs for lower frequency bands are shown in Figure 4.8. In each panel, the vertical scale changes, as shown by axes labels on the left and right sides of the figure ...................................... 136 Coherence 'y as a function of frequency for positive angles in 30° increments. The top panel shows measurements made at Position 1, and the bottom panel shows measurements made at Position 2 in the anechoic environment .......................... 138 Coherences as a function of angle in 1/3-octave bands for each of two different positions in the same anechoic room. Four frequency bands are shown per plot, with results for position 1 on the left and for position 2 on the right. Shown in this plot are coherences for fre- quency bands from 80 to 1000 Hz. Filled-in symbols represent origi- nal data before corrections for period errors were made. Coherences for the remaining higher frequency bands are shown in Figure 4.12. . 139 XX 4.12 4.13 4.14 4.15 4.16 4.17 Coherences as a function of angle in 1/3-octave bands for each of two different positions in the same anechoic room. Four frequency bands are shown per plot, with results for position 1 on the left and for position 2 on the right. Shown in this plot are coherences for frequency bands from 12500 to 16000 Hz. Filled-in symbols repre- sent original data before corrections for period errors were made. Coherences for the lower frequency bands are shown in Figure 4.11. . 
In open circles, a plot of average difference in coherence between those measured at 0° and 45° across 1/3-octave frequency bands from 160 to 3150 Hz in anechoic conditions. The trend line and its accompanying equation represent a nonlinear least squares regres- sion to the data. ............................... Coherences in a 500 Hz 1/3—octave band measured in KEMAR for sound at incident angles of 9 = {—45°,0°,+45°}. Symbols rep- resent averages over five measurements and error bars extend one standard deviation in each direction. Square symbols represent the ”true” interaural coherence, measured between unaltered signals recorded in the left and right ears. Circles represent coherences when the ILD was made to be constant within the frequency band, and thus are representative of the effect of varying ITDs within the band. Triangles represent coherences when the ITD was made to be constant within the band, and thus are representative of the effect of varying ILDs within the band ........................ The ITD spectra in the 10000 Hz-centered band (left plots) and the 500 Hz-centered band (right plots). The three sets of plots show the variation in the ITDs within the indicated 1 / 3-octave frequency band when the KEMAR is oriented —-45°, 0°, and +45° relative to the direction of the sound source. At :l:45°, the large variability of the ITDs in the 500-Hz band may lead to significant incoherence. A plot of the cross-correlation function as described by Equa— tion 4.11 with a center frequency of 1000 Hz, a 1/3-octave band- width, and an ITD of 0.27 ms. The dashed line shows the sinc func- tion enveloping the cosine .......................... A linearly varying ITD over some finite frequency band with cen- ter frequency we and bandwidth 26. The parameter C describes the maximum difference between the ITD at any given frequency and the average ITD, T o .............................. 
xxi 140 144 ..146 149 149 4.18 4.19 4.2 O 4.21 4.22 4.23 4.24 Plots of the Fresnel integrals C (2) (solid line) and 5(2) (dashed line). The ITD shift AT and the coherence '7 for various bands with center frequencies indicated on the vertical axis and flat 1 /3-octave band- widths as a function of the dispersion parameter E. Data points for various values of 6 are shown in open symbols (circles for AT and diamonds for 'y), and best fitting curves to Equations 4.19 and 4.20 are plotted in solid lines. As either g or center frequency increases, these equations do not hold, and so no such fit lines were plotted for the highest frequency band. In principle, only one plot is necessary because 7 is a function of (it: (see Equation 4.23) ............. Best fit parameters a, b, and m across band center frequency. These parameters fit ITD shift data and coherence data to equations of the form of Equations 4.19 and 4.20 ....................... The parameter a relating AT to the dispersion parameter (I as a func- tion of the bandwidth parameter ,3. An approximate relationship between a and [3 is given by Equation 4.24 to be a = B/ 3. ....... Coherence in various 1 / 3-octave bands as a function of the standard deviation of the noise added to the ITD of the signals. Best fitting curves corresponding to Equation 4.26 are drawn with the data. . Measurements of the ITD taken at 0° (circles) and 180° (triangles) in an anechoic environment in 24 1/3-octave bands from 80 to 16000 Hz. Measurements were taken at two different positions in the anechoic room, each with the same separation between KEMAR and loudspeaker. Measurements taken at position 1 are indicated by open symbols and are shown in the top panel. Measurements taken at position 2 are indicated by filled symbols and are shown in the bottom panel. ................................ 
Measurements of the interaural coherence taken at 0° (circles) and 180° (triangles) in an anechoic environment in 24 1 / 3-octave bands from 80 to 16000 Hz. Measurements were taken at two different po- sitions in the anechoic room, each with the same separation between KEMAR and loudspeaker. Measurements taken at position 1 are shown in the top panel, and measurements taken at position 2 are shown in the bottom ............................. xxii 151 153 154 157 . 160 165 166 4.25 Measurements of the ILD taken at 0° (circles) and 180° (triangles) in an anechoic environment in twenty-four 1 /3-octave bands from 80 to 16000 Hz. Measurements were taken at two different positions in the anechoic room, each with the same separation between KEMAR and loudspeaker. Measurements taken at position 1 are indicated by open symbols of either type and are located in the top panel. Measurements taken at position 2 are indicated by filled symbols and are located in the bottom panel. ................... 168 4.26 Coherences measured in 1/3-octave bands in an anechoic envi- ronment using several variations of the experimental apparatus to record the signals in both ears. Each variation is labeled with a dif- ferent letter and is plotted separately with different symbols. The variations are A: measurements through headphone-mounted mi- crophones placed on KEMAR; B: recordings through KEMAR ears but with pinnae removed; C: as variation B, but with cardboard ”ear canals;” D: as variaton B, but with only the left pinna removed; E: as variation B, but with only the right pinna removed; F: record- ings through microphones mounted on a JVC foam dummy head; G: recordings made through KEMAR normally with both pinnae present ..................................... 172 4.27 Coherences measured in 1 /3-octave bands in an anechoic environ- ment normally through KEMAR with pinnae in place (variation G, + symbols), and though a KEMAR head, detached from the KEMAR torso (variation H, up-arrow symbols). 
............ 174 4.2 00 Percentage of ”right” responses as a function of delay for noises at an angle of 45° for four different listeners. Responses corresponding to coherent noises are plotted with filled circles. Responses corre- sponding to incoherent noises are plotted with open circles. Thresh- olds for discrimination are plotted as solid horizontal grid lines. A shift of one sample corresponds to a perceptual shift of approxi- mately one degree. ............................. 179 4.29 Percentage of ”right” responses as a function of delay for noises at an angle of —45° for four different listeners. Responses correspond- ing to coherent noises are plotted with filled circles. Responses corresponding to incoherent noises are plotted with open circles. Thresholds for discrimination are plotted as solid horizontal grid lines. A shift of one sample corresponds to a perceptual shift of ap- proximately one degree. .......................... 180 xxiii 4.30 Coherences measured in three human listeners and a KEMAR at in- cident angles of sound of :l:45° and 0°. Averages and standard devi- ations are calculated over three measurements made at each angle. Error bars span one standard deviation in each direction. Error bars smaller than the size of the data points are invisible ........... 184 4.31 Coherences measured at 45° from recordings made through KEMAR Zwislocki couplers and electronics (+ symbols) and through ER—7 probe microphones (open squares). ........... 186 4.3 N Percentage of ”right” responses as a function of delay for noises at an angle of 45° for four different listeners. Noises presented to each listener had interaural coherences equal to that measured in that particular listener’s ears in the 500 Hz 1 /3-octave band. Responses corresponding to coherent noises are plotted with filled circles. Re- sponses corresponding to incoherent noises are plotted with open circles. Thresholds for discrimination are plotted as solid horizontal grid lines. 
A shift of one sample corresponds to a perceptual shift of approximately one degree .......................... 189 4.33 Percentage of ”right” responses as a function of delay for noises at an angle of —45° for four different listeners. Noises presented to each listener had interaural coherences equal to that measured in that particular listener’s ears in the 500 Hz 1/3-octave band. Re- sponses corresponding to coherent noises are plotted with filled cir- cles. Responses corresponding to incoherent noises are plotted with open circles. Thresholds for discrimination are plotted as solid hor- izontal grid lines. A shift of one sample corresponds to a perceptual shift of approximately one degree. .................... 190 4.34 Average coherences across all measurements in 1/3-octave bands made in a reverberant room at a distance of 3.05 m from the sound source. Error bars extend one standard deviation in each direction. . 195 4.35 The ITD spectrum of the 1/3-octave band centered at 1000 Hz, recorded through KEMAR in the highly reverberant environment. The interaural waveform coherence within this band is 7 = 0.2453. The standard deviation in ITD across the band is a = 251 ms, which gives a coherence of ('y) = 0.2883 when Equation 4.31 is employed. . 197 xxiv 4.36 60-dB reverberation times, averaged across all measurements from 4.37 4.3 (X) 4.39 4.40 4.4 H both ears of a KEMAR in 1/3—octave bands, made in a reverberant room. Error bars extend one standard deviation in each direction and indicate excellent agreement across different positions in the room and between the two ears ....................... 198 ITDs measured between two ears of a KEMAR in a reverberant room. Data points indicate average measured values, with error bars spanning one standard deviation in each direction. Averages are calculated across 30 measurements, made at six different posi- tions in the reverberant room, and standard deviations are calcu- lated across average values from each position. 
At each position, the KEMAR and sound source were separated by 3.05 m with the KEMAR facing the sound source. Small dots indicate the ITD of pe- riod errors that occurred in at least one trial. Dashed lines indicate the theoretical ITD for one, two and three period errors respectively as a function of frequency. ......................... 200 lTDs measured between two ears of a KEMAR in a reverberant room, presented in a manner identical to that of Figure 4.37. ITDs are calculated by finding the lag of the peak in a cross-correlation function between the signals recorded in the left and right ears, with a maximum lag of i2 ms. ......................... 202 ILDs measured between two ears of a KEMAR in a reverberant room. Data points indicate average measured values, with error bars spanning one standard deviation in each direction. Averages are calculated across 30 measurements, made at five different po- sitions in the reverberant room, and standard deviations are calcu- lated across average values from each position. At each position, the KEMAR and sound source were separated by 3.05 m with the KEMAR facing the sound source ...................... 203 ILDs measured in a reverberant room at six different positions. The data points for each position are labeled with different symbols. Er- ror bars extending one standard deviation in each direction are all smaller than the size of the data points, and so are not shown. . . . . 204 Average coherences across all measurements in 1/3-octave bands madein room 103. Error bars extend one standard deviation in each direction .................................... 209 XXV A 4.42 A plot of waveform coherence 7 versus the ratio of direct to total 4.43 4.44 4.45 4.4 O sound intensity in room 103. Direct sound intensities were mea- sured in an anechoic environment and total intensities were mea- sured in room 10B. 
The location of each point is found from the coherence and intensities measured in one of nineteen 1/3-octave bands with ISO center frequencies between 160 Hz and 10 kHz. Points from bands with center frequencies less than or equal to 1 kHz are marked as solid circles, and points from bands above 1 kHz are marked as open circles. This plot is generated from data pooled across all frequency bands and measurements taken at sev- eral locations in the room. The linear trend line is shown and the associated Pearson product-moment correlation coefficient r is noted. 211 60-dB reverberation times, averaged across all measurements from both ears of a KEMAR in 1 / 3-octave bands, made in room 103. Error bars extend one standard deviation in each direction. Small error bars in most bands indicate excellent agreement across the two ears and across different positions in the room. Errors at low frequencies are similar to those seen in Figure 4.36 for the reverberant room. . ITDs measured between two ears of a KEMAR in room 10B. Data is presented in a manner like that of Figure 4.37. Averages are indi- cated by open circles, with error bars spanning one standard devia- tion in each direction. Small dots indicate the ITD of period errors that occurred in at least one trial. Dashed lines indicate the theo- retical ITD for one, two and three period errors respectively as a function of frequency ............................. ITDs measured between two ears of a KEMAR in room 10B,Spre- sented in a manner identical to that of Figure 4.44. lTDs are calcu- lated by finding the lag of the peak in a cross-correlation function between the signals recorded in the left and right ears, with a maxi- mum lag of i2 ms. ............................. Average ITDs (open circles) plus and minus one standard deviation, and sizes of the standard deviations (closed symbols) measured in room 103. ITDs are measured in a 800 Hz band with a bandwidth as indicated on the abscissa as a fraction of an octave. 
ITDs approach the expected value for sounds incident at 0° of 0 ms and the variances across measurements decrease as the bandwidth is increased.

4.47 ILDs measured between two ears of a KEMAR in room 10B. The presentation of data is the same as that used in Figure 4.39. Data points indicate average measured values, with error bars spanning one standard deviation in each direction. This represents the average of the points shown in Figure 4.48, with error bars extending one standard deviation in each direction. The trend in ILDs is similar to that found in the reverberant environment.

4.48 ILDs measured in room 10B at six different positions. This figure is akin to Figure 4.40 for ILDs measured in the reverberant room. The data points for each position are labeled with different symbols. Error bars smaller than the size of the data points are not shown.

CHAPTER 0
An Introduction to Spatial Hearing and Speech Segregation

The present chapter is meant to serve as a short, informal introduction to the science of sound localization and its role in speech segregation, and as an overview of what is covered in this thesis. To encourage a flowing and continuous narrative, it is free of citations, as those are more appropriate and helpful in the context of the following chapters. All the important information contained within this chapter is reiterated, albeit in a more formal manner, in other parts of this thesis, where it is also thoroughly cited. Readers unfamiliar with or new to the science of binaural hearing would benefit from reading this chapter, while those familiar with the field can safely move on to Chapter 1.

In our everyday lives, we experience many different auditory environments and face the challenges of listening to and understanding the soundscape around us. A common experience is one in which people must comprehend and attend to speech in an enclosed space (i.e., a room).
Often, we are faced with the task of attending to the speech of a particular person while filtering out other speech signals in the room. This situation, often referred to as a "cocktail party," is one that has been studied in the field of psychoacoustics. The narrative that follows is meant to describe a "cocktail party" environment. It will introduce some of the terminology common in the field of psychoacoustics and describe how our auditory system localizes different kinds of sound.

Imagine that you are just arriving at a swanky cocktail party at the art museum to celebrate the new Martin Dulak exhibit. Only ten minutes late, you are one of the first to arrive. As you wander aimlessly around the room, casually searching for someone you know, you are surprised by someone asking you a question. "Can I get you something to drink?" Just two meters off to your right stands a waiter.

You were able to turn directly to the waiter as soon as he spoke to you, though you had not seen him there before. You knew from the sound right where he was. How are you able to do this? Our auditory system uses several sources of information (often referred to as cues) in the task of localization. The waiter was standing off to your right, and so the sound of his voice arrived at your right ear before it arrived at your left. This difference in arrival time is called the interaural time difference (ITD). This cue is especially important for low-frequency sounds (below about 2 kHz). Also, because your head is an obstruction for sound arriving at your left ear, the level (or intensity, or amplitude) of the sound is lower at your left ear than at your right ear. This difference in level is called the interaural level difference (ILD). Your head is a greater barrier as sounds increase in frequency, and so the ILD cue is quite important for high-frequency sounds (above about 2.5 kHz).
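As a rough illustration of the two cues just described, the following Python sketch estimates an ITD and an ILD from a pair of ear signals. It is not the analysis used in this thesis: the function name `estimate_itd_ild`, the NumPy implementation, the sign conventions, the ±2 ms search window, and the broadband (rather than 1/3-octave) treatment are all assumptions made for the example. The ITD is taken as the lag of the peak of the interaural cross-correlation, and the ILD as the ratio of mean powers expressed in decibels.

```python
import numpy as np

def estimate_itd_ild(left, right, fs, max_lag_ms=2.0):
    """Return (itd_seconds, ild_db) for two equal-length ear signals.

    Assumed conventions (illustrative only):
      positive ITD -> the sound reached the right ear first
      positive ILD -> the right-ear signal is more intense
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    n = len(left)

    # Search only physically plausible lags (hypothetical +/- 2 ms window).
    max_lag = int(round(max_lag_ms * 1e-3 * fs))
    lags = np.arange(-max_lag, max_lag + 1)

    # Cross-correlation c(lag) = sum_t left[t + lag] * right[t],
    # evaluated over the overlapping part of the two signals.
    xcorr = np.array([
        np.dot(left[max(lag, 0):n + min(lag, 0)],
               right[max(-lag, 0):n - max(lag, 0)])
        for lag in lags
    ])

    # The lag of the peak is the delay of the left ear relative to the right.
    itd = lags[np.argmax(xcorr)] / fs

    # Level difference from the ratio of mean powers, in decibels.
    ild = 10.0 * np.log10(np.mean(right ** 2) / np.mean(left ** 2))
    return itd, ild
```

Under these conventions, a source off to the listener's right, such as the waiter, would yield a positive ITD and a positive ILD. A practical measurement would first filter both signals into frequency bands, since the usefulness of each cue depends on frequency.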
Also, note that the waiter was only a couple of meters away when he caught your attention, so reflections from the room were not especially distracting.

You thank the waiter and order a Manhattan with Maker's Mark. As you are waiting for the waiter to return with your drink, you hear a soft tone ring out from the front of the room and, looking around you, see a horn player warming up near the other side of the room. The player holds the note for a few moments. Apparently, the entertainment for the evening is a brass band.

The horn player is on the other side of the room. Most ITD and ILD cues are unreliable in this case because of reflections in the room. Your auditory system encounters many copies of the same sound. This can lead to a sense that the sound is not localized at one compact point in space, but instead seems spread out: a perception of "auditory source width" (ASW). ASW is related to the interaural waveform coherence, $\gamma_w$, which is a physical quantity that describes how similar the signals arriving in your left and right ears are. Sounds with high coherence (near 1) generally seem compact, whereas sounds with lower coherence tend to sound spread out in space. The reverberant characteristics of rooms can affect the coherence of sounds, as the room does to the horn sound in the example above. The way in which rooms do this is explored as part of Chapter 4 of this thesis.

How then were you able to localize the sound of the horn player at the other side of the room at all? The sound traveling directly from the sound source to your ears (the direct sound) is usually what your auditory system listens to for localization purposes. The direct sound follows the shortest route from the sound source to your ears, and so it arrives first. You also register reflections of the direct sound from the walls, floor, ceiling, and other surfaces within the room, and these come later in time.
Reflections are discounted by the auditory system for purposes of sound localization. Most of the time, you will not be confused about the location of the sound source because your auditory system suppresses these reflections and pays attention to only the first instance of the sound it hears: the direct sound. This phenomenon is called the "precedence effect" or the "law of the first wavefront."

The horn player stops warming up his instrument, and as he does, you become aware of a noise you had not noticed before. It is distant, but you are able to localize it far to your left. You eventually realize that it is the sound of water running out of a faucet from the sink at the bar.

You didn't hear the faucet running until the horn stopped playing, which means you didn't hear its onset. Thus, the precedence effect cannot help you. How then were you able to localize the sound of the water at all? Because the sound was broadband (had many frequency components), you were still able to make use of ITD cues in the envelope of the sound. This relies on detection of the interaural envelope coherence, $\gamma_e$, which is the coherence between the envelopes of the signals arriving in your left and right ears. The envelope coherence is important if one is to make use of ITD cues above about 1.5 kHz. The relationship between envelope and waveform coherence is the subject of Chapter 3.

The bartender turns off the faucet, and you then notice yet another sound you had not picked up on before. It's some sort of continuous whine, like from a failing old computer monitor or a squeaky motor. It seems to be coming from someplace off to your left. You decide to try to locate it, but as you move off to your left, you become uncertain that the sound really was coming from that direction. Listening again, it seems off to your right now, and maybe somewhere above you. Turning your head in that direction and listening as carefully as you can, you find that you still can't quite pinpoint the source of the noise.
Why can't you localize the whining sound when so far you've been able to localize every other source by some means or another? You did not hear the onset of the sound, so the precedence effect can't help you. The source is distant, so reflections due to the room are prevalent and ITD and ILD cues fail. Also, the source is spectrally sparse (made up of few frequency components), and so envelope cues fail as well.

From across the room, you hear your name called by a voice you recognize. Turning until you believe you are facing the direction of the voice, you head to a small group of people. Your friend is among them, a fact that you become more certain of as you approach. Your friend welcomes you to the event and introduces you to a financial analyst, an eccentric artist, and a famous opera singer, all at least as swanky as yourself. As you greet them, you take note of the increasing number of people filling the lobby.

You enter into an entertaining conversation with this interesting bunch of people. The noise level in the room is increasing as more people arrive. A few minutes later, the brass band starts playing. You find that you have no trouble maintaining conversation with your companions, though you can no longer hear some of the background sounds you could pick out before. Eventually, you come to realize that the ambient noise is louder than the voices of the people with whom you are speaking, and yet you are still able to understand their speech, and they seem to be able to understand yours.

Why do you have no trouble understanding the conversation even though the brass band is louder? When the brass band started playing, it became harder to hear certain sounds in the room, such as the clinking of glasses and the ongoing series of cellular phone ring tones.
The task of understanding the speech around you became only a bit more difficult than it was before the band started playing, even though no one started talking any louder than they had been. Ambient speech and noise can both distract you from the person you are trying to pay attention to (the "target talker"), but they do so in different ways.

In order for ambient noise to interfere with your ability to hear and understand speech, it first must spectrally overlap the speech. That means that the noise must have power in the same part of the frequency spectrum as the speech. For instance, a noise that only has power at very low frequencies, say between 10 and 100 Hz, is unlikely to interfere with speech because the power in most male speech is above 100 Hz. If instead the noise were broadband and contained power at all the frequencies between, for instance, 100 and 2000 Hz, then its spectrum would overlap that of either male or female speech. In that case, the noise could make the speech impossible to hear. This is known as "energetic masking."

You become engrossed in a conversation with the financial analyst. As you are trying to figure out exactly what it is a financial analyst does, you suddenly become aware of the raised voices of the opera singer and the artist. They are nearly face-to-face, talking at the same time, gesticulating wildly. You try to figure out what they are saying, but have no luck. Eventually, they tire of their argument. The artist turns to your friend to complain about the narrow-minded opera singer, and the opera singer turns to the financial analyst to similarly complain about the idiosyncratic artist. In turn, you are able to listen to the ranting of either the opera singer or the artist, and you are just getting the notion that the argument was about the purity of various art forms and their value in society.
Something similar to energetic masking is happening here because there is likely spectral overlap between the speech of the two, especially if they are of the same gender. If you try to pay attention to just one of them, the task is very difficult. There is something else happening here that is much different, however. While broadband noise will make speech more difficult to understand, these two people talking at the same time make it very difficult to understand anything at all of what either one of them is saying. In fact, if we wanted to make the noise completely mask the speech (such that you could not understand what was being said), it would have to be quite a lot louder than the speech. The artist and opera singer, however, can talk at the same level and you won't be able to understand either one of them. This is an example of speech-on-speech masking, of which "informational masking" is a part. The speech of the artist does an effective job of masking the speech of the opera singer because the masker (the distracting speech) contains information. In fact, it doesn't even have to be information that is meaningful to you; if one of the two talkers were speaking in a language you didn't understand, the task would be nearly as difficult. The effects of speech-on-speech masking are the basis for the studies presented in Chapter 1 of this thesis.

Just as the opera singer and artist are cooling down from their heated debate, a voice from far behind you begins speaking. It is the curator, welcoming everyone and beginning his speech. You are clearly able to identify that the sound is coming from directly in back of you. You turn around and politely listen.

How did you know that the sound was coming from directly behind you? For a source located directly in back of you, the expected ILD and ITD are both zero. The same is true for a source in front of you, or directly above you, or at any other location that is an equal distance from both of your ears.
How can you tell the difference between them? Such sound sources lie in an imaginary plane that bisects your body, separating left from right, called the median sagittal plane (MSP). Such sound sources are localizable thanks to the behavior of your pinnae (the part of each ear that sticks out from the side of your head). Your pinnae are asymmetric and rather oddly shaped. Their odd shape allows them to respond differently to sounds based on the location of the sound in the MSP. Sounds located directly in front of you are filtered by the pinnae differently than sounds coming from directly behind you. Pinnae are thought to be especially effective when sounds have power in frequency components above about 6 kHz. MSP localization is also facilitated by various anatomical differences between the left and right sides of your body, such as asymmetries in the shape of your head. Thus, you benefit from a third type of cue: a spectral cue.

The Cocktail Party Effect

In the narrative described above, many people are talking simultaneously, but you are able to carry on a conversation with one individual. Even if the sound level of the surrounding din of "distracting speech" is significantly higher than that of the person with whom you are conversing, you have no problem understanding what they are saying (unless the noise becomes very loud or the crowd becomes tightly packed). Your ability to do this is often called the "cocktail party effect."

The cocktail party effect depends on the ability of a listener to determine where a sound is coming from (localization) and to separate that sound source perceptually from any other distracting sources of speech (segregation). There are other ways for the cocktail party effect to take place as well, all of which depend on some fundamental difference between the distracting and target speech. Spectral and bandwidth differences may all contribute to the cocktail party effect.
The cocktail party effect is manifested even when the differences between the voices we are trying to segregate are due to voice quality, pitch, timbre, and even the logical syntax of the speech (e.g. we can segregate talkers based on what they are talking about). When the opera singer and the artist were talking face-to-face, you became unable to separate their voices by any of these means, and so your ability to understand what they were saying diminished. When their voices were separated in space, even when they were both speaking at the same time, you were able to pay attention to one or the other because you could localize their voices in space, filtering out sounds localized elsewhere.

Acoustical Measurements

So far, many aspects of binaural hearing have been discussed in this introductory chapter. Quantities such as coherence, ILD, and ITD can all be measured by performing calculations on signals recorded at the ears. If we are to understand the ways in which rooms and our anatomy affect sounds, we must indeed take such measurements. Our auditory anatomy analyzes sounds by breaking them up spectrally into approximately 1/3-octave bands, which are fed independently to a central processor. We wish to imitate this method of analysis.

The most efficient method of completely characterizing the acoustics of a room is with a broadband noise that has a flat frequency spectrum (i.e. a white noise). One particular class of noises that fits this description is called "maximum length sequences" (MLS). MLS have a perfectly flat frequency spectrum, and we can easily create different MLS that are independent of one another. MLS also have several other properties that are desirable in the course of making acoustical measurements. Details on the generation and properties of MLS, as well as a comparison of MLS to other types of noise, are the subject of Chapter 2 of this thesis.

Organization

This thesis is organized into four chapters.
Chapter 1 deals with a perceptual experiment involving speech-on-speech masking when both target and distracting speech are located in the median sagittal plane. Chapter 2 treats the issue of acoustical measurements with MLS by qualitatively comparing the performance of MLS and random telegraph noise (RTN) in making room and binaural measurements. Chapter 3 moves on to physical measurements of interaural waveform and envelope coherences, and their relationship in rooms. In Chapter 4, measurements of ITD, ILD, and interaural waveform coherence are taken in rooms alongside the reverberant characteristics of the rooms, and the ways in which rooms affect those binaural parameters are studied.

CHAPTER 1

Release From Informational Masking in the Front-Back Dimension

1.1 Introduction

In a room full of talking people, one can focus on and converse effectively with a particular person, despite the large volume of speech going on elsewhere. The effectiveness of this "cocktail party effect" depends on a number of factors, including the differences between the various talkers' voices (such as pitch and loudness), the content of the messages being spoken, and the talkers' locations relative to the listener [21, 26, 72, 97]. When several talkers are near to each other, the speech of any one of them can be hard to understand, but the intelligibility of a single target talker improves if the distracting talkers are separated in space from the target. Thus, spatial separation gives rise to a release from masking of the target speech [26].

Freyman and colleagues achieved a release from masking by creating a shift in the perceived location of the distracting speech [41, 38, 39]. As a baseline condition, a single loudspeaker directly in front of the listener presented the speech of multiple talkers. The listener's task was to attend to a single target talker and to ignore all others, which were the distracters.
A second loudspeaker was placed an equal distance from the listener's head, but to the listener's right. This loudspeaker produced a copy of the distracters, shifted forward in time by 4 ms so as to lead the distracters in the front loudspeaker. Although the addition of this second loudspeaker increased the overall level of the distracters, the intelligibility of the target speech was greatly improved. This experimental design shall be referred to as an added-delayed-distracter (ADD) experiment.

Freyman et al. [41, 39] also showed that when the distracting speech was replaced by distracting noise (i.e. by an energetic masker), no such release from masking occurred. This result was interpreted as a fundamental difference between energetic masking, usually thought of as noise of some sort that overwhelms or suppresses the target, and informational masking, usually thought to cause confusion between target and masker [61, 64, 92].

A delay in which the added distracters lead the distracters presented with the target is defined here as a positive delay. A delay in which the added distracters lag is defined as negative. Freyman et al. [41] found a significant release from masking in an ADD experiment conducted with both positive and negative delays, though with a negative delay (-4 ms) the auditory image of the distracters was near the target, in front. A small perceptual difference in the locations of the target and distracter speech, along with a relatively diffuse auditory image of the distracters due to interaural disparities in the two-loudspeaker presentation, apparently provided an adequate binaural basis for differentiating the target talker from the distracters [41, 23].

In studies such as those by Brungart et al. [23] and Rakerd et al. [76], the ADD experiment of Freyman and colleagues was thought of as a model for a single reflection of distracting speech in a room.
This interpretation led to experiments in which the delay was varied over a wider range of positive and negative values than had been previously explored. These experiments revealed a release from masking for all positive and negative delays within the 50-ms region delimited by Haas [46], outside of which Haas reported that delayed speech was perceived as an echo. For delays in the region of echoes, the masking release disappeared. A similar experiment employing speech-shaped noises as maskers showed release from energetic masking only for short delays between plus and minus 2 ms, and even then with considerably reduced release. The range and degree of informational and energetic masking release found in the ADD experiment by Rakerd et al. [76] are depicted in a general way in Figure 1.1.

These results supported the contention that the release from masking in the speech distracter condition is different from a release from energetic masking. They also showed that the masking release found in the ADD paradigm cannot be explained only as a relocation of the distracter caused by the precedence effect [46, 66, 91], because substantial release occurs for a large range of negative values of the delay. For large negative delays, the precedence effect would perceptually shift the distracters to the physical location of the target, providing no perceived spatial separation between the two. It has been suggested that spatial effects other than spatial separation or timbre cues may lead to a release from masking for negative delays [23, 6].

Thus far, the release from masking in ADD experiments has been explained in terms of binaural cues and delay-and-add filtering [2]. In the present study, sound sources are moved to the median sagittal plane, where localization cues are mainly spectral in nature and binaural cues are minimized [15]. The present study investigated the ADD paradigm for a front-back geometry where special steps were taken to minimize binaural cues.
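The signal routing of an ADD trial, using the sign convention defined above, can be sketched in a few lines of Python. This is only an illustrative sketch and not the software used in these experiments: the function names `delay_samples` and `add_stimulus`, the representation of signals as plain lists of samples, and the choice to hold back the entire primary feed (target plus distracters) for positive delays are all assumptions made for the example.

```python
def delay_samples(delay_ms, fs):
    """Convert a delay magnitude in milliseconds to a whole number of samples."""
    return int(round(abs(delay_ms) * 1e-3 * fs))


def add_stimulus(target, distracters, delay_ms, fs):
    """Build the two loudspeaker feeds for one ADD trial (hypothetical helper).

    target, distracters: equal-length lists of samples.
    delay_ms > 0: the added (distracter-only) feed leads the primary feed.
    delay_ms < 0: the added feed lags the primary feed.
    Returns (primary_feed, added_feed), padded to the same length.
    """
    if len(target) != len(distracters):
        raise ValueError("target and distracters must have equal length")
    n = delay_samples(delay_ms, fs)
    # Primary loudspeaker: target and distracters mixed together.
    primary = [t + d for t, d in zip(target, distracters)]
    # Added loudspeaker: a copy of the distracters only.
    added = list(distracters)
    pad = [0.0] * n
    if delay_ms > 0:
        # Assumption: the whole primary feed is held back, so the target stays
        # aligned with its own copy of the distracters.
        primary, added = pad + primary, added + pad
    else:
        # Added feed lags (or is synchronous when n == 0).
        primary, added = primary + pad, pad + added
    return primary, added
```

For example, `add_stimulus(target, distracters, -4.0, 44100)` would model a negative 4 ms delay, in which the added copy of the distracters arrives after the copy presented with the target; the sample rate here is arbitrary.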
Figure 1.1. An overview of the extent to which release from informational and energetic masking was found to occur in the horizontal plane across a wide range of delays by Rakerd et al. [76]. The release from informational masking is approximately 10 dB in magnitude throughout the range of delays from -32 ms to +32 ms. The magnitude of the release from energetic masking is only about one-third as strong (approximately 4 dB) and is limited to a range of delays from -2 ms to +2 ms.

1.2 Experiment 1: Front-Back Presentation with Speech Distracters

Experiment 1 was designed to measure release from masking in an ADD experiment where the sound sources were directly in front of or behind the listener. The goal was to study release from informational masking when binaural cues are reduced as far as possible in free field. Directly front and directly back are two locations that can be discriminated by listeners because of spectral differences [15]. These locations are rarely confused by normal-hearing listeners in broadband localization experiments [93]. In what follows, the conditions of the front-back geometry will sometimes be referred to as median sagittal plane (MSP) conditions to emphasize the unimportance of interaural differences. Methods used here were similar to those used in the previous study of masking release by Rakerd et al. [76].

1.2.1 Listeners

Four listeners, three male (listeners B, N, and S) and one female (listener K), participated in the experiment. Listeners K, N, and S were in their mid-twenties; listener B was 52. All four listeners had normal hearing (pure tone thresholds ≤ 15 dB HL at 0.5, 1, 2, and 4 kHz). Listener N is the author of this dissertation.

1.2.2 Anechoic room and experimental layout

Testing took place in an anechoic chamber, 3.0 m wide by 4.3 m long by 2.4 m high (IAC 107840). A listener was seated near the center of the chamber in a special chair, described below.
One loudspeaker was placed directly in front of the listener, at ear height, 1.5 meters from the center of the listener's head. Another loudspeaker was placed directly behind, also at ear height and 1.5 meters from the head. This layout is shown in Figure 1.2. It is referred to here as the Front-Back geometry.

Figure 1.2. Arrangement of speakers relative to a listener for Experiment 1. The front loudspeaker presents the target and distracters; the back loudspeaker presents distracters only. The delay between speakers is represented by τ, positive if the distracters from the back loudspeaker lead those from the front and negative if the distracters from the front loudspeaker lead those from the back.

1.2.3 Listener's chair

Rigorous measures were taken to prevent head motion and to ensure that each loudspeaker was equally distant from the listeners' ears. A wooden bite bar, 53 centimeters long, was attached to the chair, running parallel to the back of the chair. This bar was given a dark center line around its circumference at the center of its length and aligned approximately with the center of the chair. To ensure a constant alignment of the head, listeners were instructed to bite lightly on the bar and to maintain contact throughout the test while facing the front loudspeaker. Prior to the test, listeners used a small hand mirror to align the center line of their top incisors with the center line drawn on the bar. Given this procedure, it is reasonable to expect that each listener's ears were symmetrically located with respect to the loudspeakers with a deviation of less than 5°. Simple calculations based on the geometry of the setup relate this angle to a corresponding interaural time difference (ITD) of less than 65 µs.

1.2.4 Loudspeaker alignment

The loudspeaker azimuths were aligned individually with the goal of minimizing interaural differences. Two small microphones were attached to the bite bar, one at each end.
A sine tone was presented from the loudspeaker to be aligned. The outputs of the two microphones were observed simultaneously on a dual-channel oscilloscope outside the anechoic room, and the loudspeaker position was adjusted until the oscilloscope traces showed two sine tones with the same phase. A low frequency was used initially, and then successively higher frequencies were used for finer adjustments. After the final adjustment with a tone of 10 kHz, the expected maximum error in angle was less than 0.5°, assuming the phases of the tones could be aligned to within an eighth of a period. This error in angle corresponds to a maximum difference in arrival time of the sound between the left and right ends of the bar of less than 13 µs.

The loudspeaker distances were determined with a tape measure. This method had an expected error of less than 1.5 cm, corresponding to a difference in arrival time at the ears between the front and back loudspeakers of less than 44 µs. In a delay-and-add filter, this delay corresponds to a dip in the spectrum at over 11 kHz, well above the frequencies used in this experiment. A test of the perceptual effects of misalignment appears in Appendix A. This test indicated that the experiments adequately suppressed binaural difference cues, barring those inherent in the asymmetries of the listeners' heads.

1.2.5 Stimuli

The stimuli used for both targets and distracters were sentences taken from the Coordinate Response Measure (CRM) corpus [19]. For this experiment, each stimulus consisted of three female voices (the target and two distracters) issuing commands that follow the format "Ready {call sign}, go to {color} {number} now." A chart of call signs, colors, and numbers allowed in these experiments is given in Figure 1.3. The target talker always used the call sign "Laker." The voices of the three talkers were randomly chosen from among the four female voices available in the CRM. In any given stimulus, no two talkers shared any of the attributes of call sign, color, number, or individual female voice. With four colors and four numbers, the chance of guessing correctly is 1 in 16, or approximately 6%.

Figure 1.3. A schematic of the full set of CRM sentences that were possible in this study. The listener is instructed to listen for "Laker." The target talker always speaks this call sign. [Schematic not reproduced; it shows the sentence frame {Talker}: "Ready {call sign} go to {color} {number} now," with the call signs Laker (target), Hopper, Ringo, and Charlie, the colors Blue, Red, White, and Green, and the numbers One through Four.]

1.2.6 The Task

Listeners were instructed to listen for the call sign "Laker" on each trial and to determine the color/number combination in the associated sentence. A stimulus with the "Laker" call sign was presented from the front on every trial. An LCD computer monitor was mounted below the front loudspeaker. To help minimize any head motion, the listeners used a wireless gyroscopic mouse to control a pointer on the monitor without the need for a mouse pad or other surface. The listeners responded to each stimulus by clicking on the appropriately numbered button within the field of the appropriate color on the monitor screen. A response was considered correct only if the selected number and the color were both correct. A single run consisted of five practice trials without feedback followed by 30 test trials. The listeners went through three runs for each experimental condition, with the order of runs randomized differently for each subject.

1.2.7 Front-Only baselines

In a condition referred to here as the Front-Only baseline (FO), the target and the distracters were presented exclusively from the front loudspeaker, with the level of each talker's speech fixed at 65 dB SPL. Thus, the level of the target was 0 dB relative to the level of each distracter.
For a second reference point, the Front-Only baseline experiment was repeated with the level of the target talker raised to +4 dB relative to the FO condition. This +4 dB signal-to-noise ratio (SNR) reference (the FO+4 dB condition) provided a way to express performance changes using a dB scale.

As this experiment was designed to search for an effect over a wide range of delays, it was desired that all listeners start with similar baseline performance, based on the expectation that similar performance would lead to comparably useful results over a wide range of delays for all listeners. In a pilot test, it was found that three of the four listeners performed the baseline test near 30% correct, which was well above chance (6%). Listener K also performed above chance, but less well than the other listeners (approximately 15% correct). The difference in baseline performance was eliminated by increasing the target level by 2 dB for listener K. Data for listener K below reflect this 2 dB increase. For listener K, the condition where the target is at 67 dB and each distracter is at 65 dB is the "0 dB SNR" condition, and other values of SNR are referenced to this baseline. For instance, the +4 dB condition corresponds to a target at a level of 71 dB and each distracter at 65 dB for listener K. Thus, values of SNR are reported relative to the initial level of the target, which is 65 dB for listeners B, N, and S, and 67 dB for listener K.

1.2.8 Front-Back ADD experiment

For Front-Back ADD tests, the target and the distracters were presented from the front loudspeaker as for the Front-Only baseline tests. In addition, the distracters were presented from the back loudspeaker using the same level as in front (65 dB). There was a delay, τ, between the front and back distracters. The delay was varied over a wide range over the course of the experiment. The set of delays employed was: τ = {±32, ±8, ±2, ±0.5, 0} ms.
Positive delays indicate that the back loudspeaker (distracters only) led the front. Negative delays indicate that it lagged. Zero delay corresponds to synchrony in the presentation of the distracters from the two loudspeakers.

1.2.9 Results and Analysis

Figure 1.4 shows the results of the Front-Back experiment for each listener, with the average across listeners given in the bottom panel. For the individual subjects, percent correct scores, averaged over three runs of 30 trials each, are given as a function of the delay, τ. Error bars represent the standard deviation over runs and are one standard deviation wide in each direction. A diagonally-hatched rectangular stripe near the bottom of each panel shows the subject's average results for the Front-Only (single speaker) geometry, where the level of the target equals the level of each distracter voice (0 dB SNR). The width of the stripe represents a 95% confidence interval about the average baseline score. Another stripe (drawn with vertical hatch lines) is given for the Front-Only test in which SNR was set to +4 dB (see right-hand axis of the figure). The bottom panel of Figure 1.4 shows percent correct scores averaged across the four subjects. In that panel, error bars and the 95% confidence intervals for Front-Only tests are based on the standard deviation over subjects.

Analysis of variance (ANOVA) showed that Front-Back scores for all τ, -32 ≤ τ ≤ +32 ms, were significantly greater than the Front-Only baseline score (p < 0.05) except for τ = +8 ms, where the score approached, but did not reach, significance (p = 0.08). Thus, there was strong evidence of release from masking in the Front-Back geometry for delays spanning the full range of delays tested.

For each listener, the measured difference in percent correct scores between the 0-dB and 4-dB Front-Only reference conditions corresponded to a release of 4 dB (Section C).
These benchmark values were used to estimate the amount of unmasking in decibels on Front-Back tests by linear interpolation or extrapolation, as shown on the right-hand axes of Figure 1.4. The delay times of τ = {+2, -2} ms showed the greatest release. Listeners displayed, on average, a release of 3.5 dB for τ = 2 ms, and a release of about 2.5 dB for τ = -2 ms. Measurable release of at least 1 dB extended out to τ = ±32 ms.

These results agree with the experiments by Rakerd et al. [76] for sources in the horizontal plane, which found masking release for delays as long as 32 ms, but not for 64 ms. Figure 1.4 shows that the release in the front-back ADD experiment is smaller at 32 ms than at shorter delays. Therefore, it seems reasonable to conclude that about the same range of delays elicits a release from masking in both the horizontal and the median sagittal planes. The 50-ms speech echo boundary found by Haas [46], which Rakerd et al. held responsible for the loss of masking release for long delays, is apparently insensitive to the source geometry.

However, at every value of the delay, the average release from masking is smaller in the MSP than in the horizontal plane. In the horizontal plane with two distracters, a maximum release of 11 dB was found, compared to about 4 dB in the present experiment in the MSP.

[Figure 1.4: individual and average results of Experiment 1, the Front-Back experiment with speech distracters, plotted as percent correct versus delay, τ, with Front-Only reference stripes and a decibel scale on the right-hand axes.]

[...] 0.716 in a two-sample t-test of zero difference).
It is reasonable to conjecture that the release seen in Experiment 2 at τ = 0 and the release seen at ±0.5 ms occurred for different reasons. It is possible that the release from energetic masking at τ = 0 is a localization effect, where the noise maskers are localized separately from, or are simply more diffuse than, the target speech because they are identically presented from different locations. The release from energetic masking at τ = {-0.5, 0.5} ms could be attributed to delay-and-add (comb) filtering [47]. This may be the origin of some of the masking release with speech distracters seen in Experiment 1 as well.

Because of delay-and-add filtering, the masking noise has a broad spectral valley from 0 to 2 kHz, centered at 1 kHz. In an informal ADD experiment where listeners adjusted the delay of noise maskers, a delay near 0.5 ms was found to be particularly effective in unmasking target female speech. The conjecture that different mechanisms lead to release for the different delays might explain why all listeners exhibited a release at τ = {-0.5, 0.5} ms, but, unlike the other listeners, listener K failed to show a release at τ = 0. According to the conjecture, all listeners were able to utilize the spectral mechanism that leads to a release at τ = ±0.5 ms, but listener K failed to make use of the localization mechanism that other listeners used to achieve a release from masking at τ = 0. Implications of these results are further addressed in section B of the Summary for this chapter.

1.4 Experiment 3: Front-Front Presentation with Speech Distracters

Interaural differences were minimized in Experiment 1, which means that the masking release for speech distracters found there was likely due to spectral effects. These spectral effects may have had either or both of two origins. One possibility is that they arose from the spatial nature of the Front-Back layout.
The head-related transfer functions (HRTFs) for sources in front and back are quite different [16], and listeners may have gained some localization information from their HRTFs. The localization information in turn may have substantially mediated speech masking release.

The other possibility is that spectral effects may have arisen from delay-and-add filtering. Experiment 3 was conducted to separate out the contributions of HRTFs and delay-and-add filtering. To do this, a new layout was established, referred to here as the "Front-Front" geometry, which deprived the listeners of any front-back spatial cues but retained spectral cues caused by delay-and-add filtering.

1.4.1 Experimental design

The layout in Experiment 3 was the same as in Experiment 1, except that the back loudspeaker was placed on top of the front loudspeaker so as to make them nearly collocated. This arrangement, the Front-Front geometry (FF), is depicted in Figure 1.7. The loudspeaker alignment, method of data collection, and the target and masking stimuli were the same as in Experiment 1.

The distance between the centers of the two speakers was 8.9 cm, which corresponded to a difference in vertical angle of 3.3°. Therefore, the arrival time discrepancy between the signals from the two speakers was less than 10 µs, producing a negligible effect on the delay-and-add filtering.

Figure 1.7. Arrangement of speakers relative to a listener for Experiment 3. One loudspeaker presents the target and distracters; the other presents distracters only. The delay between speakers is represented by τ.

The four listeners were the same as for Experiment 1. The task, stimulus set, and set of delays remained the same as well. The Front-Only test was repeated here to provide a measurement of baseline performance contemporaneous with the Front-Front ADD test.

1.4.2 Results and discussion

Figure 1.8 shows the results for the Front-Front ADD test. This figure is in every way parallel to Figure 1.4 for the Front-Back ADD test.
The shape of the functions in Figure 1.8 was remarkably similar across the four listeners. All listeners exhibited substantial release from masking for the delays of τ = {+2, -2} ms. This was most dramatically demonstrated by listener N, who showed nearly 6 dB of release for τ = 2 ms. The smallest release seen at plus or minus 2 ms was for listener K, who showed a release of 2.5 dB at τ = -2 ms. No listener showed any evidence of substantial release for any other delay. Another point of agreement among listeners is that they all performed worse than baseline for a delay of τ = 0. In that condition, the distracter presentations perfectly coincided at the leading and lagging speakers and were therefore 6 dB more intense than at baseline.

The average results in the bottom panel of Figure 1.8 clearly show significant release from masking at τ = ±2 ms and no release at any other delay value. The peak in performance for τ = 2 ms corresponds to roughly a 3.5 dB release in masking, while the peak for τ = -2 ms corresponds to a release of nearly 3.0 dB. These two decibel values are essentially the same, indicating symmetry about τ = 0.

Figure 1.8. Individual and average results of Experiment 3, the Front-Front experiment with speech distracters, similar in form to Figure 1.4, as a plot of percent correct versus delay, τ.

1.4.3 Delay-and-add filtering

The improved performance for τ = ±2 ms in Experiment 3 is very likely due to delay-and-add filtering of the distracters in this experimental setup.
The transfer function of a delay-and-add filter with a delay of 2 ms is shown in Figure 1.9, superimposed on the average spectrum of a typical female voice from the CRM. For a delay of 2 ms, peaks occur at integer multiples of 500 Hz, and dips occur at 250 Hz and every additional 500 Hz thereafter.

The first dip in the delay-and-add spectrum for a delay of τ = ±2 ms may be especially important. As shown in Figure 1.9, this dip is close to the average fundamental frequency of female voices, and so would remove energy from the fundamental frequency of distracters. It may be that filtering in this way introduced timbre differences that helped the listeners distinguish between the target talker and the distracters, leading to a release from masking of the target. The release seen at 2 ms in Experiment 3 (no spatial cues) was as large as the release at 2 ms in Experiment 1 (front-back spatial cues). It seems likely that the peaks at ±2 ms seen in Experiment 1 were the result of the delay-and-add filtering effect, as made evident in Experiment 3, but spatial cues may also have been present in Experiment 1.

By contrast, no release was seen at τ = ±2 ms with noise maskers (Experiment 2), though presumably delay-and-add filtering had a similar effect on the spectrum of the maskers in that experiment. Delay-and-add filtering of the noise maskers removes power from certain parts of the spectrum but adds power to other parts of the spectrum. Therefore, delay-and-add filtering is not expected to lead to a release from energetic masking unless the delay is strategically selected. Unmasking the fundamental frequency of the target also does not lead to better intelligibility of the target speech, since the relevant information in speech is not contained in the fundamental formant.

Support for the notion that a dip at the fundamental frequency caused by delay-and-add filtering contributes to a release from masking by speech is found in the results of Brungart et al.
[22]. In that study, an experiment similar to the present one was performed in a virtual auditory environment with a single male speech distracter and a target taken from the CRM corpus (this was referred to as the F-F condition). Brungart et al. found a peak in release from masking for a delay of 4 ms, distinct from no release at 2 and 16 ms. This peak is similar to the peak seen in the present experiment for 2 ms, distinct from no release seen for 0.5 and 8 ms.

A delay-and-add filter with a delay of 4 ms has a first dip at 125 Hz, close to the expected fundamental frequency for a male talker, as used by Brungart et al. By comparison, a delay of 2 ms places the first dip at 250 Hz, near the expected fundamental frequency for female talkers, as used in the present experiments. Brungart et al. also reported release at delays of 0, 0.25, 0.5, and 1 ms of magnitudes similar to that of the release at 4 ms, whereas no release for 0 or 0.5 ms was found in the present experiment.

Figure 1.9. Solid lines show the theoretical amplitude response (magnitude in dB versus frequency in Hz) of an ideal delay-and-add filter with delay τ = 2 ms. Dips occur at 250 Hz and every additional 500 Hz thereafter. Peaks occur at every integer multiple of 500 Hz. Superimposed in dashed lines is the average amplitude spectrum of one of the four female CRM voices, which is taken from the upper left panel of Figure 1.5.

1.5 Experiment 4: Spectral Structure - A Follow-Up to Experiments 1 and 3

The results of Experiment 3 strongly suggest delay-and-add filtering as an explanation for the release from masking for a delay of τ = 2 ms when speech distracters are used. Experiment 4 tested this preliminary conclusion and tested the conjecture that attenuation of the fundamental component of the distracters, near 250 Hz, was responsible for the masking release.

1.5.1 Experimental design

A digital filter was designed to "mimic" the first dip in the amplitude response of a delay-and-add filter with delay τ = 2 ms. This was a finite impulse response (FIR) filter of order 336, with 0.04% ripple in the passband. The amplitude response of this filter is shown in Figure 1.10a. The dip in the filter's amplitude response was centered at 250 Hz and had a depth of approximately 32 dB. A second filter was designed with a dip at 750 Hz in order to mimic only the second dip in the delay-and-add filter. This filter (Figure 1.10b) was FIR of order 392, with 0.02% passband ripple, and a dip at 750 Hz of 39 dB. By comparison, the first two spectral dips actually measured for the front-front geometry in the anechoic room used for these experiments (through a free-standing condenser microphone) occurred at 250 Hz and 750 Hz, with depths of 22 dB and 12 dB respectively.

The entire CRM stimulus set was processed with each digital filter to create two separate filtered corpora, one with energy removed at 250 Hz, the other with energy removed at 750 Hz. These stimuli were then used individually as the distracters in two separate Front-Only experiments. For each experiment, listeners completed three runs consisting of five practice trials (without feedback) followed immediately by 30 test trials. To simulate the effective target-to-distracter level difference of Experiment 1, wherein both loudspeakers were actively producing distracting speech, the level of the target talker was reduced by 6 dB. The listeners in this experiment were the same as in all previous experiments.
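The dip and peak frequencies quoted in this chapter follow from the magnitude response of an ideal delay-and-add filter, y(t) = x(t) + x(t - τ), for which |H(f)| = |1 + exp(-j2πfτ)| = 2|cos(πfτ)|. The short Python sketch below evaluates that expression. It is offered only as an illustration of the ideal filter, assuming equal-amplitude direct and delayed signals; it is not the FIR design described above, and the function names are hypothetical.

```python
import math


def delay_and_add_gain_db(freq_hz, delay_s):
    """Gain in dB of y(t) = x(t) + x(t - delay) relative to x(t) alone.

    Uses |H(f)| = |1 + exp(-j*2*pi*f*delay)| = 2*|cos(pi*f*delay)|.
    """
    magnitude = 2.0 * abs(math.cos(math.pi * freq_hz * delay_s))
    if magnitude < 1e-12:
        return float("-inf")  # ideal notch: complete cancellation
    return 20.0 * math.log10(magnitude)


def dip_frequencies(delay_s, count):
    """First `count` notch frequencies: odd multiples of 1/(2*delay)."""
    return [(2 * k + 1) / (2.0 * delay_s) for k in range(count)]


def peak_frequencies(delay_s, count):
    """First `count` peak frequencies above 0 Hz: multiples of 1/delay."""
    return [k / delay_s for k in range(1, count + 1)]


if __name__ == "__main__":
    for delay_ms in (0.5, 2.0, 4.0):
        tau = delay_ms * 1e-3
        print(delay_ms, "ms: dips (Hz)", dip_frequencies(tau, 3),
              "peaks (Hz)", peak_frequencies(tau, 2))
    print("gain at a peak (dB):", delay_and_add_gain_db(500.0, 2e-3))
```

Under this idealization, a delay of 2 ms places the first two dips at 250 Hz and 750 Hz, a delay of 4 ms places the first dip at 125 Hz, and a delay of 0.5 ms places it at 1 kHz. At a peak, or at every frequency when τ = 0, the two copies add coherently for a gain of 20 log10(2), or about 6 dB, which is the level increase that the 6 dB target reduction is meant to offset. The measured dips in the anechoic room (22 dB and 12 dB) are finite because the two arrivals are not perfectly matched.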
[Figure 1.10: two panels, (a) top and (b) bottom, each showing magnitude (dB) versus frequency (Hz) from 0 to 2000 Hz.]
Figure 1.10. The top graph shows the amplitude response of a filter imitating the first dip in a delay-and-add filter with delay τ = 2 ms. This is a FIR filter of order 336. The bottom graph shows the amplitude response of a filter imitating the second dip in a delay-and-add filter with delay τ = 2 ms. This is a FIR filter of order 392.

            (a)          (b)           (c)           (d)
Listener    F-O 0 dB     F-O 750 Hz    F-O 250 Hz    FF
B           33 ± 3%      24 ± 8%       60 ± 3%       66 ± 9%
K           29 ± 5%      33 ± 7%       51 ± 8%       64 ± 7%
N           48 ± 5%      47 ± 0%       64 ± 7%       83 ± 6%
S           38 ± 2%      20 ± 0%       59 ± 8%       70 ± 6%
Avg         37 ± 8%      31 ± 11%      57 ± 9%       71 ± 9%

Table 1.1. Experiment 4: Percentage of correct responses (plus or minus one standard deviation) for each listener by condition. (a) Front-only baseline condition. (b) and (c) Simulated delay-and-add filtering with a 2 ms delay. (d) Front-Front condition with 2 ms delay; data taken from Experiment 3. Averages and standard deviations for each listener are calculated across three runs. Averages and standard deviations across listeners are shown in the bottom row.

1.5.2 Results and analysis

The results of Experiment 4 are shown in Table 1.1 for each listener and for the average across listeners. "FF" refers to the results of Experiment 3, speech-on-speech masking, with a delay of τ = 2 ms, where the greatest release occurred. "F-O 250 Hz" refers to the Front-Only test wherein the distracters have a notch at 250 Hz per the first dip in a delay-and-add filter with delay τ = 2 ms, and similarly for the condition labeled "F-O 750 Hz". The "F-O 0 dB" condition refers to the baseline condition where target and unfiltered distracters are presented from the front loudspeaker only.

When the filter dip was at 250 Hz, all listeners showed an improvement in performance over the Front-Only baseline, F-O 0 dB. The average magnitude of the release was 2 dB.
When the dip was at 750 Hz, none of the listeners showed any improvement in performance, and listener S performed consistently worse than in the Front-Only baseline condition.

The release found for distracters with a notch at 250 Hz supports the idea that delay-and-add filtering was responsible for the release from masking demonstrated in Experiment 1. The lack of release found for distracters with a notch at 750 Hz suggests that only the first dip in the spectrum of delay-and-add filtering is responsible for this effect. However, since performance in this experiment in no case reached the level found in the Front-Front geometry for delay τ = 2 ms, this first spectral dip may not be solely responsible for the release in masking shown in Experiment 3. An alternative interpretation is that the 6-dB target reduction used in Experiment 4 may have underestimated the effective signal-to-noise ratio in Experiment 3.

Experiment 4 suggests that the release from masking seen in Experiment 3 uniquely at delays of ±2 ms was an energetic effect, the result of eliminating the fundamental component of the distracting speech. A report by Freyman et al. [40] supports this conclusion. Those authors began with a baseline condition wherein target and distracters were both high-pass filtered so as to remove the first few harmonics. When the fundamental was added back to the target, performance improved by about 2 to 3 dB. The experiment by Freyman et al. is quite similar to Experiment 4 in that it demonstrated a release from masking that takes place because of the lack of energy in the fundamental of the distracters compared to that of the target.

1.6 Summary

This chapter describes added-delayed distracter (ADD) experiments in a front-back geometry, where care was taken to minimize interaural differences. Experiment 1 was the main experiment.
It tested the ability of listeners to segregate target speech from distracting speech presented directly ahead of them when an additional, time-shifted copy of the distracting speech was presented behind them. Thus, this experiment continued a paradigm begun by Freyman et al. [41], extending it into the median sagittal plane (MSP). This experimental arrangement also simulated an acoustical situation with a listener standing in front of a wall conversing with a nearby target talker, and with intense distracting talkers in the distance. With distant distracting talkers, effectively on the midline, the direct sound and the reflection from the wall in back can be comparably intense. Subsequent experiments were done in order to gain insight into the results of Experiment 1. Experiment 2 measured energetic masking in a front-back ADD experiment like Experiment 1. Experiments 3 and 4 examined the role of delay-and-add filtering by eliminating the spatial component of the release seen in Experiment 1.

A. Speech distracters: Experiment 1 showed that listeners experience a release from masking in an ADD experiment in the MSP for all delays between −32 and +32 ms. The magnitude of release was on the order of 2-4 dB (see Figure 1.4), and the peak release occurred for delays of τ = ±2 ms. Comparison to previous ADD studies in the horizontal plane [76] shows that release occurs over a similar range of delays in both planes, though the magnitude of the release in the MSP is significantly less than that seen in the horizontal plane (8-11 dB). Greater release in the horizontal plane is not surprising, since it seems likely that the segregation task could only be helped by binaural cues and better-ear acoustical effects, which were minimized in the present MSP experiment. That this release occurs for delays as long as ±32 ms is an important result of this experiment.
Though small binaural differences due to asymmetries in anatomy and orientation may have contributed to the unmasking seen in this experiment, binaural discrepancies cannot alone account for these results. Further, performance was approximately independent of the sign of the delay. It did not matter which distracter led, either the front (coincident with the target) or the back, even for the longest delays.

B. Noise maskers: Experiment 2 was identical to Experiment 1 except that continuous noise maskers were used in place of speech distracters. In contrast to Experiment 1, the results of Experiment 2 (Figure 1.6) showed a release that occurs for only short delays, τ = 0 and τ = ±0.5 ms. The magnitude of the release from energetic masking (EM) and the range of delays over which it occurred were similar to results in the horizontal plane [38, 76]. The results for noise maskers in Experiment 2 were also similar to the results of a previous study with noise using HRTFs and headphones to simulate a MSP geometry [23]. In that study, which used both male and female target voices, a significant release from EM was found for delays of τ = {−0.5, +1.0, +2.0} ms. Brungart et al. [23] explained that poor performance with a noise masker in an ADD experiment is expected, since the second presentation of the masker adds noise power to the masker, but that some delays may lead to a release due to delay-and-add filtering. Short delays in a delay-and-add filter lead to broad spectral valleys through which a listener may perceive an unfiltered target.

The release from energetic masking can be understood from some combination of several effects. One effect is delay-and-add filtering. A delay of τ = ±0.5 ms in the masking noise leads to a broad valley centered on 1000 Hz, giving the listener improved access to an important spectral region for the target speech. This is a plausible explanation for the release demonstrated by all listeners in Experiment 2 at τ = ±0.5 ms.
However, at τ = 0, no delay-and-add filtering occurs, and the release seen in Experiment 2 at this delay must have some other explanation.

A second possibility is that release from energetic masking at τ = 0 ms, or for the entire range, |τ| ≤ 0.5 ms, is a localization effect, caused by summing localization. Summing localization occurs for delays less than 1 ms, and is known to occur in the MSP [67]. Summing localization may shift the perceived location of the maskers away from the target, probably making the maskers more diffuse, leaving only the target with a clear localization. This effect may also account for some of the overall release from masking at small delays seen in the front-back ADD conditions (Figures 1.4 and 1.6). It is an effect that is absent in the front-front ADD experiment (Figure 1.8).

The latter possibility involving localization is not initially expected. As applied to Experiment 2, that possibility requires that summing localization produces a larger release from energetic masking than is produced by the law of the first wavefront. That law is a part of the localization precedence effect. It says that the location of the leading source dominates. The problem is that the law of the first wavefront for broadband noise, for instance at a delay of 4 ms, is very strong. However, other explanations for the observed release from energetic masking at τ = 0 are difficult to find. One possibility is that spectral and phase differences in HRTFs for sources in front versus those in back of the listener may result in some cancellation.

C. The role of delay-and-add filtering: Experiments 3 and 4 examined the role of delay-and-add filtering in the results of Experiment 1 by removing the spatial aspect of the ADD experiment. In Experiment 3, the back and front loudspeakers from Experiment 1 were placed together in front so that the target, distracters, and added distracters were collocated. Otherwise, Experiment 3 was identical to Experiment 1.
In this way, the effect of delay-and-add filtering was separated from spatial effects. The results (Figure 1.8) show a release only for delays of τ = ±2 ms, matching the delays at which peak release occurred in Experiment 1. Experiment 4 used digital signal processing techniques to show that the release for τ = ±2 ms likely occurs because the first notch in the delay-and-add filter occurs near the fundamental frequency of the distracting speech. Severely attenuating their fundamentals causes the distracters to be distinguishable from the target speech by their unusual timbre.

D. Symmetry: The results of all the experiments showed a notable symmetry about the zero-delay condition. This symmetry speaks well of the quality of the alignment of the front and back loudspeakers and the listener in the geometry of this experiment. In Experiment 1, with a speech target and speech distracters, the average data shown in Figure 1.4 were approximately symmetrical for all delays, 0.5 to 32 ms. Symmetry between positive and negative delays is not expected given a model in which localization by the precedence effect is the dominant effect. However, for every delay showing appreciable release, the release was slightly stronger when the source in back led the source in front (positive delay). This small asymmetry might be attributed to the localization precedence effect.

E. Implications

1. Previously, it was shown that a release from informational masking occurs in ADD experiments in the horizontal plane [23, 41, 38, 40, 76], and binaural cues were held primarily responsible for this masking release. It has been shown now that such a release occurs as well in the MSP when binaural cues are minimized to the extent that they are expected to be unimportant. When distracters are presented from in front and in back, the greatest release from masking occurs for a delay of ±2 ms.
Experiments with distracters only in front show that this peak owes much of its importance to delay-and-add filtering. Apart from that, the front-back experiment shows release from masking for a wide range of delays, at least out to 32 ms, and this release is likely caused by the ability of listeners to localize the distracters (or to delocalize them) using the localization cues that are available in the MSP, namely spectral cues. Localization in the MSP is weaker than in the horizontal plane, and thus it was not surprising to find that the release that can be achieved in an ADD condition is smaller in the MSP than in the horizontal plane. The presence of the ADD effect out to long delays, both positive and negative, reveals a behavior similar to that noticed in the horizontal plane [76] and suggests that a similar general mechanism is at work to achieve a release from informational masking in both cases.

2. In many previous studies, the precedence effect [67, 66] has been cited as the main mechanism by which release from informational masking is achieved in ADD experiments. When the precedence effect moves the perceived location of the distracters away from the target (i.e. for positive delays), localization of the distracters separately from the target allows for the perceptual segregation of the two. However, for large negative delays, the precedence effect will place the perceived location of the distracters very near the target. By the action of the precedence effect on localization alone, this should result in no benefit to the listener trying to hear out the target speech, and yet this study, as well as studies in the horizontal plane by Brungart et al. [23] and Rakerd et al. [76], show a release from informational masking for an equally large range of negative delays. Furthermore, in both the horizontal plane and the MSP, the magnitude of the release is similar for both positive and negative delays.
Although the localization precedence effect is undoubtedly at work in ADD experiments, it appears to be only one of several contributors to the ADD effect.

CHAPTER 2

The MLS Method and Applications to Binaural Measurements in Rooms

2.1 Introduction

In order to make measurements of binaural quantities such as interaural time difference (ITD), interaural level difference (ILD), and interaural coherence (γ) in rooms, an accurate, reliable, and repeatable method of measurement must be utilized. The use of maximum length sequences (MLS) has the potential not only to measure binaural quantities accurately, but also to measure certain properties of the room itself such as impulse response (IR) and 60-dB reverberation time (RT60). A program was written to use MLS to simultaneously calculate both binaural quantities and properties of the room from a single measurement.

The MLS is a binary signal (usually, in acoustics, a series whose values are 1 or −1 rather than 1 or 0) which has a length of N = 2^j − 1. This length corresponds to exactly one period of the MLS. The parameter j, which may be any positive integer, is known as the "order" of the MLS. Any MLS has a flat frequency spectrum (apart from the DC component), and the MLS has an autocorrelation function that is a delta function with a small DC shift [44, 80]. The usefulness of these properties of MLS will be described below.

Figure 2.1. A linear-feedback shift register with five bits and taps at bits 1, 2, and 4. The bits a_i can take only binary values. An XOR gate performs modulo 2 addition of the output bit, a_0, and a_1, the result of which becomes one input for the next XOR gate, which "taps" a_2 for its other input. This is then fed as an input to the XOR gate which taps a_4. The last XOR gate in the sequence feeds its output back to the first bit, a_4. The resulting output sequence, s[n], will be a MLS of order 5.
A MLS can be generated by a linear feedback shift register (LFSR), which is an arrangement of connected flip-flops where the input bit is a linear function of the register's previous state. Consider a LFSR with five bits, as depicted in Figure 2.1. Each bit a_i starts with a random value, 1 or 0, where the only invalid set of initial values is that for which all bits start with 0. At each step, the value of bit a_i is moved to bit a_(i−1). The sequence s[n] is generated step by step by taking the value of the last bit in the register (as depicted, this is a_0). At each step, the output of the last bit is summed modulo 2 with the value held in at least one other bit. This other bit is said to have a "tap," which takes its value and feeds it to one input of a modulo 2 summer (XOR gate). This result flows to other taps as depicted in Figure 2.1 to generate a new input for the first bit. This process is repeated 2^j − 1 times, where the order j equals the number of bits in the LFSR. The resulting series, s[n], is the desired MLS.

For lower orders of MLS, there may be as few as a single tap required in the LFSR. In fact, the order 5 MLS is the lowest order of MLS that can be generated with a tap on more than one bit (other than the last). Also, for orders of MLS of 5 or above, the set of bits that can be tapped is not unique: there is more than one possible set of locations for the taps that will still yield a valid MLS at the output. Higher orders of MLS can be generated with a number of possible sets of locations for taps [28]. For instance, an order-10 MLS can be made with a single tap on bit a_3, or with three taps that can be placed in one of ten different ways, or with five taps placed in one of 14 different ways, or with seven taps placed in one of five different ways. For a 17-bit LFSR, there are 1348 ways to place a set of ten taps such that the output is a MLS [68]. The locations of the taps are critical in the generation of MLS by this method.
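The shift-register procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the program used in this work: the function names and the list-based register are assumptions made for clarity, and the tap set (1, 2, 4) is the one shown in Figure 2.1.

```python
def mls_from_lfsr(taps, order, seed=None):
    """Generate one period (2**order - 1 bits) from a linear feedback shift register.

    taps  : indices of the bits XORed with the output bit a_0 to form the new
            value of the highest bit (e.g. (1, 2, 4), as in Figure 2.1).
    order : number of bits j in the register.
    seed  : initial values of a_0 .. a_(j-1); must not be all zeros.
            Defaults to all ones, which gives a repeatable ("frozen") sequence.
    """
    state = list(seed) if seed is not None else [1] * order
    if len(state) != order or not any(state):
        raise ValueError("seed must have `order` bits and must not be all zeros")
    out = []
    for _ in range(2 ** order - 1):
        out.append(state[0])            # the output is the last bit, a_0
        feedback = state[0]
        for k in taps:
            feedback ^= state[k]        # modulo-2 sum (XOR) with each tapped bit
        state = state[1:] + [feedback]  # a_i moves to a_(i-1); feedback enters the top bit
    return out

def circular_autocorrelation(x):
    """Circular autocorrelation of x at every lag from 0 to len(x) - 1."""
    n = len(x)
    return [sum(x[i] * x[(i + lag) % n] for i in range(n)) for lag in range(n)]

bits = mls_from_lfsr(taps=(1, 2, 4), order=5)
signal = [1 if b else -1 for b in bits]   # map {1, 0} to {+1, -1} for acoustic use
r = circular_autocorrelation(signal)
print(len(bits), r[0], set(r[1:]))
```

For this tap set the output has length 31, and the circular autocorrelation of the ±1 signal equals 31 at zero lag and −1 at every other lag, which is the "delta function with a small DC shift" property mentioned in Section 2.1.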
A tap that is out of place will result in an output sequence s[n] that is not a MLS.

The locations of the taps are related to the existence of irreducible primitive polynomials over the Galois field GF(2) whose coefficients are only 1 or 0, corresponding to the existence or absence of a tap, respectively [44]. A Galois field is a field that contains only finitely many elements. The polynomials are of a degree equal to the number of bits in the register. For the register shown in Figure 2.1, the related primitive polynomial is x^5 + x^4 + x^2 + x + 1, which corresponds to a tap after bits a_0, a_1, a_2, and a_4 since the coefficients of x^0, x^1, x^2, and x^4 are 1. Fortunately, these primitive polynomials are well known to very high order [68], and so the appropriate tap locations can be readily found.

MLS has often been compared to other methods for making acoustical measurements such as swept-sine techniques [36], impulse burst techniques [90], and time-delay spectroscopy [90]. Each technique has advantages and disadvantages, but MLS remains a standard for comparison. MLS has been noted for good distortion immunity in the resulting impulse responses [33]. The fact that different MLS of the same or different orders are relatively uncorrelated has been used for applications relating to the simultaneous measurement of several acoustical channels [96, 95].

2.2 Capabilities and Limitations

Generation and use of MLS is facilitated by modern computer technology. A computer program was designed to generate and play a MLS through Tucker-Davis Technologies (TDT) System 3 RP2.1 hardware to an external loudspeaker. The hardware simultaneously reads two input channels, which are typically connected to microphones, such as those in the ears of a Knowles Electronics Manikin for Acoustic Research (KEMAR) [24]. The signals from each channel are stored in a serial buffer on the TDT and transferred to the program, where they may then be analyzed and compared off-line in various ways.
MLS of up to order 20 (a length of 1,048,575 samples) may be generated and played at sampling rates dictated by the TDT hardware (approximately 6, 12, 25, 50, 100, or 200 kHz). At the present time, the order of MLS used in measurement is limited only by the extent of the on-board memory of the TDT hardware. The MLS may be "frozen" by starting with all ones in each bit of the shift register. Doing so will result in the same MLS each time, thus making it possible to repeat an experiment using the same MLS. It is also possible to pre-generate MLS signals and load them in at a later time, saving the significant amount of time required to generate higher order MLS. "Silent signals" of MLS-equivalent lengths can also be generated, which are useful in measuring background and internal noise levels.

Once a pair of signals was acquired, various forms of analysis were performed. For some calculations of binaural parameters, it was first useful to filter the signals into 1/3-octave bands. This was accomplished by convolving the signals with a bank of gammatone filters [47]. The gammatone filters have center frequencies that can be defined by the user, but default to ISO standard center frequencies for 1/3-octave bands [47] from 80 to 20000 Hz (thus covering eight octaves). For the gammatone filter, an order of η = 4 and Cambridge 1/3-octave bandwidths [71] were used. These parameters were selected because they are believed to give a good approximation to the shape of auditory critical band filters [43].

2.3 Fundamental Equations and Concepts

Certain equations are fundamental to the study of binaural hearing. They mathematically describe the binaural parameters that are understood to be most important in matters of sound localization. Others describe important acoustical characteristics of rooms, which are important because the characteristics of a listening environment impact binaural parameters.
This section will provide a common mathematical basis for measurements and calculations presented in the rest of the body of this dissertation.

2.3.1 Interaural Level Difference Measurements

The level difference between two signals of length N is computed by comparing the average power in each signal. Taking, for example, the signals to be from the left and right ears of a KEMAR, and labeling them x_L and x_R respectively, these powers, P_L and P_R, may be written as:

\[ P_L = \frac{1}{N}\sum_{t=1}^{N} x_L^2[t], \qquad P_R = \frac{1}{N}\sum_{t=1}^{N} x_R^2[t] \tag{2.1} \]

With the power of the signals in each ear, the interaural level difference (ILD) can be computed as:

\[ \mathrm{ILD} = 10\log_{10}\left(P_R / P_L\right) \tag{2.2} \]

2.3.2 Coherence and Interaural Time Difference Measurements

Cross-correlation functions can be computed, with or without a user-defined limit on the maximum lag (τ_m), on the entire signals (broadband) and on signals filtered into 1/3-octave bands. Here, the lag is defined as a circular shift in time of one signal relative to the other signal against which the cross-correlation function is being computed. The coherence, γ, is estimated by locating the maximum value of the cross-correlation function between the two signals. The interaural time difference (ITD) is estimated by finding the lag at which the maximum of the cross-correlation function occurs.

For high frequency bands, the cross-correlation function is expected to vary rapidly, reflecting the periodicity of a sinusoid at that band's center frequency. It has thus been hypothesized that the auditory periphery calculates ITDs in higher frequency bands by looking at the cross-correlation of the envelopes of the signals in the left and right ears. The envelope of each signal can be calculated by taking the magnitude of the analytic signal. The envelope coherence, γ_e, may be found from the envelopes of the signals in the same way that the coherence γ is found from the signals themselves using the waveform cross-correlation c_{x_L x_R}.
Again labeling the signals x_L and x_R, the equations for these calculations are:

\[ c_{x_L x_R}[\tau] = \frac{\sum_{t=1}^{N} x_L[t+\tau]\, x_R[t]}{N\sqrt{P_L P_R}} \tag{2.3} \]

\[ \gamma = \max_{-\tau_m \le \tau \le \tau_m} \left\{ c_{x_L x_R}[\tau] \right\} \tag{2.4} \]

\[ \gamma_e = \max_{-\tau_m \le \tau \le \tau_m} \left\{ c_{|x_L + i\mathcal{H}(x_L)|\,|x_R + i\mathcal{H}(x_R)|}[\tau] \right\} \tag{2.5} \]

Here, \mathcal{H}(x_L) and \mathcal{H}(x_R) are the Hilbert transforms of the left and right signals, respectively.

\[ \mathrm{ITD} = \tau_{\max} \quad \text{where} \quad c_{x_L x_R}[\tau_{\max}] = \gamma \tag{2.6} \]

The limit for the range of ITDs in the "physiological range" is from −1 ms to +1 ms [9]. Given the radius of the head, this is thought to be on the order of the maximum time difference for sounds arriving at the ears in free field. When making measurements on a KEMAR head, it is appropriate to set τ_m = 1 ms, though it should be noted that humans are quite capable of correctly discriminating larger ITDs [79, 18]. The number of samples within this time range in the cross-correlation function c_{x_L x_R} is then limited by the sampling frequency. For instance, a sampling frequency of 50 kHz would yield 100 samples over a time of 2 ms, while a sampling rate of 100 kHz would yield twice as many samples over the same time span. The sampling frequency thus limits the resolution of the cross-correlation function.

For any sampling rate F_s, the time between subsequent samples of data is t_s = 1/F_s. Consider then measuring the waveform coherence, γ. The greatest error would occur when two samples are taken equidistant from either side of the true peak in the cross-correlation function that yields the coherence. The maximum error in ITD that could occur is then t_s/2 = 1/(2F_s). Thus, it is often preferable to use high sampling rates in order to achieve high accuracy in measurements of the coherence and ITD.

Examples of the cross-correlations in certain 1/3-octave bands are shown in Figure 2.2, labeled with the center frequencies of the bands they represent.
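Equations 2.1 through 2.6 can be sketched in Python as follows. This is an illustrative sketch rather than the analysis program used for the measurements: the function names are hypothetical, the normalization by N√(P_L P_R) follows Equation 2.3 as written here, and the test signals are synthetic noise with a known circular delay and level difference.

```python
import math
import random

def mean_power(x):
    """Average power of a signal (Eq. 2.1)."""
    return sum(v * v for v in x) / len(x)

def ild_db(x_left, x_right):
    """Interaural level difference (Eq. 2.2): 10*log10(P_R / P_L)."""
    return 10.0 * math.log10(mean_power(x_right) / mean_power(x_left))

def cross_correlation(x_left, x_right, lag):
    """Normalized circular cross-correlation at one integer lag (Eq. 2.3)."""
    n = len(x_left)
    total = sum(x_left[(t + lag) % n] * x_right[t] for t in range(n))
    return total / (n * math.sqrt(mean_power(x_left) * mean_power(x_right)))

def coherence_and_itd(x_left, x_right, fs_hz, max_lag_s=1e-3):
    """Return (gamma, ITD in seconds), searching lags within +/- max_lag_s (Eqs. 2.4, 2.6)."""
    max_lag = int(round(max_lag_s * fs_hz))
    best_lag = max(range(-max_lag, max_lag + 1),
                   key=lambda lag: cross_correlation(x_left, x_right, lag))
    return cross_correlation(x_left, x_right, best_lag), best_lag / fs_hz

# Synthetic example (assumed values): the left signal is the right signal
# at half amplitude, circularly delayed by 5 samples.
fs = 50000.0
rng = random.Random(1)
x_r = [rng.gauss(0.0, 1.0) for _ in range(500)]
delay = 5
x_l = [0.5 * x_r[(t - delay) % len(x_r)] for t in range(len(x_r))]

ild = ild_db(x_l, x_r)
gamma, itd = coherence_and_itd(x_l, x_r, fs)
print(round(ild, 2), round(gamma, 6), itd)
```

With the lag convention of Equation 2.3, a left-ear signal that lags the right-ear signal gives a positive ITD. At a 50 kHz sampling rate the lag grid has a spacing of 20 µs, so the worst-case ITD error of t_s/2 is 10 µs; doubling the sampling rate halves it.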
The cross-correlations in Figure 2.2, for lags between −1 and +1 ms, were calculated from the signals arriving in the ears of a KEMAR that was facing 45° off-axis from the direction of the sound source. For these measurements, the KEMAR was placed in an anechoic environment and the stimulus was an order 18 MLS played through a Mackie studio monitor situated 1.83 m from the KEMAR at ear level. Details about this equipment will be described later. Waveform coherences γ in each band may be found by finding the height of the peak in the relevant cross-correlation function. The value of the cross-correlation function at its peak is the waveform coherence, and the time at which that peak occurs is the ITD. The ITD for each 1/3-octave band shown is indicated by the time given in the lower left corner of each panel.

2.3.3 Impulse Response and Transfer Function Measurements

Certain characteristics of a room such as the impulse response (IR) and transfer function (TF) may be measured, and these are important to understanding the acoustical characteristics of the room. To show the mathematical formalism of the method employed for taking such measurements, it is necessary to make use of the autocorrelation function of a signal y, R_yy, where:

\[ R_{yy}[\tau] = \sum_{n=1}^{N} y[n]\, y[n+\tau] \]

The impulse response of the room may be found thanks to the autocorrelation property of the MLS of length N, x:

\[ R_{xx}[\tau] = \begin{cases} N, & \tau = 0 \\ -1, & \tau \neq 0 \end{cases} \]

Thus, R_xx[τ] ≈ N δ[τ]. With this, the impulse responses of the room at each ear, h_L and h_R, may be deconvolved from the measured signals by convolving each measured signal with the time-reversed MLS, which is the same operation as circularly cross-correlating (⋆) it with the MLS. If y is the signal at the receiver, x is the MLS, and h is the IR of the system (see Figure 2.3), then:

\[ y = h * x \]

\[ y \star x = h * (x \star x) = h * R_{xx} \approx h * N\delta = N h \]
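The deconvolution step can be checked numerically with a short, self-contained Python sketch. Everything here is illustrative: the order-5 MLS uses the tap set of Figure 2.1, the four-sample impulse response is a made-up example, and the "room" is simulated by circular convolution, which assumes the MLS is played periodically.

```python
def mls(order=5, taps=(1, 2, 4)):
    """One period of a +/-1 MLS from a shift register (tap set of Figure 2.1)."""
    state, out = [1] * order, []
    for _ in range(2 ** order - 1):
        out.append(1 if state[0] else -1)
        fb = state[0]
        for k in taps:
            fb ^= state[k]
        state = state[1:] + [fb]
    return out

def circular_convolve(h, x):
    """y[t] = sum_k h[k] * x[t - k], with indices taken modulo len(x)."""
    n = len(x)
    return [sum(h[k] * x[(t - k) % n] for k in range(len(h))) for t in range(n)]

def circular_cross_correlate(y, x):
    """(y correlated with x)[lag] = sum_t y[t + lag] * x[t], indices modulo len(x)."""
    n = len(x)
    return [sum(y[(t + lag) % n] * x[t] for t in range(n)) for lag in range(n)]

x = mls()
n = len(x)
h_true = [1.0, 0.5, -0.25, 0.125]          # hypothetical short impulse response
y = circular_convolve(h_true, x)           # simulated measurement, y = h * x
h_est = [v / n for v in circular_cross_correlate(y, x)]   # divide by N

for lag in range(6):
    true_value = h_true[lag] if lag < len(h_true) else 0.0
    print(lag, true_value, round(h_est[lag], 4))
```

Because the off-peak autocorrelation of the MLS is −1 rather than 0, the recovered response differs from h by a small term of order 1/N: each sample equals ((N + 1) h[lag] − Σh) / N. For this order-5 example the error is at most about 0.05; it shrinks rapidly as the order of the MLS increases.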
[Figure 2.2: panels titled "Cross-Correlations for 45°", one per 1/3-octave band, each plotted against lag (ms) from −1 to +1 ms.]
Figure 2.2. Cross-correlations in some 1/3-octave bands between signals measured in the left and right ears of a KEMAR in an anechoic environment. The KEMAR was facing a direction 45° from the direction of the incident sound. A vertical dashed line indicates the location of the peak cross-correlation (the waveform coherence) in each panel, which occurs at the ITD. The value of the ITD in each 1/3-octave band shown is indicated by the number in the lower-right corner of each panel.

Figure 2.3. A simple depiction of the MLS measurement method. A MLS, x, is played through a loudspeaker in a room with impulse response h. The signal recorded by the receiver is y. The recorded signal y is related to the MLS x via the impulse response by y = h * x.

It can be seen then that the result of combining the measured signal with the MLS in this way is a scaled version of the IR, h. With the IR at hand, the transfer function of the system is only a Fourier transform away.

Additionally, multiple trials may be performed, after which the averages and standard deviations across trials for the reverberation times, ILD, ITD, and γ may be calculated. Average impulse responses, h_L and h_R, are formed by averaging the impulse responses across trials in the left and right ears respectively. Single impulse responses, h_L,avg and h_R,avg, may instead be derived by averaging the signals in each ear across trials and then calculating an IR from the average signal. Such methods are particularly useful when making measurements in noisy environments.

2.3.4 Reverberation Time Measurements

Reverberation time is the amount of time required for sound in a room to decay in level by a certain amount.
Often cited in practice is the time required for a sound to decay by 60 dB. This is RT60, the 60-dB reverberation time. This time is usually defined for broadband signals, but may also be computed at specific frequencies or in frequency bands.

Given the impulse response of the system, the reverberation times for the room may then be computed. This is done by using a method of reverse integration developed by Schroeder [81]. Schroeder makes use of the autocorrelation property of long white Gaussian noises (noises with instantaneous amplitudes that are normally distributed and long-term amplitude spectra that are flat across all frequencies) to show that if such a signal is used to excite a reverberant system, then the ensemble average squared signal at the receiving point, ⟨s²(t)⟩, is related to the square of the impulse response, h², by:

\[ \left\langle s^2(t) \right\rangle = N \int_t^{\infty} h^2(t')\, dt' \tag{2.7} \]

Here, N is the noise power per unit bandwidth of the signal. In practice, the MLS used to measure the system is finite in length, and the length of the IR measurement will be equal to that of the MLS. After this time, the IR is assumed to be zero, and thus the upper bound on the integral in Equation 2.7 is set equal to the duration of the MLS.

The level of the left hand side of Equation 2.7 as a function of time is computed relative to the maximum of the function (which must occur at time t = 0). The resulting function is the "integrated impulse decay curve" (IIDC) of the system, and tends to have a bi-linear (elbow) shape as depicted in Figure 2.4. The reverberation time can be computed from this function by fitting a straight line to the first linear part of the decay. Schroeder's method of calculating RT60 may be performed on the unfiltered IRs to find the broadband value of RT60. Or, to find RT60 in 1/3-octave bands, the IR can first be filtered with the gammatone filter bank and then the reverse integration is performed.
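The reverse integration and line fit can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the program used for the measurements: the function names are hypothetical, the fit range of −5 to −25 dB is an assumed stand-in for "the first linear part of the decay", and the impulse response is a synthetic single exponential constructed so that its RT60 is known.

```python
import math

def schroeder_decay_db(h):
    """Integrated impulse decay curve: reverse-integrated h**2, in dB re: its value at t = 0."""
    energy = 0.0
    tail = [0.0] * len(h)
    for i in range(len(h) - 1, -1, -1):   # accumulate energy from the end backward
        energy += h[i] * h[i]
        tail[i] = energy
    return [10.0 * math.log10(e / tail[0]) for e in tail if e > 0.0]

def rt60_from_decay(decay_db, fs_hz, fit_from_db=-5.0, fit_to_db=-25.0):
    """Least-squares line through part of the decay, extrapolated to -60 dB."""
    pts = [(i / fs_hz, level) for i, level in enumerate(decay_db)
           if fit_to_db <= level <= fit_from_db]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_l = sum(level for _, level in pts) / n
    slope = (sum((t - mean_t) * (level - mean_l) for t, level in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))    # dB per second
    intercept = mean_l - slope * mean_t
    return (-60.0 - intercept) / slope                     # time at which the fit reaches -60 dB

# Synthetic impulse response (assumed): a pure exponential decay with RT60 = 0.323 s.
fs = 8000.0
rt60_true = 0.323
tau_decay = rt60_true / (3.0 * math.log(10.0))   # energy exp(-2t/tau) falls 60 dB in rt60_true
h = [math.exp(-i / (fs * tau_decay)) for i in range(int(fs))]

rt60_est = rt60_from_decay(schroeder_decay_db(h), fs)
print(round(rt60_est, 3))
```

A real room response is noisy and often shows the elbow described above, so the choice of fit range matters much more than it does for this idealized decay.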
The first 400 ms of the broadband IIDC of a small, roughly square room, as measured using an order 19 MLS, is shown in Figure 2.4.

[Figure 2.4: IIDC (solid) and linear fit (dashed), level (dB) versus t (ms) from 0 to 400 ms; annotation: RT60 = 323 ms.]
Figure 2.4. The first 400 ms of the integrated impulse decay curve (IIDC) for a small, roughly square room (of dimensions 5.7 m by 4.5 m by 3.5 m high) is shown as a solid line. A dashed line shows a linear trend line fit to the first decay mode of the IIDC. The fit line reaches a level of −60 dB at a time of 323 ms, thus indicating the 60-dB reverberation time, RT60, of the room.

Schroeder several years later described the usefulness of MLS in measuring IR and RT60 in rooms [83]. In that paper, he suggested that MLS are superior to finite-length white Gaussian noises because their autocorrelation property allows for a better measurement of the IR. The deconvolution of the MLS from the room IR could easily be accomplished with the use of a fast Fourier transform (FFT). Additionally, he noted that if a single MLS would suffice in repeated measurements and Fourier techniques were utilized in deconvolving the MLS from the IR, then storage requirements would only have to be met for the MLS itself and the phase angles of its Fourier coefficients (since the coefficients all have equal magnitude).

Schroeder also mentioned the signal-to-noise ratio benefit of using MLS. He noted that the benefit is great enough that, if repeated measurements of the IR of a system were taken, the IR (and therefore RT60) could be accurately measured even in the presence of louder noise, as long as that noise is incoherent with the MLS. For example, this technique could be used even in a concert hall during a musical performance. During a musical recital of modest duration, hundreds of measurements of the IR of the recital hall could be made and averaged at the end.
Assum- ing the music is incoherent with the MLS, the necessary level of the MLS could be so low as to be inaudible. Schroeder reported doing just this in a lecture hall while a lecture was taking place [83]. 2.4 Verification of the value of the MLS method As a simple test of the usefulness of the method, a MLS of relatively high order (19) was played through the electronic filter at a sampling rate of 97, 656.25 Hz and the impulse response (IR) and transfer function (TF) were derived as described previously. The IR derived from this measurement is shown in Figure 2.5, and 56 0.8 1 1 l 1 1 0.6 - _ 3 c 0.4 - — O o. (D 0) Er 0.2 1 b a, L” a 0 ~ “0.2 " P ’0-4 1 l 1 1 1 0 l 2 3 4 5 5 Time (ms) Figure 2.5. The impulse response of the electronic 1/3-octave band equalizer as measured by a MLS of order 19. the TF in Figure 2.6. The TF was then measured again using a more conventional method — employing an analog Hewlett Packard 3580A spectrum analyzer with tracking oscillator. To the eye, there was no discernible disagreement between the TF derived from the MLS measurement and that given by the analog spectrum analyzer. Thus, the MLS method seems to be accurate. As there are no standards for comparing the accuracy of different signals in measuring quantities such as TF, one was developed for use in this study. The TF given by an order 19 MLS, averaged over 10 measurements, each with a different MLS of order 19, was used in further experiments regarding the same electronic filter as a reference point against which all other measurements (using lower-order sequences) could be compared for the purpose of determining accuracy. As the order 19 MLS is of a length that is near the operational limitations of the hard- 57 I I l I II I I I I I I I I L I I I l ' .. "6 q _ fl 0'8 3 '5 E CL, 0.6 - - s 2 .E g 0.4 3 ~ 0.2 ‘ - 1 4 5 O I I I I I' I I I I I I I II I I I ' 1000 10000 50000 Frequency (Hz) Figure 2.6. 
The transfer function of the electronic 1/3-octave band equalizer as measured by a MLS of order 19. Numbers 1 through 6 count the number of valleys and shoulders over the audible range. 58 ware being used (due to physical memory limitations) and measurements using it agreed well with the analog spectrum analyzer measurement, it was used as the standard TF. Also, it is expected that any random noise should converge to the same result for the TF in the long-sequence limit, and so if measurements using, for example, random telegraph noise (RTN) do not approach the standard, then there will later be reason to question the validity of the MLS measurement. 2.5 Experiment 5: Comparison to Random Telegraph Noise A simple test of the effectiveness of the program as well as the efficiency of the MLS method was carried out. M.R. Schroeder initially advocated the use of sta— tionary white noise in the measurement of reverberation times due mostly to the fact that the autocorrelation of such signals is a delta function scaled by the length of the signal [81]. However, even the best stationary white noises only exhibit this characteristic in the long-signal limit. For signals of a more practical (finite) length, the autocorrelation of stationary white noise is not quite a delta function, and there is randomness in its autocorrelation at non-zero lags. This then becomes a source of error in measurements such as that of the impulse response that depend on the sifting property of the delta function when stationary white noise is used. How- ever, this error may be made vanishingly small with a sufficiently long noise. In principle, the MLS is more reliable and consistent than RTN, especially for short durations, in the sense that the non-zero-lag components of its autocorrela- tion (when normalized by the signal length) are all identically — 7%, where N is the total length of the MLS signal, for any order MLS. 
For MLS of even modest orders, the magnitude of the non-zero-lag components of the autocorrelation approaches zero, and thus the autocorrelation approaches a true delta function quite quickly. For example, a MLS of order $j = 7$ has a length of $N = 2^j - 1 = 127$ samples, leading to an autocorrelation function with a peak at lag zero of height 1 and a value at non-zero lags of $-1/127$. It is thus expected that MLS will lead to more reliable estimation of the impulse response of a given system, and of properties derived from the impulse response such as RT60.

At the time of its development, the MLS method was heralded as being computationally efficient [83, 20]. A MLS can be deconvolved from the impulse response (IR) by means of the fast Hadamard-Walsh transform (FHT) [28]. Since both the MLS and the kernel of the FHT are made of elements of only 1 or −1, this computation requires only addition operations. Modern computers, however, suffer no such lack of computing power or storage space as Schroeder dealt with in his day; signals of almost arbitrary length may be readily generated, played, and simultaneously recorded. Any random noise, in the long-signal limit, shares the autocorrelation property of the MLS, and so it becomes reasonable to question the value of using MLS.

In this experiment, both MLS and random telegraph noise (RTN) were used to perform similar tasks and their results and effectiveness were compared. RTN, like MLS, is a pseudorandom sequence consisting only of the numbers 1 and −1. Measures were taken to ensure that the MLS and RTN had similar statistical properties; these are discussed further below.
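The order-7 example above can be checked numerically. The following is a minimal sketch, not the generator used in this work: the function names are hypothetical, and the feedback taps (7, 6) are an assumed choice that corresponds to a primitive polynomial of degree 7, so that the shift register cycles through all 127 non-zero states.

```python
def mls(order, taps, seed=None):
    """Generate one period of an MLS (values +1/-1) from a Fibonacci LFSR.
    `taps` are 1-based register positions XORed to form the feedback bit."""
    state = list(seed) if seed is not None else [1] * order
    bits = []
    for _ in range(2 ** order - 1):
        bits.append(state[-1])
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return [1 - 2 * b for b in bits]  # map bit 0 -> +1, bit 1 -> -1


def circular_autocorrelation(x):
    """Circular autocorrelation, normalized by the sequence length."""
    n = len(x)
    return [sum(x[i] * x[(i + lag) % n] for i in range(n)) / n
            for lag in range(n)]


# Order 7 with feedback taps (7, 6); the tap choice is an assumption of
# this sketch, not a detail taken from the text.
seq = mls(7, taps=(7, 6))
acf = circular_autocorrelation(seq)
print(len(seq), acf[0], acf[1], -1 / 127)
```

The direct double loop is used only for transparency; for the long sequences used in practice an FFT-based or Hadamard-based correlation would be the natural choice.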
2.5.1 Methods

In this experiment, MLS and RTN were both used to measure the impulse response and transfer function of a McLelland model GE-31 one-third-octave band graphic equalizer, whose sliders were set such that there were six valleys and shoulders in the transfer function over the audible range. For each trial, a MLS was generated with random initial inputs in the bit-shift register. A RTN sequence was then generated by "scrambling" the MLS. This was done by taking the MLS and exchanging the value at each position with the value at some other random point in the MLS. This randomization process went through two full iterations. The RTN thus had the same length, same number of 1s and −1s, and same RMS power as the original MLS, but did not share its autocorrelation property. However, for very long MLS, the autocorrelation of the corresponding RTN, like that of the MLS itself, will approximate a delta function.

MLS and RTN signals were played by the program three times, with no pause between successive plays. The program began recording concurrently with the onset of the second repetition of the signal. Doing this allowed the system to "stabilize" before data were taken. It also allowed for any delay between input and output of the system. In an electronic filter, this may come in the form of the inherent delay of the filter; in a real room, it appears as the time it takes the sound to travel from the source to the receiver. The program may then actually record most, but not the beginning of, the second repetition of the signal, and a small part of the beginning of the third repetition. Fortunately, this corresponds to a circular shift in the signal itself, under which the properties of the MLS and RTN are unchanged. Analysis of recorded signals using the methods described above depends on this continuity under circular shifting.
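The scrambling step and the circular-shift property described above can be illustrated as follows. This is a sketch under stated assumptions: the helper names, the order-7 sequence with taps (7, 6), the fixed random seed, and the shift of 40 samples are all hypothetical choices, and the random swap partner is allowed to coincide with the position itself.

```python
import random


def mls(order, taps):
    """One period of a +1/-1 MLS from a Fibonacci LFSR (taps are 1-based)."""
    state = [1] * order
    out = []
    for _ in range(2 ** order - 1):
        out.append(1 - 2 * state[-1])
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return out


def scramble(seq, rng, passes=2):
    """Swap each element with one at a randomly chosen position,
    repeating the full sweep `passes` times."""
    out = list(seq)
    for _ in range(passes):
        for i in range(len(out)):
            j = rng.randrange(len(out))
            out[i], out[j] = out[j], out[i]
    return out


def circular_autocorrelation(x):
    """Unnormalized circular autocorrelation (integer-valued for +1/-1 input)."""
    n = len(x)
    return [sum(x[i] * x[(i + lag) % n] for i in range(n)) for lag in range(n)]


rng = random.Random(2008)        # fixed seed so the sketch is repeatable
m = mls(7, taps=(7, 6))          # tap choice is an assumption of this sketch
rtn = scramble(m, rng)

acf_mls = circular_autocorrelation(m)
acf_rtn = circular_autocorrelation(rtn)
worst_mls = max(abs(v) for v in acf_mls[1:])   # largest off-peak magnitude
worst_rtn = max(abs(v) for v in acf_rtn[1:])

# Starting the recording late amounts to rotating the sequence.
shift = 40
rotated = rtn[shift:] + rtn[:shift]
acf_rotated = circular_autocorrelation(rotated)
print(worst_mls, worst_rtn, acf_rotated == acf_rtn)
```

The scrambled sequence keeps the same values as the MLS but loses the flat off-peak autocorrelation, while rotating either sequence leaves its circular autocorrelation unchanged.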
2.5.2 Measurements using MLS and RTN

In this part of the experiment, a MLS of a particular order and a RTN of the same order were played through the filter, and the response was recorded at the filter's output. MLS and RTN of orders 9 through 18 were tested. This was repeated eight more times, for a total of nine measurements with different MLS and nine measurements with different RTN at each order. The impulse response was calculated each time and transformed to find the TF. All TFs, including the standard, were computed from the impulse response using a 4096-point FFT. Each of these was compared to the standard TF, and an estimate of the RMS percent error based on the point-by-point difference between the two was calculated. Averages and standard deviations of the percent errors were computed across sets of nine trials for both MLS and RTN.

2.5.3 Error calculation

This section describes the method of calculation of the RMS percent error between a measured TF and the standard TF. First, let ${}_kM_i^{(j)}$ be the $i$th frequency bin of the TF measured by a MLS of order $j$ on the $k$th trial, out of $n$ total trials. All such transfer functions are individually normalized to a maximum of 1. Then, the $i$th element of an average TF across all $n$ trials (here $n = 9$) with a MLS of order $j$ is given by:

$$M_i^{(j)} = \frac{1}{n} \sum_{k=1}^{n} {}_kM_i^{(j)}$$

$M_i^{(j)}$ is also normalized to a maximum value of 1. Note that the standard for comparison described above is denoted $M^{(19)}$. The RMS percent error across all $n$ MLS of order $j$, $R^{(j)}$, is the RMS average (over trials) of the summed squared error, divided by the area under the standard TF, $S_{19}$:

$$R^{(j)} = 100 \times \sqrt{\frac{1}{n} \sum_{k=1}^{n} \sum_i \left( {}_kM_i^{(j)} - M_i^{(19)} \right)^2} \Big/ S_{19}$$

Similarly, the RMS percent error between the standard TF and the average measured TF from all $n$ MLS of order $j$ is:

$$Q^{(j)} = 100 \times \sqrt{\sum_i \left( M_i^{(j)} - M_i^{(19)} \right)^2} \Big/ S_{19}$$

Here, as mentioned previously, $S_{19}$ is the root-sum-squared area under the standard TF derived from measurements with MLS of order 19:

$$S_{19} = \sqrt{\sum_i \left( M_i^{(19)} \right)^2}$$

$R^{(j)}$ can be thought of as the average RMS percent error expected from a single measurement using a MLS of order $j$, and $Q^{(j)}$ is then the RMS percent error expected from the average of $n$ independent MLS measurements. It can be shown that the quantities $R^{(j)}$ and $Q^{(j)}$ are related to the variance across $n$ trials, $V^{(j)}$, by:

$$V^2 = R^2 - Q^2$$

These equations hold for measurements made with noise as well. For error measurements made with RTN and a posteriori Wiener filtering, a subscript $w$ is added, and for RTN measurements made with exponential filtering in time, a subscript $e$ is added. Note that the standard of comparison, however, remains $M^{(19)}$.

2.5.4 Results

The relevant percent RMS errors, given plus and minus one standard deviation where appropriate, are listed in Tables 2.1 and 2.2. First, note that, at a sampling rate of 97,656.25 Hz, the shortest MLS has a playing time of approximately 5.24 ms. "Aliasing" of the impulse response (IR) occurs when the noise used to measure the system is shorter than the impulse response itself; the calculated IR then seems to repeat itself in time. The signals used in this study are already significantly longer than the bulk of the impulse response of the filter, as shown in Figure 2.5. Thus, little aliasing of the IR is expected.

Across nine MLS measurements, the average RMS percent error, $R^{(j)}$, was at most 0.76%. Also, $R^{(j)}$ was within one standard deviation of the RMS percent error inherent in the average TF, $Q^{(j)}$, at all orders. This suggests that there is little advantage to be gained in taking multiple measurements with MLS at any order. In general, there is a trend towards lower RMS percent errors with increasing order.

Order (j)   MLS R^(j)          RTN R^(j)   RTN_w R_w^(j)   RTN_e R_e^(j)
9           0.76 ± 0.03        71 ± 5      67 ± 5          35 ± 8
10          0.39 ± 0.04        76 ± 6      75 ± 5          25 ± 6
11          0.37 ± 0.05        76 ± 3      69 ± 5          20 ± 7
12          0.38 ± 0.016       78 ± 4      60 ± 5          14 ± 3
13          0.076 ± 0.0019     80 ± 3      60 ± 3          10 ± 1.6
14          0.0533 ± 0.0008    70 ± 4      39 ± 3          9 ± 1.8
15          0.045 ± 0.002      60 ± 4      31 ± 6          8 ± 2
16          0.0643 ± 0.002     47 ± 7      18 ± 3          6 ± 1.5
17          0.0276 ± 0.0008    37 ± 3      13 ± 4          6.0 ± 1.0
18          0.0301 ± 0.0007    25 ± 4      12 ± 2          5.4 ± 0.6

Table 2.1. Average percent RMS errors, plus and minus one standard deviation, for single measurements of the TF at each order $j$ (corresponding to the length of the signal). The header label of each column indicates the type of signal used (MLS or RTN). Quantities under $R_w^{(j)}$ correspond to RTN measurements made including a posteriori Wiener filtering of the IR, and quantities under $R_e^{(j)}$ correspond to RTN measurements made with exponential windowing in time of the IR.

At all orders, measurements with RTN exhibited significantly greater percent RMS errors (often by a factor of more than 100) than did measurements with MLS. Averaging the response across trials and then calculating a TF gave a significant improvement in the performance of the RTN measurements. The average improvement (measured as the difference of errors for RTN, $R^{(j)} - Q^{(j)}$, found in the RTN columns of Tables 2.1 and 2.2) was 35.2%. This shows that repeated measurements are certainly necessary when using RTN at any order at least up to 18 (which corresponds to a signal length of 2.68 s).

Even in the best-case scenario, an averaged TF measurement using order 18 RTN, there is significantly more error ($Q^{(18)}$) than that given by even a single MLS of order 9 ($R^{(9)}$).

Order (j)   MLS Q^(j)   RTN Q^(j)   RTN_w Q_w^(j)   RTN_e Q_e^(j)
9           0.755       42.0        26.7            11.6
10          0.360       52.6        31.3            12.7
11          0.374       33.6        39.1            9.69
12          0.384       42.0        26.2            5.77
13          0.0738      40.0        20.2            6.82
14          0.0527      33.0        11.5            6.72
15          0.0447      21.3        8.52            7.76
16          0.0642      17.2        6.41            5.49
17          0.0274      13.6        3.07            5.69
18          0.0300      8.28        2.53            5.12

Table 2.2. Percent errors calculated from the average TF calculated across nine trials using various methods and signals. The header label of each column indicates the type of signal used (MLS or RTN). Quantities under $Q_w^{(j)}$ correspond to RTN measurements made including a posteriori Wiener filtering of the IR, and quantities under $Q_e^{(j)}$ correspond to RTN measurements made with exponential windowing in time of the IR.

To see this, consider Figures 2.7 and 2.8. It is clear that the measurement from a single order 9 MLS (plotted as a solid line in Figure 2.7) shows almost perfect agreement with the accepted reference TF (plotted as a dashed line). The only serious discrepancies occur at low frequencies, perhaps because such a short signal (a MLS of order 9 sampled at the current sampling frequency is just over 5 ms long) does not capture enough oscillations of the low-frequency components of the spectrum to make an accurate measurement of them. Figure 2.8 shows a similar picture for the average measurement of the TF taken from nine RTN measurements of orders from 13 to 18. Clearly, the RTN measurements improve significantly at higher orders. However, although the RTN has provided perhaps an acceptable representation of the TF at higher orders, its jaggedness makes it far less accurate than a MLS of even modest order.

The results shown in this experiment confirm predictions made on the basis of statistical properties of both MLS and RTN. The autocorrelation of the MLS looks more like an ideal delta function than that of a RTN at almost any order at which the signals are of a practical length. There are, however, noise reduction techniques which may prove useful in processing the IR given by RTN measurements, and these may lead to much improved results. It may also help to assume certain properties of the IR, such as the expectation that the IR should decay rather quickly.
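The error measures of Section 2.5.3 can be written out as a short sketch. The data below are hypothetical (a smooth stand-in for the standard TF plus nine noisy copies), the function name is invented for the example, and the mean TF is deliberately not renormalized to a maximum of 1 so that the relation between R, Q, and V holds exactly. V is taken here, as an assumption, to be the RMS deviation of the trials about their mean, with the same normalization as R and Q.

```python
import math
import random


def error_metrics(trials, standard):
    """Return (R, Q, V) in percent for a list of measured TFs (one list of
    frequency-bin magnitudes per trial) compared against a standard TF."""
    n = len(trials)
    bins = range(len(standard))
    s19 = math.sqrt(sum(m * m for m in standard))  # root-sum-squared area
    mean_tf = [sum(tf[i] for tf in trials) / n for i in bins]

    # Mean over trials of the summed squared error against the standard.
    r_sq = sum((tf[i] - standard[i]) ** 2 for tf in trials for i in bins) / n
    # Summed squared error of the averaged TF against the standard.
    q_sq = sum((mean_tf[i] - standard[i]) ** 2 for i in bins)
    # Mean over trials of the summed squared deviation from the averaged TF.
    v_sq = sum((tf[i] - mean_tf[i]) ** 2 for tf in trials for i in bins) / n

    scale = 100.0 / s19
    return (scale * math.sqrt(r_sq),
            scale * math.sqrt(q_sq),
            scale * math.sqrt(v_sq))


# Hypothetical data: a smooth "standard" TF and nine noisy measurements of it.
rng = random.Random(0)
standard = [0.5 + 0.5 * math.cos(2 * math.pi * i / 64) for i in range(64)]
trials = [[s + rng.gauss(0.0, 0.05) for s in standard] for _ in range(9)]

R, Q, V = error_metrics(trials, standard)
print(f"R = {R:.2f}%, Q = {Q:.2f}%, V = {V:.2f}%")
```

With these definitions the cross term vanishes when the squared error is expanded about the mean TF, which is why $V^2 = R^2 - Q^2$ and why averaging can only lower the error ($Q \le R$).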
2.6 Experiment 6: Error Reduction Techniques for RTN Measurements

Some improvement in the performance of RTN in measuring the TF may be possible if certain assumptions are made about the nature of the IR of the system being measured. If the resulting errors are closer to those demonstrated by MLS techniques, then RTN may still be a viable method of measurement.

Figure 2.7. The accepted TF, plotted as a dashed line, and the TF as measured by a MLS of order 9, plotted as a solid line. There is almost no difference between the two at most frequencies, and thus the order-9 MLS measurement obscures the dashed line corresponding to the accepted TF. [Plot: gain (normalized) versus frequency (Hz), 1000 to 50000 Hz.]

A simple assumption that can be made regarding the nature of the IR is that the IR should disappear after a long time. For instance, Figure 2.5 shows that the IR of the electronic filter measured disappears by about 6 ms. The signal used to measure it, an order 19 MLS, was 5.37 s long, and so the entire measurement of the IR also extends to 5.37 s. In the tail of the IR, then, we expect nearly no signal to exist. For example, the RMS signal level for the last second of the IR shown in Figure 2.5 is on the order of $10^{-9}$. During the last second of a single RTN measurement of order 18, the RMS signal level is on the order of $10^{-6}$, three orders of magnitude larger than the standard. The RMS signal level in the averaged IR calculated over nine RTN measurements is less than one order of magnitude (about a factor of three) smaller. This noise in the tail of the measured IR may be at least partially responsible for the jaggedness seen in the average TF measured by the order 18 RTN and the otherwise gross errors at lower orders.

Figure 2.8. The accepted TF, plotted as a dashed line, and the average TF measured by nine RTN of orders as indicated in each panel, plotted as a solid line. The jaggedness of the TF as measured by the RTN tends to obscure the dashed line. [Six panels, orders 13 through 18: gain (normalized) versus frequency (Hz).]

Here, two fast methods are presented that target this potential source of error: Wiener filtering and exponential windowing in time. The Wiener filter is a widely used method for reducing noise power in a signal and is based on statistical methods rather than spectral analysis. The exponential window is an a posteriori filter that operates on the measured IR in time in a way that forces the IR to go to zero at long times. First, the effect of Wiener filtering was examined.

2.6.1 Wiener filtering

The Wiener filter is a method of removing noise from a signal based on statistical methods. The properties of the non-causal Wiener filter are well understood (see, for example, [85, 89]) and the main concepts are outlined in Appendix C. The Wiener filter requires information about the noise and signal. Assuming the noise and signal to be uncorrelated, it suffices to know the spectral power of the noise.

Methods

The noise power was computed as the variance of the signal in the last quarter of the IR. With this, the Wiener filter operated on the entire IR to yield a filtered IR. This IR was then treated as in Experiment 5 to find the TF and the RMS percent errors, $R_w^{(j)}$ and $Q_w^{(j)}$, for all previously studied orders $j$ (9 through 18).

Results

The average RMS percent errors are shown in Tables 2.1 and 2.2.
The column labeled $R_w^{(j)}$ shows average RMS percent errors, plus and minus one standard deviation, for any given single RTN measurement when Wiener filtering is applied a posteriori to the derived IR, before the TF is obtained. The column labeled $Q_w^{(j)}$ similarly shows the RMS percent error when Wiener filtering is applied to the average IR measured by RTN across all nine trials. For RTN of order 11 and higher, $R_w^{(j)}$ is significantly less than $R^{(j)}$ ($p < 0.01$ in a paired t-test of absolute differences). By the same standard, $Q_w^{(j)}$ is significantly less than $Q^{(j)}$. On average, Wiener filtering decreases the RMS percent error in the averaged TF measurement by about 13%.

Without Wiener filtering, the best RMS percent error among the average TF measurements with RTN was $Q^{(18)} = 8.28\%$ for RTN of order 18. A similar RMS percent error was achieved after Wiener filtering for RTN of order 15, hence with a signal one eighth as long. At order 18, a single RTN measurement after Wiener filtering resulted in, on average, $R_w^{(18)} = 12 \pm 2\%$ error, and the averaged TF had only $Q_w^{(18)} = 2.53\%$ error. Though this is still over three times the average error of a single MLS measurement at order 9, it is a substantial improvement over measurements using RTN without Wiener filtering.

2.6.2 Exponential windowing

As was noted above, the impulse response of a real room or filter is expected to go to zero after a long time. The presence of noise in the system, or the measurement of a system with a noise whose autocorrelation function is not very close to a delta function, will leave noise in the long-time tail of the impulse response, and this is a source of error when calculating the TF or RT60.

A potential filter for reducing the noise in the long-time part of the IR would be one which exponentially attenuates the IR after a certain amount of time. This would leave the bulk of the structure in the IR unaffected, while forcing the later parts of the IR to die away.
The main difficulty with this method is that it requires some amount of a priori information that can only be gathered a posteriori. That is, the filter needs to know at what time to begin the decay, but this requires some knowledge of the yet-unmeasured IR. Fortunately, since the main function of the window is to force the parts of the IR at long times to decay, the exact time at which the window begins is not critical. It is only necessary to leave the characteristic structure of the IR intact.

Methods

Whereas the Wiener filter required knowledge of the noise power in the tail of the IR, the exponential window needs to know how long to let the IR pass before beginning the exponential decay. This time must in practice be approximated, and a good guess could be made by examining the IR derived from a relatively low-order RTN measurement. Calling this estimated value $T_e$, the form of the exponential window as a function of time, $\varepsilon(t)$, is then:

$$\varepsilon(t) = \begin{cases} 1 & \text{for } t \le T_e \\ e^{-b(t - T_e)} & \text{for } t > T_e \end{cases}$$

Here, $T_e$ is the user-defined time at which exponential windowing begins. The parameter $b$ is a rate parameter which sets the rate of decay of the exponential window in time. In practice, an initial rough measurement of the IR must be made first in order to choose $T_e$ such that most of the information in the IR occurs before $T_e$. For example, if the IR is seen to almost entirely disappear by a time of 0.35 s, then it is preferable to set $T_e = 0.35$ s. The choice of the rate parameter $b$ is much more arbitrary, as it merely affects the rate at which the exponential window goes to zero. Very large values of $b$ will lead to an exponential window that looks like a square window in time from $t = 0$ to $t = T_e$. An example of an exponential window tailored for the system measured in Experiments 1 and 2 is depicted in Figure 2.9 with $b = 0.05$ and $T_e = 1.5$ ms. These values were selected because they performed very well in generating an accurate IR.
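The exponential window defined above can be sketched as follows. The function names and the test IR are hypothetical. The text does not state the units of the rate parameter; this sketch assumes that b = 0.05 is expressed per sample, which is only one possible reading.

```python
import math


def exponential_window(n_samples, start_index, b):
    """Window equal to 1 up to start_index, then decaying as exp(-b * k),
    where k counts samples past start_index (b in units of 1/sample)."""
    return [1.0 if n <= start_index else math.exp(-b * (n - start_index))
            for n in range(n_samples)]


def apply_window(ir, window):
    """Multiply the IR point by point with the window."""
    return [h * w for h, w in zip(ir, window)]


fs = 97656.25                  # sampling rate used in this chapter (Hz)
t_e = 1.5e-3                   # window onset T_e, in seconds
b = 0.05                       # rate parameter; per-sample units are assumed
start = int(round(t_e * fs))   # onset expressed as a sample index

# Hypothetical IR: a short decaying oscillation plus a constant offset that
# stands in for residual noise in the long-time tail.
ir = [math.exp(-n / 30.0) * math.cos(0.3 * n) + 1e-3 for n in range(1024)]
windowed = apply_window(ir, exponential_window(len(ir), start, b))
print(start, windowed[start] == ir[start], abs(windowed[-1]))
```

Everything up to the onset passes unchanged, so the characteristic structure of the IR is preserved as long as the onset is placed after it, while the tail is driven towards zero.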
Figure 2.9. The accepted impulse response, plotted as a continuous line, and the exponential window in time, plotted with a dashed line. The exponential window shown here has a rate parameter of $b = 0.05$ and starts its decay at time $T_e = 1.5$ ms. [Plot: impulse response versus time (ms).]

Results

The RMS percent errors for TFs measured after exponential windowing are shown in the rightmost columns of Tables 2.1 and 2.2. The average RMS percent errors, plus and minus one standard deviation, for each individual measurement using RTN and applying an exponentially decaying window in time are given in the $R_e^{(j)}$ column. The RMS percent errors for the average TF formed by applying the exponential window to the IR derived from the average of all nine measurements using RTN are in the column labeled $Q_e^{(j)}$.

Most notably, the exponential time window significantly decreased the average error of a single measurement using RTN at all orders tested when compared to the RTN measurement without filtering or with Wiener filtering (compare the $R_e^{(j)}$ column to the $R^{(j)}$ and $R_w^{(j)}$ columns for the cases of no filtering and Wiener filtering, respectively). The fact that, for individual measurements, the exponential window performed better than Wiener filtering is likely due to the Wiener filtering destroying some fine structure in the short-time part of the IR. The exponential window in time left the main body of the IR unchanged while only diminishing the long-time part, which is physically expected to go to zero.

In all cases, the error in the average TF after the exponential window ($Q_e^{(j)}$) was lower than the error without filtering ($Q^{(j)}$). However, it did not perform better than the Wiener filter ($Q_w^{(j)}$) at high orders. At high orders, the power of the signal remaining in the long-time part of the IR is quite small, especially in the IR averaged over all nine trials.
The Wiener filter is adaptive to the noise power, and so it is weaker for measurements with high-order signals, where the noise power is expected to be lower as the autocorrelation of the RTN more closely approximates a delta function. A weaker Wiener filter is less likely to destroy fine structure in the short-time part of the IR that it might otherwise "mistake" for noise. The exponential window, however, has less and less of an effect at high orders, since the long-time noise that it is designed to dampen is already quite small.

2.6.3 Conclusions

The MLS technique has not only proven to be an effective and accurate method of measuring the acoustical characteristics of stationary systems, but it has been shown to be far more accurate than even long RTN signals. Assuming the signals are longer than the IR of the system being measured, even a single short measurement with MLS more accurately measures the IR and TF of a stationary system than a RTN several times longer, even if the average of several RTN measurements is used.

Filtering the IR derived from the RTN measurements has proved to be quite useful in reducing the error in the resulting TF. Both Wiener filtering and the application of an exponentially decaying window in time to the IR were quite successful at improving the TF measurements. Which one should be used with RTN measurements depends on the situation. If the signals to be used are long (here, longer than about 1.3 s) and a measurement of the IR is to be made by averaging the response from multiple trials, then Wiener filtering appears to be advantageous. In all other cases, the exponential window in time has the advantage. In no case does either of these filters improve the RTN measurements to such a degree that they are more accurate on average than even a single MLS measurement of the smallest length considered. However, these filtering methods may still be useful when only RTN data are available.
In most studies of rooms, it is customary to require that the signals be longer than RT60 for the room. This prevents aliasing of the important parts of the IR that contribute to the calculation of RT60. In even moderately reverberant rooms, RT60 may be as long as 1 s. An order 17 MLS, of length $2^{17} - 1 = 131{,}071$ samples, has a duration of 1.3 s at the current sampling frequency of 97,656.25 Hz (set by the TDT hardware). Since an order 16 MLS is almost exactly half as long, an order 17 MLS is the shortest MLS at this sampling rate useful for measuring such rooms without aliasing the IR. Also, real rooms tend to be, at best, pseudo-stationary. The acoustical characteristics of real rooms depend, for example, on atmospheric conditions such as relative humidity and temperature [9]. Thus, measuring real rooms necessitates repeated measurements and averaging. In such cases, Wiener filtering seems to be the best option.

In some rooms, where RT60 may be very long, it may be difficult to generate and work with a MLS of sufficient order, especially if a high sampling rate is also required. In that case, it may suffice to fall back on RTN techniques, assisted by the filtering techniques presented here. These techniques may also be useful for cleaning up existing RTN data.

No amount of filtering or correction seems to make the TF measured by RTN as smooth or as accurate as that measured by MLS. This alone should justify the small amount of extra processing time inherent in forming the MLS. The findings developed in the process of filtering the RTN may also be used on MLS measurements in real systems where there is additive noise.

2.7 Experiment 7: MLS and RTN measurements in a real room

Real rooms are a useful example of systems that are expected to have a certain amount of "noise" in them.
The acoustical properties of rooms are affected by noise sources such as ducts and fans, and these properties may change over time with a plethora of factors such as the room's contents, temperature, and humidity. The purpose of this experiment was to compare the performance of MLS and RTN in the measurement of a real room. The measurement of real rooms usually involves more noise inherent in the system than exists in a typical stationary filter. Given inherent noise in the system, one may once again question the necessity of the MLS if the presence of significant noise disrupts its beneficial characteristics. That is, if there is noise inherent in the system, will it matter which test signal is used? This experiment is intended to address that question.

2.7.1 Methods

Measurements were made with order 19 MLS and RTN in a roughly square room, measuring approximately 5.7 m by 4.5 m by 3.5 m high (this room is identified as 172GH, and is the same room whose broadband RT60 is shown in Figure 2.4). The Schroeder frequency [82] of a room is the approximate frequency above which there are many overlapping normal modes and the room will respond in a reverberant way. Below this frequency the room will tend to produce discrete, well-separated normal modes. This frequency in Hz, $f_S$, is given by:

$$f_S = 2000 \sqrt{\frac{\mathrm{RT}_{60}}{V}} \tag{2.8}$$

Here, $V$ is the volume of the room and the leading coefficient of 2000 has units of $(\mathrm{m/s})^{3/2}$. For this room, the Schroeder frequency was approximately $f_S = 120$ Hz. Results of measurements made in frequency bands centered below this frequency may be unreliable, as there exists the possibility that the microphone was located near a node of a standing wave within that low frequency band. Above this frequency, measurements should be considered reliable, since it is unlikely that standing waves in the room dominate frequency bands centered above the Schroeder frequency.

RTN measurements were made with and without a posteriori Wiener filtering.
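As a rough consistency check on the Schroeder frequency quoted for room 172GH, Equation 2.8 can be evaluated with the room dimensions given above and the broadband RT60 of 323 ms from Figure 2.4. Using the broadband RT60 is an assumption of this check; the text does not state which reverberation time entered the calculation.

```latex
V = 5.7 \times 4.5 \times 3.5\ \mathrm{m^3} \approx 89.8\ \mathrm{m^3},
\qquad
f_S = 2000 \sqrt{\frac{0.323}{89.8}}
    \approx 2000 \times 0.0600
    \approx 120\ \mathrm{Hz}
```

Under that assumption the result agrees with the value of approximately 120 Hz stated for this room.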
The IR was calculated as the average over all nine trials. The TF and RT60 in 1/3-octave bands were then calculated from this IR. Thirty 1/3-octave bands from 20 to 16,000 Hz were used, according to ISO standards [47].

Signals were played through a Mackie HR824 high-accuracy studio monitor and recorded by a Shure KSM32 directional condenser microphone. The source and receiver were separated by approximately 2.6 m, and were turned so as to "face" one another. Signals were played at a level of 91 dB SPL, and the level of the background noise inherent in the room was approximately 44 dB SPL (A-weighting).

2.7.2 Results

The TFs measured via all three methods (MLS, RTN, and RTN with Wiener filtering applied) can be seen in Figure 2.10. The TFs measured by MLS and RTN appear to be quite similar in shape, though perhaps the TF measured by RTN is a bit more jagged at low frequencies. The TF measured by RTN after Wiener filtering was applied looks quite different, however. Of particular note is the peak near 26 kHz, whose magnitude the Wiener filtering reduces. Also, the TF as measured after Wiener filtering seems to be much more jagged at low frequencies up to about 10 kHz.

A plot of RT60 measured in 1/3-octave bands by all three methods is shown in Figure 2.11. Open circles represent MLS measurements, exes represent RTN measurements, and open diamonds represent Wiener-filtered RTN measurements.

Measurements of RT60 derived from RTN measurements without Wiener filtering were quite erratic, and are likely unreliable, especially given the estimates of long RT60 at high frequencies, where the room is expected to become more anechoic. RTN measurements nearly agree with MLS and Wiener-filtered RTN measurements at 80, 100, 125, and 315 Hz, but otherwise return higher-than-expected values of RT60.
[Figure 2.10: three panels titled "MLS-Measured Transfer Function", "RTN-Measured Transfer Function", and "Wiener-Filtered RTN-Measured Transfer Function"; horizontal axis: Frequency (kHz).]

Figure 2.10. The TF of a small, roughly square office of dimensions 5.7 m by 4.5 m by 3.5 m high. In each case, the TF was calculated from the average IR calculated over five measurements. The top panel shows the TF as measured by MLS. The middle panel shows the TF as measured by RTN. The final panel shows the TF calculated when the average IR measured by RTN was filtered by a Wiener filter before the TF was calculated.

[Figure 2.11: plot titled "Reverberation Time in 1/3-Octave Bands" with legend entries MLS, RTN, and Wiener-Filtered RTN; horizontal axis: Frequency (Hz), logarithmic.]

Figure 2.11. Measurements of the 60-dB reverberation time in the same space as that referred to in Figure 2.10. Open circles represent times calculated from MLS measurements, exes mark times calculated from RTN measurements, and open diamonds mark times calculated from Wiener-filtered RTN measurements.

Wiener filtering resulted in RTN measurements that largely agreed with the MLS measurements, though there are some significant differences at low frequencies. The Wiener filtering likely makes RTN estimations of RT60 agree with those made with MLS because both MLS and the Wiener-filtered RTN measurements have relatively little noise in the high-time part of the IR. Noise power in the high-time part of the IR will distort measurements of RT60 made via Schroeder reverse integration, decreasing the decay slope and thus increasing the estimate of RT60.
However, fine structure in the IR, which the Wiener filtering process tends to “smooth out,” is not very important when calculating RT60 by the method of reverse integration because small point-by-point changes will not significantly affect the resulting reverberation time, as long as the overall structure of the IR is preserved.

2.7.3 Conclusions

The Wiener-filtered RTN and MLS agree closely when it comes to calculating RT60. This agreement is likely due to the fact that both methods yield relatively little noise in the long-time component of the IR. Though the Wiener filtering procedure destroys some fine structure information in the RTN-measured IR, this information is not very important in calculating RT60. The effect of this loss is seen, however, in the calculation of the TF. In this case, the RTN-measured TF is closer to that given by the MLS than is the Wiener-filtered RTN TF. This is again likely because the Wiener filtering procedure smooths out some parts of the IR that are not noise, but structure. It would seem then that while Wiener filtering may be beneficial in measuring RT60, it is detrimental if applied before the TF is calculated.

Given the results of this chapter, it is hard to deny the overall usefulness of MLS when compared to RTN. The MLS performs better than RTN at measuring impulse responses, transfer functions, and reverberation times of rooms, and does so using signals of shorter length. Even when filtering techniques are used to improve the accuracy of RTN measurements, MLS is more accurate and more consistent. In the experiments that follow, MLS will be used exclusively for measurements of rooms and binaural properties. Of particular interest in the following chapter will be measurements of interaural coherence.
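The sensitivity of reverse-integration RT60 estimates to late-time noise, discussed in Section 2.7.2, can be illustrated with a short sketch. The code below is not the procedure used for the measurements in this chapter. It assumes a synthetic IR made of exponentially decaying Gaussian noise, a hypothetical noise floor 30 dB below the initial level, and a line fit to the decay curve between -5 and -25 dB; all of these choices, and the function names, are assumptions made for illustration.

```python
import numpy as np

def energy_decay_db(ir):
    """Schroeder reverse integration: energy remaining after each sample, in dB re total."""
    energy = np.asarray(ir, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]
    return 10.0 * np.log10(edc / edc[0] + 1e-300)

def rt60_from_ir(ir, fs, start_db=-5.0, stop_db=-25.0):
    """Fit a line to the decay curve between start_db and stop_db, extrapolate to -60 dB."""
    edc_db = energy_decay_db(ir)
    t = np.arange(edc_db.size) / fs
    mask = (edc_db <= start_db) & (edc_db >= stop_db)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # slope in dB per second
    return -60.0 / slope

rng = np.random.default_rng(0)
fs = 16000          # assumed sampling rate for the synthetic example
true_rt60 = 0.4     # assumed decay time of the synthetic IR, in seconds
t = np.arange(int(1.0 * fs)) / fs

# Amplitude envelope 10^(-3 t / RT60) gives a 60 dB energy decay at t = RT60.
clean_ir = rng.standard_normal(t.size) * 10.0 ** (-3.0 * t / true_rt60)
# Stationary noise floor 30 dB below the initial amplitude (hypothetical level).
noisy_ir = clean_ir + 10.0 ** (-30.0 / 20.0) * rng.standard_normal(t.size)

rt60_clean = rt60_from_ir(clean_ir, fs)
rt60_noisy = rt60_from_ir(noisy_ir, fs)
print(f"clean IR: RT60 estimate = {rt60_clean:.2f} s")
print(f"noisy IR: RT60 estimate = {rt60_noisy:.2f} s")
```

In this sketch the noise floor flattens the tail of the decay curve, so the fitted slope is shallower and the RT60 estimate is inflated, which is the behavior described for the unfiltered RTN measurements.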
CHAPTER 3

Experiment 8: Measurements of Envelope and Waveform Interaural Coherences With KEMAR

Measurements of the waveform coherence (Equation 2.4) are relevant in certain aspects of acoustics, and waveform coherence is often quoted in studies of rooms [8, 27, 30, 55, 58, 63]. It has increasingly become an important measure in studies of both human and animal hearing (for example, [13, 14, 12, 29, 31, 34, 42, 45, 51, 60, 84, 88]). Coherence is related to the degree to which listeners can make use of ITDs in localizing sounds [59] and to the apparent auditory source width [17, 73, 4, 74]. It is of interest to make measurements of these values in real listening environments. As center frequency increases in fixed-bandwidth analysis, so does the rate of oscillation of the cross-correlation function (this can be seen, for example, in Figure 2.2). At high frequencies, the ear is insensitive to interaural differences in the fine structure of signals, but differences in the envelopes of the signals become important [54, 69, 86]. To see why this is so, consider a calculation of the interaural phase difference (IPD). The ITD and IPD are simply related by the angular frequency, $\omega = 2\pi f$:

$$\mathrm{ITD} = \frac{\mathrm{IPD}}{\omega}$$

The ITD at a given angle of incidence is roughly constant across frequency, though it tends to decrease slightly with increasing frequency. Thus, as frequency increases, so must the IPD. Writing the ITD as $\mathrm{ITD} = a \sin\theta$, where $a$ is a constant and $\theta$ is the angle of incidence, gives $\mathrm{IPD} = 2\pi f a \sin\theta$. At some frequency $f$, the IPD will equal $\pi$. There is then the possibility of a 180° phase error, leading to an ambiguity in whether the angle of incidence is $\theta$ or $-\theta$. The lag at which the waveform coherence occurs thus becomes an unreliable estimator of ITD. The envelope coherence is then a better estimator at high frequencies.
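The frequency at which the IPD reaches $\pi$ follows directly from $\mathrm{IPD} = 2\pi f a \sin\theta$, namely $f = 1/(2 a \sin\theta)$. The sketch below evaluates it for a few angles. The value of $a$ used here (760 microseconds) is an assumed, illustrative head constant and is not taken from the measurements in this work; the function names are likewise hypothetical.

```python
import math

def ipd_radians(freq_hz, a_s, theta_deg):
    """IPD = 2*pi*f*a*sin(theta), using ITD = a*sin(theta)."""
    return 2.0 * math.pi * freq_hz * a_s * math.sin(math.radians(theta_deg))

def ambiguity_frequency(a_s, theta_deg):
    """Lowest frequency (Hz) at which |IPD| reaches pi for the given angle of incidence."""
    itd = abs(a_s * math.sin(math.radians(theta_deg)))
    return math.inf if itd == 0.0 else 1.0 / (2.0 * itd)

A_ASSUMED = 7.6e-4  # seconds; illustrative assumption, not a measured value

for theta in (15.0, 30.0, 60.0, 90.0):
    f_amb = ambiguity_frequency(A_ASSUMED, theta)
    print(f"theta = {theta:4.0f} deg: IPD reaches pi near {f_amb:7.1f} Hz")
```

Under this assumption the ambiguity sets in at lower frequencies for sources far off the midline, since the ITD is larger there, and never occurs for a source at 0° azimuth.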
A calculation of the cross-correlation of the envelopes of the signals in the left and right ears leads to a measurement of the envelope coherence (Equation 2.5). Measurement of ITD likely depends on detection of the envelope coherence, especially as the auditory system becomes insensitive to the waveform structure at higher frequencies [100, 98]. The envelope coherence is thought to dominate auditory processing of ITDs above 1300 Hz, while waveform coherence dominates below about 1000 Hz. Between these frequencies, the trade-off between timing estimates made in the waveform and the envelope is not well understood.

The waveform coherence is relatively straightforward to measure, involving direct manipulation of the measured signals. Measuring the envelope coherence is more complex, and requires transformation of the signals. Therefore, it would be helpful to establish some constraints on the possible values of the envelope coherence based on measured values of the waveform coherence. To this end, MLS signals were presented to a KEMAR in both a normal listening environment and a highly reverberant space. Recordings of the signals were taken from the KEMAR ears, and the interaural coherences in both the waveforms and the envelopes were calculated in 1/3-octave bands. The coherence in the waveform and in the envelope of any two signals can be calculated as the maximum of the cross-correlation between those signals or between the envelopes of those signals, respectively. In this experiment, the calculation of the cross-correlation functions was limited to lags within ±1 ms.

The average cross-correlation between “uncorrelated” signals is zero, but this is not so for the cross-correlation of the envelopes of those signals, $E(t)$. Since the envelope of a signal involves only positive values, the cross-correlation between two envelopes will always be greater than zero.
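The definition of coherence as the maximum of the cross-correlation within a ±1 ms lag window can be sketched as follows. This is a minimal illustration that assumes an energy-normalized cross-correlation and equal-length inputs; the function name and the test signal are hypothetical, and no band filtering is applied.

```python
import numpy as np

def coherence_and_lag(x, y, fs, max_lag_s=1e-3):
    """Return (peak of the normalized cross-correlation, lag of that peak in seconds).

    The cross-correlation is evaluated as sum_t x(t) * y(t + lag) for integer-sample
    lags within +/- max_lag_s and normalized by the geometric mean of the energies.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    max_lag = int(round(max_lag_s * fs))
    norm = np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))
    best_value, best_lag = -np.inf, 0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = np.dot(x[:x.size - lag], y[lag:])
        else:
            c = np.dot(x[-lag:], y[:y.size + lag])
        if c / norm > best_value:
            best_value, best_lag = c / norm, lag
    return best_value, best_lag / fs

# Hypothetical test: the second channel is the first delayed by 10 samples.
fs = 48000
rng = np.random.default_rng(1)
left = rng.standard_normal(4800)
right = np.roll(left, 10)

coh, lag_s = coherence_and_lag(left, right, fs)
print(f"coherence = {coh:.3f}, lag = {lag_s * 1e6:.0f} microseconds")
```

The same function could be applied to the envelopes of the two signals to obtain an envelope coherence, provided the envelopes are extracted first.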
Van de Par and Kohlrausch [87] note that for a Gaussian noise, the probability density function (PDF) of the envelope of the noise is Rayleigh distributed. Then, taking the envelope to be normalized so that $\langle E^2 \rangle = 1$:

$$\langle E \rangle = \frac{\sqrt{\pi}}{2} \qquad (3.1)$$

Here, angled brackets denote expectation values in time. Let $\gamma_w$ be the waveform coherence and $\gamma_e$ be the envelope coherence of two otherwise uncorrelated (i.e. $\gamma_w = 0$) Gaussian noises. Let $c_{E_L E_R}$ be the cross-correlation between the envelopes of these noises. Since Equation 3.1 holds for both noises,

$$c_{E_L E_R} = \left(\frac{\sqrt{\pi}}{2}\right)^2 = \frac{\pi}{4} \qquad (3.2)$$

Thus, for uncorrelated noises, $\gamma_e = \pi/4$.

Bernstein suggested a power law relationship between the waveform and envelope coherence [11]. A relationship between envelope coherence and waveform coherence may be expected to have the form:

$$\gamma_e = \frac{\pi}{4} + b\,\gamma_w^{\,n} \qquad (3.3)$$

Note that $n$ is not necessarily an integer. This equation allows for $\gamma_e = \pi/4$ when $\gamma_w = 0$. Since both $\gamma_w$ and $\gamma_e$ must be bounded on a range from 0 to 1, it is expected that $b = 1 - \pi/4$. Bernstein [11] fitted generated Gaussian noise signals to an equation of the form of Equation 3.3 without constraining the value of parameter $b$ and found fitting parameters of $b = 0.2142$ and $n = 2.2$. Note that $1 - \pi/4 = 0.2146$ is quite close to Bernstein’s leading multiplicative constant (0.19% difference).

The results of van de Par and Kohlrausch [87] should hold true for measurements made with white Gaussian noise. The current experiment involves the measurement of coherence in 1/3-octave bands through KEMAR in various acoustical environments using MLS. Though MLS was used and not white Gaussian noise, the distribution of amplitudes of an MLS after being passed through a gammatone filter appears to be Gaussian, as shown in Figure 3.1. Moreover, this experiment was performed in a real room, and so some deviation from Equation 3.3 is expected.
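The $\pi/4$ floor in Equation 3.2 and the end points of Equation 3.3 can be checked numerically. The sketch below assumes that each envelope sample can be modeled as the magnitude of two independent unit-variance Gaussian components (a Rayleigh variate), and that the envelope cross-correlation is normalized by the envelope energies; the function names are illustrative.

```python
import math
import random

def power_law_envelope_coherence(gamma_w, n, b=1.0 - math.pi / 4.0):
    """Equation 3.3: gamma_e = pi/4 + b * gamma_w**n (b defaults to 1 - pi/4)."""
    return math.pi / 4.0 + b * gamma_w ** n

def simulated_envelope_floor(num_samples=100_000, seed=0):
    """Normalized cross-correlation of two independent Rayleigh-distributed envelopes."""
    rng = random.Random(seed)
    sum_lr = sum_ll = sum_rr = 0.0
    for _ in range(num_samples):
        # Rayleigh variate: magnitude of two independent unit-variance Gaussians.
        e_left = math.hypot(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        e_right = math.hypot(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        sum_lr += e_left * e_right
        sum_ll += e_left ** 2
        sum_rr += e_right ** 2
    return sum_lr / math.sqrt(sum_ll * sum_rr)

floor = simulated_envelope_floor()
print(f"simulated floor = {floor:.4f}, pi/4 = {math.pi / 4.0:.4f}")
print(f"Equation 3.3 at gamma_w = 0: {power_law_envelope_coherence(0.0, 2.2):.4f}")
print(f"Equation 3.3 at gamma_w = 1: {power_law_envelope_coherence(1.0, 2.2):.4f}")
```

With the constrained coefficient $b = 1 - \pi/4$ the power law passes through both required end points regardless of the value chosen for $n$, which is why only $n$ is left free in the later regression.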
The goal of this experiment was to compare the coherence relationships measured in real rooms with those found using idealized noise generated on a computer and to arrive at a general equation for predicting the envelope coherence of two signals given their waveform coherence.

[Figure 3.1: histogram; horizontal axis: Signal Amplitude, from -1 to 1.]

Figure 3.1. A frequency histogram showing the distribution of the amplitude of samples in an order-18 MLS after the signal was passed through a gammatone filter with center frequency 500 Hz and a 1/3-octave bandwidth. The histogram is enveloped by a Gaussian distribution with zero mean, standard deviation 0.229, and scaled by a factor of 0.067, shown as a dashed solid line. This attests to the Gaussian nature of the filtered MLS.

3.1 Computer Simulation of Noises and Coherences

It is possible to create a simple but accurate simulation that will provide a basis for the expected relationship between waveform and envelope coherences. Gaussian white noises can be generated on a computer and manipulated such that their waveform coherence is approximately equal to some desired value. Their envelope coherence can then be calculated per Equation 2.5. The process of generating these noises begins with two independent, identically-distributed Gaussian white noises, $n_0$ and $n_L$. The signal $n_L$ represents the signal in the left ear. The signal in the right ear, $n_R$, is then generated using:

$$n_R = \sqrt{1 - a^2}\, n_0 + a\, n_L \qquad (3.4)$$

The parameter $a$ is a constant, bounded on the range of [0, 1]. Note that a scaled Gaussian random variable is still Gaussian, and the sum of two independent Gaussian random variables is also Gaussian. Thus, $n_R$ is Gaussian distributed.
The cross-correlation function of Equation 2.3 between $n_L$ and $n_R$ can be written as:

$$c_{n_L n_R}(\tau) = \frac{\int_0^T n_R(t)\, n_L(t + \tau)\, dt}{\sqrt{\int_0^T n_R(t)^2\, dt \int_0^T n_L(t)^2\, dt}}$$

The integrals in the cross-correlation function can be simplified if it is assumed that $n_0$ and $n_L$ are uncorrelated. It is simple then to show that:

$$c_{n_L n_R}(\tau) = a\, \frac{\int_0^T n_L(t)\, n_L(t + \tau)\, dt}{\int_0^T n_L(t)^2\, dt}$$

Thus the cross-correlation function becomes simply the autocorrelation function of $n_L$, multiplied by $a$. The waveform coherence is the maximum value of this function. Since the maximum of the normalized autocorrelation of any real signal is 1 (at zero lag), it becomes clear that $a$ is the desired waveform coherence. Thus, by choosing $a$, one can approximately specify the waveform coherence of $n_L$ and $n_R$. It should be noted that in practice, $n_0$ and $n_L$ will not be completely uncorrelated if generated randomly. However, in this simulation, noises with a wide range of coherences will be generated, and the exact value of the coherence for each individual noise pair is unimportant.

To create a set of noise pairs thoroughly covering the range of possible waveform coherences, 1001 values of $a$ were used between 0 and 1. For each value of $a$, a pair of Gaussian noises was generated and treated in the manner described above to create a “left” and a “right” signal. These noises were filtered by 1/3-octave band gammatone filters with ISO center frequencies between 160 and 10000 Hz and their waveform and envelope coherences were calculated within each band. Then, a nonlinear regression was performed on the coherence data, fitting them to an equation of the form of Equation 3.3 but with a restricted multiplicative constant:

$$\gamma_e = \frac{\pi}{4} + \left(1 - \frac{\pi}{4}\right) \gamma_w^{\,n} \qquad (3.5)$$

The nonlinear regression yielded a best-fit value for the power parameter $n$. A good measure of how well the data fit Equation 3.5 within bands is the RMS error between the data and the fitting equation.
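The simulation procedure described above can be sketched in a few lines. This is a simplified illustration rather than the procedure used for the reported results: it uses broadband noise with no gammatone filtering, evaluates coherences at zero lag only, extracts envelopes as the magnitude of an FFT-based analytic signal, and replaces the nonlinear regression with a grid search over $n$. Those substitutions, the reduced number of pairs, and all function names are assumptions.

```python
import numpy as np

FLOOR = np.pi / 4.0

def envelope(x):
    """Magnitude of the analytic signal, computed with an FFT-based Hilbert transform."""
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))

def zero_lag_coherence(x, y):
    """Energy-normalized cross-correlation at zero lag."""
    return np.dot(x, y) / np.sqrt(np.dot(x, x) * np.dot(y, y))

def simulate_pairs(num_pairs=101, num_samples=8192, seed=0):
    """Generate noise pairs per Equation 3.4 and return (gamma_w, gamma_e) arrays."""
    rng = np.random.default_rng(seed)
    gamma_w, gamma_e = [], []
    for a in np.linspace(0.0, 1.0, num_pairs):
        n_0 = rng.standard_normal(num_samples)
        n_left = rng.standard_normal(num_samples)
        n_right = np.sqrt(1.0 - a ** 2) * n_0 + a * n_left  # Equation 3.4
        gamma_w.append(zero_lag_coherence(n_left, n_right))
        gamma_e.append(zero_lag_coherence(envelope(n_left), envelope(n_right)))
    # Small negative sample correlations near a = 0 are clipped before taking powers.
    return np.clip(gamma_w, 0.0, 1.0), np.array(gamma_e)

def fit_power(gamma_w, gamma_e, candidates=np.arange(1.5, 3.0001, 0.01)):
    """Grid search for the n in Equation 3.5 that minimizes the RMS error."""
    rms = [np.sqrt(np.mean((FLOOR + (1.0 - FLOOR) * gamma_w ** n - gamma_e) ** 2))
           for n in candidates]
    best = int(np.argmin(rms))
    return candidates[best], rms[best]

gamma_w, gamma_e = simulate_pairs()
n_fit, rms_error = fit_power(gamma_w, gamma_e)
print(f"best-fit n = {n_fit:.2f}, RMS error = {rms_error:.4f}")
```

A grid search is used here only to keep the sketch dependency-free; a standard nonlinear least-squares routine would serve the same purpose.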
These simulations resulted in a set of 1001 waveform-envelope coherence pairs per 1/3-octave band, and a best fit to Equation 3.5 was found. The data and best fitting lines for six representative bands are shown in Figure 3.2. A complete set of values for the power parameter $n$ and its respective RMS error is shown in Table 3.1. Two things are particularly worth noting in the data gathered from the simulated Gaussian noise pairs. First, the value of $n$ does not vary significantly across frequency bands and has an average value of about $n = 2.11$. This value is quite close to that determined by Bernstein [11] ($n = 2.2$).

A derivation of the analytic relationship between the envelope and waveform coherences is difficult to obtain. Van de Par and Kohlrausch made progress in this derivation (though it was not their original intent), which led to the $\pi/4$ factors in Equation 3.5 [88]. Further progress in arriving at an analytical derivation of this relationship is complicated by the nature of the integrals, which involve the absolute value function when calculating the envelope coherence (see Equation 2.5). It is tempting then to theorize that perhaps a value of exactly $n = 2$ might result from an analytic derivation of the envelope coherence as a function of waveform coherence. Given the results of the simulated coherences, however, the value of $n$ is statistically almost certainly not 2 (one-sample t-test, N = 19, t = 32.75, p < 0.001).

[Figure 3.2: six panels, one per 1/3-octave band (including 6300 Hz, n = 2.12); horizontal axis: $\gamma_w$, from 0 to 1; vertical axis: $\gamma_e$.]

Figure 3.2. Plots of envelope coherence versus waveform coherence for simulated Gaussian noise pairs in six different 1/3-octave bands. Each band contains 1001 coherence pairs, each of which is a single data point. A best fitting line to Equation 3.5 is shown as a thick solid line in each plot, though it may be obscured by the data points. The value of the power parameter $n$ for each set of points is shown in each panel.

Frequency (Hz)    n                  RMS Error
160               2.11 ± 0.048       0.020
200               2.08 ± 0.045       0.019
250               2.08 ± 0.040       0.017
315               2.11 ± 0.039       0.016
400               2.10 ± 0.038       0.016
500               2.12 ± 0.035       0.014
630               2.10 ± 0.032       0.013
800               2.11 ± 0.029       0.012
1000              2.10 ± 0.027       0.011
1250              2.12 ± 0.024       0.011
1600              2.10 ± 0.023       0.0095
2000              2.13 ± 0.021       0.0086
2500              2.11 ± 0.019       0.0079
3150              2.13 ± 0.017       0.0069
4000              2.11 ± 0.015       0.0062
5000              2.13 ± 0.014       0.0057
6300              2.12 ± 0.012       0.0052
8000              2.12 ± 0.011       0.0046
10000             2.12 ± 0.011       0.0045
Avg.              2.111 ± 0.0071     0.011

Table 3.1. Values of the power parameter $n$ from a fit of the coherence data within each 1/3-octave band to Equation 3.5 for randomly generated Gaussian noises. The bounds about each $n$ value are 95% confidence intervals. As expected, the RMS error decreases as the center frequency, and thus the width of the noise band, increases. The errors indicate a general trend toward a better fit to Equation 3.5 with increasing frequency.

[Figure 3.3: log-log plot with trend line; horizontal axis: Frequency (Hz), from 100 to 10000; vertical axis: log RMS Error.]

Figure 3.3. RMS Error as a function of center frequency. The trend line shows a decrease in RMS Error proportional to $1/f^{0.35}$.

Second, the spread of the data about the best fit curve is larger at low values of the waveform coherence. The extent of this deviation is related to the RMS error. Both Figure 3.2 and Table 3.1 show that the deviation of low-coherence points from the best fit curve decreases with increasing frequency. The RMS error has a high negative correlation with the center frequency of the band (r = −0.814).
The RMS error as a function of center frequency is plotted in Figure 3.3, and can be fit to an equation of the form:

$$\text{RMS Error} = \frac{0.17}{f^{0.35}}$$

The results of these simulated coherences form a basis for comparison to coherences measured in real rooms.

3.2 Methods

To measure the relationship between waveform and envelope coherences in a real room, a KEMAR manikin and a Mackie HR824 studio monitor, the latter seated on a padded chair so as to be at the ear level of the KEMAR, were situated in a room. The Mackie HR824 is a powered, high-accuracy, two-way loudspeaker with an especially flat response curve over a wide frequency range, chosen for these properties. Two rooms were used. One was a fairly open lab space, 6.5 m by 7.5 m by 4.5 m high (this is room 10B, cited in Hartmann, et al. [51]). The broadband RT60 of room 10B was approximately 0.8 s. The other room was a large reverberant chamber, with dimensions 7.67 m by 6.35 m by 3.58 m high (the same reverberation room as used by Hartmann, et al. [51]). The broadband RT60 of the reverberant room was approximately 2.0 s.

A set of four distances between the KEMAR ears and the loudspeaker was used: 0.5, 1.0, 3.0, and 5.0 meters. The KEMAR and loudspeaker were always situated so as to be “facing” each other. For each measurement, a different order-18 MLS was played through the loudspeaker at a sampling rate of 195,312.5 Hz (set by the TDT hardware). The response was recorded through the ears of the KEMAR on two channels (left and right). Then, the interaural waveform and envelope coherences between the left and right ears were calculated in nineteen 1/3-octave bands from 80 to 10,000 Hz with ISO standard center frequencies. This constituted a single trial at a given distance. In subsequent trials, the procedure was repeated, but the KEMAR and loudspeaker were moved to different places within the room, maintaining the same fixed distance between them.
For each distance, 11 trials were taken, each at a different place within the room. This resulted in 11 measurements of interaural coherence within each 1/3-octave band for each room, for a total of 19 × 11 = 209 pairs of measurements of envelope and waveform interaural coherence. Nonlinear regression to Equation 3.5 was then performed in a manner similar to that done for the simulated noises. This was done for each distance within each room both within and across bands.

Errors in the measurement of waveform coherence occurred when the peak in the cross-correlation function did not occur at the “correct” ITD. Problems of this kind occurred most often at high frequencies, where multiple peaks in the cross-correlation function have similar heights. Also, as noted in the introductory passage on the cross-correlation function and coherence measurements, the temporal resolution of the cross-correlation function is limited by the sampling frequency. At higher frequencies the true peak of the cross-correlation function is more likely to fall between samples, and it falls off fast enough that a nearby peak, which has a similar amplitude and may occur closer to a sample, may be mistaken for the peak of the function. This is known as a “period error” since a peak one or more periods away from the true peak is the largest. In practice, the difference between the amplitudes of the measured peak and the true peak in the case of a period error was very small. Period errors were detected and corrected by examining the estimate of ITD that came from such measurements and determining whether it was very different from the estimate given by the next lower frequency band. This relies on the lowest frequency bands being free of such errors, but period errors are not possible in low-frequency bands since the lag of the cross-correlation function is limited to ±1 ms. Such errors were far fewer in the measurement of the envelope coherence.
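The band-to-band consistency check for period errors can be sketched as below. The text does not specify the detection threshold or the exact correction rule, so both are assumptions here: a band is flagged when its ITD estimate differs from the (corrected) estimate of the next lower band by more than half a period of the band's center frequency, and the flagged estimate is shifted by a whole number of periods toward that reference. The function name and example values are hypothetical.

```python
def correct_period_errors(center_freqs_hz, itds_s, tolerance_periods=0.5):
    """Flag and correct ITD estimates that sit one or more periods from the band below.

    Bands must be ordered from lowest to highest center frequency; the lowest band
    is taken as error-free. Returns (corrected ITDs, flags).
    """
    corrected, flags = [], []
    for k, (fc, itd) in enumerate(zip(center_freqs_hz, itds_s)):
        if k == 0:
            corrected.append(itd)
            flags.append(False)
            continue
        reference = corrected[-1]      # corrected estimate from the next lower band
        period = 1.0 / fc
        deviation = itd - reference
        is_error = abs(deviation) > tolerance_periods * period
        if is_error:
            # Remove the nearest whole number of periods (assumed correction rule).
            itd = itd - round(deviation / period) * period
        corrected.append(itd)
        flags.append(is_error)
    return corrected, flags

# Hypothetical ITD estimates; the 2000 Hz band is off by exactly one period (0.5 ms).
freqs = [500.0, 1000.0, 2000.0, 4000.0]
itds = [3.0e-4, 3.1e-4, 3.0e-4 + 1.0 / 2000.0, 2.9e-4]

fixed, flagged = correct_period_errors(freqs, itds)
for fc, raw, new, bad in zip(freqs, itds, fixed, flagged):
    print(f"{fc:6.0f} Hz: raw {raw * 1e6:6.0f} us -> {new * 1e6:6.0f} us  flagged={bad}")
```

Comparing each band against the corrected value of the band below, rather than the raw value, keeps a single period error from propagating upward through the higher bands.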
3.3 Analysis of Results Within Bands

For each distance, in each room, and within each 1/3-octave band, the behavior of the relationship between waveform and envelope interaural coherence varied significantly across different locations of the KEMAR and loudspeaker. Plots of the data taken in twelve of the nineteen bands for each distance (0.5, 1.0, 3.0, and 5.0 meters) in each room (10B and the reverberant room) are shown in Figures 3.4 through 3.11.

[Figure 3.4: panels titled "10B at 0.5 m", one per 1/3-octave band; horizontal axis: $\gamma_w$; vertical axis: $\gamma_e$.]

Figure 3.4. Plots of waveform coherence versus envelope coherence as measured through KEMAR in room 10B at a distance of 0.5 m to the loudspeaker. Each panel shows the data measured in a particular 1/3-octave band with center frequency indicated in the top left corner of the panel. A best fit line is shown in each panel, which fits the data within the indicated band to Equation 3.5. The best fit value of the power parameter $n$ for the data in each band is shown in the upper left of each panel. Note that the horizontal and vertical scales differ in each panel.

[Figure 3.5: panels titled "10B at 1 m"; axes as in Figure 3.4.]

Figure 3.5. Plots of waveform coherence versus envelope coherence as measured through KEMAR in room 10B at a distance of 1.0 m to the loudspeaker, in a manner similar to that of Figure 3.4.

[Figure 3.6: panels titled "10B at 3 m"; axes as in Figure 3.4.]

Figure 3.6. Plots of waveform coherence versus envelope coherence as measured through KEMAR in room 10B at a distance of 3.0 m to the loudspeaker, in a manner similar to that of Figure 3.4.

[Figure 3.7: panels titled "10B at 5 m"; axes as in Figure 3.4.]

Figure 3.7. Plots of waveform coherence versus envelope coherence as measured through KEMAR in room 10B at a distance of 5.0 m to the loudspeaker, in a manner similar to that of Figure 3.4.

[Figure 3.8: panels titled "RR at 0.5 m"; axes as in Figure 3.4.]

Figure 3.8. Plots of waveform coherence versus envelope coherence as measured through KEMAR in a reverberant room at a distance of 0.5 m to the loudspeaker, in a manner similar to that of Figure 3.4.

[Figure 3.9: panels titled "RR at 1 m"; axes as in Figure 3.4.]

Figure 3.9. Plots of waveform coherence versus envelope coherence as measured through KEMAR in a reverberant room at a distance of 1.0 m to the loudspeaker, in a manner similar to that of Figure 3.4.

[Figure 3.10: panels titled "RR at 3 m"; axes as in Figure 3.4.]

Figure 3.10. Plots of waveform coherence versus envelope coherence as measured through KEMAR in a reverberant room at a distance of 3.0 m to the loudspeaker, in a manner similar to that of Figure 3.4.

[Figure 3.11: panels titled "RR at 5 m"; axes as in Figure 3.4.]

Figure 3.11. Plots of waveform coherence versus envelope coherence as measured through KEMAR in a reverberant room at a distance of 5.0 m to the loudspeaker, in a manner similar to that of Figure 3.5.

Equation 3.5 fitted the data remarkably well in most cases. There are, however, large deviations of the data from the predicted trend as distance from the loudspeaker and reverberation times increase. In room 10B, the data deviate far from the best fit of Equation 3.5 in some bands (e.g. 1250 Hz) when the distance is increased to 5.0 m (Figure 3.7). In the reverberant room, some data differ noticeably from the fitting equation even at the relatively short distance of 1.0 m. Deviation from the fitting equation tends to be greater in situations where the waveform coherence is small, as is more often the case in reverberant environments. This can be seen in the reverberant room data at, for example, a distance of 5.0 m in every 1/3-octave band above 630 Hz.

The values of the power parameter $n$ and the RMS error of the fit for each condition and each 1/3-octave band can be found in Tables 3.2 through 3.5. Several things can be noted immediately. First, most values of the RMS error are quite small. On average, errors increase with distance from the sound source and are larger in the reverberant room. The largest single RMS error, 0.016, was measured in the 800 Hz band in room 10B at a source distance of 5.0 m. On average, the largest errors occur in the reverberant room at a distance of 5.0 m. In general, even the largest deviations between measured data and curve fits are relatively small.

RMS errors, and thus deviations of the data from the model of Equation 3.5, tend to get larger as the distance to the source increases. In room 10B, RMS errors are significantly larger at 3.0 m than at 1.0 m (one-sided, two-sample t-test, t = −5.61, df = 24, p < 0.001) and larger at 1.0 m than at 0.5 m (t = −3.64, df = 20, p = 0.001). However, there is no significant difference between the errors at 3.0 m and at 5.0 m. In the reverberant room, the errors are smaller at a distance of 0.5 m than at 1.0 m (t = −3.52, df = 28, p = 0.001), but there is no significant difference between the errors at distances of 1.0, 3.0, and 5.0 m (one-way ANOVA, F(2, 56) = 0.36, p = 0.700). This is likely because errors do not increase appreciably in highly reverberant environments once the sound source is in the diffuse sound field. The
In the reverberant room, the errors are smaller at a distance of 0.5 m than at 1.0 m (t = -3.52, df = 28, p = 0.001), but there is no significant difference between the errors at distances of 1.0, 3.0, and 5.0 m (one-way ANOVA, F(2, 56) = 0.36, p = 0.700). This is likely because errors do not increase appreciably in highly reverberant environments once the sound source is in the diffuse sound field. The distance from the detector at which the diffuse field "begins" varies with room, and is expected to scale inversely with reverberation time. In the diffuse field, the sound intensity no longer depends on the source distance to the detector, interaural coherence is generally poor, and localization is quite difficult if not impossible. A reasonable expectation is that the distribution of coherences also does not change appreciably in the diffuse field, and so the related RMS error should plateau at some maximum value, which should depend on the room.

Room 10B
                  Distance 0.5 m                Distance 1.0 m
Frequency (Hz)    n               RMS Error     n               RMS Error
160               2.03 ± 0.089    0.00056       2.09 ± 0.17     0.0011
200               2.29 ± 0.13     0.00073       2.06 ± 0.19     0.0014
250               2.39 ± 0.078    0.00036       2.22 ± 0.0940   0.0011
315               2.07 ± 0.13     0.000048      2.25 ± 0.16     0.0012
400               2.26 ± 0.16     0.00078       2.28 ± 0.14     0.0014
500               2.28 ± 0.068    0.00048       2.08 ± 0.049    0.00058
630               2.22 ± 0.078    0.00089       2.23 ± 0.12     0.0026
800               2.31 ± 0.12     0.0011        2.07 ± 0.11     0.0045
1000              2.26 ± 0.11     0.0014        2.28 ± 0.083    0.0026
1250              2.31 ± 0.073    0.00077       2.15 ± 0.12     0.0040
1600              2.24 ± 0.077    0.00094       2.14 ± 0.067    0.0020
2000              2.29 ± 0.085    0.00096       2.16 ± 0.077    0.0027
2500              2.24 ± 0.024    0.00035       2.26 ± 0.057    0.0019
3150              2.24 ± 0.036    0.00042       2.19 ± 0.039    0.00087
4000              2.18 ± 0.051    0.00047       2.22 ± 0.055    0.0010
5000              2.18 ± 0.071    0.00059       2.21 ± 0.054    0.0012
6300              2.15 ± 0.094    0.00091       2.13 ± 0.052    0.0012
8000              1.92 ± 0.18     0.0012        2.16 ± 0.079    0.00096
10000             1.9 ± 0.20      0.0018        1.99 ± 0.086    0.0013
Avg.              2.20 ± 0.062    0.00080       2.17 ± 0.040    0.00178

Table 3.2. Values of the power parameter n with 95% confidence intervals for the data in each 1/3-octave band measured in room 10B at distances of 0.5 and 1.0 m. These values were found from the best fit of the data within that band to Equation 3.5, and the RMS differences between the data and the best fitting line are given.

Room 10B
                  Distance 3.0 m                Distance 5.0 m
Frequency (Hz)    n               RMS Error     n               RMS Error
160               2.1 ± 0.23      0.0020        2.11 ± 0.12     0.0016
200               2.25 ± 0.12     0.0014        2.20 ± 0.14     0.0022
250               2.11 ± 0.13     0.0024        2.32 ± 0.075    0.0013
315               2.13 ± 0.15     0.0034        2.00 ± 0.14     0.0029
400               2.16 ± 0.12     0.0042        2.14 ± 0.11     0.0033
500               1.94 ± 0.21     0.0098        1.86 ± 0.11     0.0051
630               2.13 ± 0.17     0.0080        2.0 ± 0.20      0.010
800               1.99 ± 0.13     0.0064        2.1 ± 0.37      0.016
1000              2.06 ± 0.11     0.0052        2.0 ± 0.23      0.0081
1250              2.14 ± 0.13     0.0065        2.1 ± 0.37      0.012
1600              2.0 ± 0.28      0.0096        2.1 ± 0.20      0.0086
2000              2.12 ± 0.13     0.0065        2.1 ± 0.30      0.012
2500              2.03 ± 0.19     0.0088        2.0 ± 0.27      0.0064
3150              2.12 ± 0.14     0.0069        2.07 ± 0.15     0.0070
4000              2.12 ± 0.076    0.0039        2.19 ± 0.083    0.0034
5000              2.10 ± 0.089    0.0044        2.1 ± 0.21      0.0059
6300              2.07 ± 0.094    0.0050        2.13 ± 0.16     0.0050
8000              2.16 ± 0.057    0.0026        2.14 ± 0.053    0.0025
10000             2.1 ± 0.062     0.0031        2.11 ± 0.11     0.0050
Avg.              2.10 ± 0.027    0.0053        2.09 ± 0.050    0.0063

Table 3.3. Values of the power parameter n with 95% confidence intervals for the data in each 1/3-octave band measured in room 10B at distances of 3.0 and 5.0 m. These values were found from the best fit of the data within that band to Equation 3.5, and the RMS differences between the data and the best fitting line are given.

Reverberant Room
                  Distance 0.5 m                Distance 1.0 m
Frequency (Hz)    n               RMS Error     n               RMS Error
160               2.2 ± 0.28      0.0028        2.10 ± 0.11     0.0010
200               2.17 ± 0.17     0.0018        2.12 ± 0.080    0.0011
250               2.38 ± 0.093    0.00062       2.17 ± 0.11     0.0027
315               2.24 ± 0.13     0.00093       2.15 ± 0.16     0.0028
400               2.16 ± 0.13     0.0015        2.1 ± 0.25      0.0073
500               2.30 ± 0.13     0.0029        2.32 ± 0.15     0.0051
630               2.21 ± 0.087    0.0033        1.96 ± 0.17     0.0090
800               2.10 ± 0.15     0.0050        2.2 ± 0.27      0.012
1000              2.16 ± 0.12     0.0049        2.1 ± 0.23      0.011
1250              2.11 ± 0.10     0.0023        2.07 ± 0.19     0.0077
1600              2.17 ± 0.12     0.0052        2.0 ± 0.23      0.010
2000              2.18 ± 0.085    0.0033        2.1 ± 0.33      0.012
2500              2.20 ± 0.073    0.0032        2.08 ± 0.19     0.0089
3150              2.17 ± 0.053    0.0020        2.05 ± 0.071    0.0038
4000              2.16 ± 0.033    0.00098       2.12 ± 0.10     0.0053
5000              2.16 ± 0.066    0.0016        2.11 ± 0.080    0.0042
6300              2.17 ± 0.036    0.00077       2.14 ± 0.084    0.00043
8000              2.16 ± 0.074    0.00097       2.21 ± 0.033    0.0014
10000             2.11 ± 0.078    0.0011        2.11 ± 0.046    0.0021
Avg.              2.19 ± 0.032    0.0025        2.12 ± 0.040    0.0059

Table 3.4. Values of the power parameter n with 95% confidence intervals for the data in each 1/3-octave band measured in a reverberant room at distances of 0.5 and 1.0 m. These values were found from the best fit of the data within that band to Equation 3.5, and the RMS differences between the data and the best fitting line are given.

Reverberant Room
                  Distance 3.0 m                Distance 5.0 m
Frequency (Hz)    n               RMS Error     n               RMS Error
160               2.0 ± 0.25      0.0030        2.1 ± 0.22      0.0045
200               2.30 ± 0.081    0.0012        2.22 ± 0.16     0.0034
250               2.27 ± 0.092    0.0017        2.24 ± 0.11     0.0026
315               2.0 ± 0.27      0.0077        2.16 ± 0.10     0.0028
400               1.9 ± 0.21      0.0085        2.1 ± 0.21      0.0074
500               2.36 ± 0.18     0.0076        2.1 ± 0.24      0.011
630               2.2 ± 0.21      0.0094        2.1 ± 0.24      0.012
800               1.8 ± 0.22      0.0096        2.0 ± 0.34      0.014
1000              1.9 ± 0.36      0.015         2.0 ± 0.44      0.011
1250              1.9 ± 0.48      0.012         1.8 ± 0.45      0.016
1600              2.0 ± 0.43      0.0081        2.0 ± 0.46      0.0099
2000              2.2 ± 0.45      0.0080        2.2 ± 0.51      0.011
2500              2.1 ± 0.49      0.0079        2.1 ± 0.30      0.0056
3150              1.95 ± 0.19     0.0056        1.8 ± 0.23      0.0056
4000              1.8 ± 0.23      0.0062        1.9 ± 0.37      0.0078
5000              2.0 ± 0.40      0.0081        1.6 ± 0.22      0.0054
6300              2.0 ± 0.30      0.0049        1.6 ± 0.30      0.0076
8000              2.09 ± 0.093    0.0034        2.1 ± 0.36      0.0074
10000             2.06 ± 0.17     0.0059        2.03 ± 0.19     0.0041
Avg.              2.04 ± 0.080    0.0070        2.01 ± 0.090    0.0079

Table 3.5. Values of the power parameter n with 95% confidence intervals for the data in each 1/3-octave band measured in a reverberant room at distances of 3.0 and 5.0 m. These values were found from the best fit of the data within that band to Equation 3.5, and the RMS differences between the data and the best fitting line are given.
For any given source distance, the RMS error tends to be smallest at both the lowest and the highest frequencies, and is greatest at some intermediate frequency. For example, Figure 3.12 plots the RMS error within 1/3-octave bands across center frequency for the reverberant room at a distance of 3.0 m. A low RMS error in any particular band is due to the tendency for coherences to be large within that band since Equation 3.5 tends to agree uniformly well with high waveform coherences. At low frequencies, the ears may not be in a diffuse sound field, and so waveform coherences are high. At high frequencies, large waveform coherences tend to occur because the room is more anechoic than at lower frequencies.

Average values of the power parameter n and of the RMS error calculated across bands at each source distance for each room are shown in the bottom rows of Tables 3.2 through 3.5. In room 10B, there is no significant difference between the average value of n found at a distance of 0.5 m and that found at 1.0 m (two-sided, two-sample t-test, t = 0.93, df = 30, p = 0.357), and there is no significant difference between 3.0 m and 5.0 m in this respect (t = 0.37, df = 31, p = 0.711). However, the values of n measured at 0.5 m and 1.0 m are significantly larger on average than those measured at 3.0 m and 5.0 m (t = 4.08, df = 70, p < 0.001). Similarly, neither the mean difference between n values in each band measured at 0.5 m and at 1.0 m (two-sided, paired t-test, t = 1.07, p = 0.297), nor the differences between 3.0 m and 5.0 m (t = 0.52, p = 0.304) are statistically significant.

[Figure 3.12 plot: RMS error (vertical axis, 0 to 0.010) versus frequency in Hz (logarithmic horizontal axis, 100 to 10000).]

Figure 3.12.
RMS Error as a function of 1/3-octave band center frequency for coherences measured in the reverberant room at a source distance of 3.0 m.

However, the n values at 1.0 m are significantly larger than those at 3.0 m (t = 3.02, p = 0.004). In the reverberant room, the average value of n across bands is significantly larger at 0.5 m than at 1.0 m (t = 2.63, df = 34, p = 0.006) and larger at 1.0 m than at 3.0 m (t = 1.93, df = 25, p = 0.033), but there is no difference on average between 3.0 m and 5.0 m (t = 0.51, df = 35, p = 0.308). These averages and the statistical significance are summarized in Table 3.6, where values of n that are not significantly different within each room are grouped by brackets. Except for the difference on average between 0.5 m and 1.0 m, the tendency of n to decrease with increasing source distance is the same as in room 10B.

It should be noted, however, that this is an average tendency across bands and is not necessarily true for individual bands. Consider, for example, the behavior of the 500 Hz band in room 10B. The value of n measured in that band monotonically decreases with increasing source distance. The 10,000 Hz band in room 10B, however, shows n values that consistently increase with increasing source distance. Some bands, such as the 250 Hz band, neither increase nor decrease monotonically with changing source distance.

Distance (m)    10B                 RR
0.5             2.20 ± 0.062 ┐      2.19 ± 0.032
1.0             2.17 ± 0.040 ┘      2.12 ± 0.040
3.0             2.10 ± 0.027 ┐      2.04 ± 0.080 ┐
5.0             2.09 ± 0.050 ┘      2.01 ± 0.090 ┘

Table 3.6. Summary of average values of the power parameter n calculated across fits to Equation 3.5 within 1/3-octave bands in both room 10B and the reverberant room at each distance. Values of n within each room that are not significantly different from each other are grouped by brackets.

Similarly, there seems to be no clear trend in the values of n with band center frequency. This point is exemplified by Figure 3.13.
These plots show values of n across frequency and source distance for each room, and betray no apparent trends in the value of n either across frequency or source distance.

[Figure 3.13 plots: power parameter n (vertical axis, 1.5 to 2.5) versus frequency in Hz (logarithmic horizontal axis, 100 to 10000); upper panel room 10B, lower panel reverberant room.]

Figure 3.13. A plot of values of n measured in room 10B (above) and the reverberant room (below) as a function of 1/3-octave band center frequency. Different lines are plotted for different source distances. There are no clear trends evident across either frequency or distance to the source.

3.4 Analysis of Data Combined Across Bands

If the data across all 1/3-octave bands for a given source distance are combined, they form a more complete picture of the relationship between waveform and envelope coherence. By pooling all data points in all bands and fitting Equation 3.5 to the combined data, the overall relationship between waveform and envelope coherence can be examined for each room at each source distance. These combined data and best fitting curves are shown in Figure 3.14, and the best fitting n values and RMS errors are given in Table 3.7. It may be noted that the values of n estimated from the combined data differ little between source distances of 3.0 m and 5.0 m, and are also nearly identical in both room 10B and the reverberant room.

            Room 10B                       Reverberant Room
Distance    n               RMS Error      n               RMS Error
0.5 m       2.22 ± 0.023    0.0011         2.17 ± 0.020    0.0030
1.0 m       2.17 ± 0.020    0.0024         2.11 ± 0.035    0.0072
3.0 m       2.09 ± 0.027    0.0061         2.07 ± 0.060    0.0088
5.0 m       2.07 ± 0.042    0.0079         2.07 ± 0.068    0.0092

Table 3.7. Values of the power parameter n and the RMS error for the best fit of Equation 3.5 to envelope versus waveform coherence data combined, at each position, across all 1/3-octave bands.

[Figure 3.14 plot panels: envelope coherence $\gamma_e$ (vertical axis) versus waveform coherence $\gamma_w$ (horizontal axis, 0 to 1) for each room and source distance, with fitted curves and fitted n in each panel.]

Figure 3.14. Combined envelope versus waveform coherence data across all 1/3-octave bands measured in room 10B and the reverberant room at each source distance. Trend lines show the best fit of the data in each panel to Equation 3.5, and the value of the fitting parameter n is shown in the upper-left corner of each panel.

Figure 3.14 clearly shows that in more reverberant conditions and at greater source distances, the waveform coherences decrease on average, spreading out over the range of values shown along the abscissa. The data deviate from the best fit to Equation 3.5 more at lower waveform coherences than at greater coherences, especially in the reverberant room at source distances of 3.0 and 5.0 m. In some cases at very low waveform coherence, the envelope coherence even drops below the theoretical minimum of $\pi/4$.

3.4.1 Trimming Data Based on Centrality of Waveform Coherences

It must be noted that across all conditions tested in this experiment, the actual value of n is rather subtle in its effect on Equation 3.5. Consider, for example, Figure 3.15, in which Equation 3.5 is plotted for three different values of the power parameter n. The values of n chosen for Figure 3.15 are roughly indicative of the range of n values found for regression to combined data per Figure 3.14. Clearly, the difference between the curves for the range of the power parameter from n = 2.0 to n = 2.2 is small over the range of possible waveform coherences (0 to 1).
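As a rough numerical illustration of how close these curves are, the following sketch scans the range of waveform coherences and reports where two curves differ most. It assumes that Equation 3.5 has the form $\gamma_e = \pi/4 + (1 - \pi/4)\,\gamma_w^{\,n}$, consistent with Equation 3.6; the function names and the grid resolution are hypothetical choices made only for this example.

```python
import math

FLOOR = math.pi / 4  # assumed envelope-coherence floor in Equation 3.5


def envelope_coherence(gamma_w, n):
    """Assumed form of Equation 3.5: pi/4 + (1 - pi/4) * gamma_w**n."""
    return FLOOR + (1.0 - FLOOR) * gamma_w ** n


def largest_gap(n_low, n_high, steps=10000):
    """Scan gamma_w over [0, 1] and return (gamma_w, gap) at the largest gap."""
    best_x, best_gap = 0.0, 0.0
    for i in range(steps + 1):
        x = i / steps
        gap = abs(envelope_coherence(x, n_low) - envelope_coherence(x, n_high))
        if gap > best_gap:
            best_x, best_gap = x, gap
    return best_x, best_gap


if __name__ == "__main__":
    x, gap = largest_gap(2.0, 2.2)
    print(f"largest gap {gap:.4f} at gamma_w = {x:.2f}")
```

The scan is deliberately brute-force; at both ends of the range the two curves coincide, so the largest gap necessarily lies in the interior.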
The greatest difference between the curves shown in Figure 3.15 occurs around $\gamma_w = 0.62$ (although this varies slightly depending on exactly which two curves are being compared). For instance, the greatest difference in envelope coherence between the curves for n = 2.0 and n = 2.2 is $\Delta\gamma_e = 0.0075$. Near $\gamma_w = 0$ and $\gamma_w = 1$, there is an even smaller difference between the curves. Therefore, data with waveform coherences near the boundaries provide little reliable information for determining the best fitting value of n for any given set of data because these points could potentially fit well to curves with widely varying values of n.

[Figure 3.15 plot: envelope coherence $\gamma_e$ (vertical axis, 0.76 to 1.00) versus waveform coherence $\gamma_w$ (horizontal axis, 0 to 1) for three values of n, including n = 2.1 and n = 2.2.]

Figure 3.15. Plots of Equation 3.5 with three different values of the power parameter, n. There is very little difference between the curves for these values of n.

A more reliable value of the power parameter n may be obtained for any given data set if only the data with waveform coherences near the middle of the possible range (near $\gamma_w = 0.5$) are used in the regression to Equation 3.5. A cutoff parameter $\psi$ may be defined and the data treated such that any data points with $\gamma_w < \psi$ or $\gamma_w > (1 - \psi)$ are not included in the regression procedure. The data that contribute to the regression are then only those near the middle of the curve. For instance, a cutoff parameter of $\psi = 0.1$ would remove both the top and bottom 10% of the allowable values of $\gamma_w$. A cutoff of $\psi = 0$ would reject no data and thus return a regression identical to those shown in Figure 3.14.
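The trimming procedure can be sketched as follows. This is a minimal illustration, not the regression code used for the measurements: it assumes Equation 3.5 has the form $\gamma_e = \pi/4 + (1 - \pi/4)\,\gamma_w^{\,n}$, and it substitutes a simple golden-section search on the RMS error for whatever nonlinear regression routine was actually used. All function names, the search interval for n, and the synthetic data are hypothetical.

```python
import math

FLOOR = math.pi / 4  # assumed envelope-coherence floor in Equation 3.5


def model(gamma_w, n):
    """Assumed form of Equation 3.5."""
    return FLOOR + (1.0 - FLOOR) * gamma_w ** n


def trim(points, psi):
    """Drop (gamma_w, gamma_e) pairs with gamma_w < psi or gamma_w > 1 - psi."""
    return [(w, e) for (w, e) in points if psi <= w <= 1.0 - psi]


def rms_error(points, n):
    """RMS difference between measured gamma_e and the model curve."""
    return math.sqrt(sum((e - model(w, n)) ** 2 for w, e in points) / len(points))


def fit_n(points, lo=1.0, hi=4.0, tol=1e-6):
    """Golden-section search for the n in [lo, hi] that minimizes the RMS error.

    Assumes the RMS error has a single minimum on the search interval.
    """
    if len(points) < 2:
        raise ValueError("too few points left to perform a regression")
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if rms_error(points, c) < rms_error(points, d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0


if __name__ == "__main__":
    # Synthetic, noise-free pairs generated from the model itself (not measured data).
    data = [(i / 20, model(i / 20, 2.11)) for i in range(21)]
    for psi in (0.0, 0.1, 0.25):
        kept = trim(data, psi)
        print(psi, len(kept), round(fit_n(kept), 3))
```

With measured data the fitted n generally shifts as $\psi$ grows, and the guard in `fit_n` mirrors the situation in which trimming leaves too few points for a regression.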
This is depicted for increasing values of $\psi$ in Figure 3.16. Table 3.8 shows the 95% confidence intervals about the mean for values of the power parameter n given the source distance for room 10B and the reverberant room respectively as a function of $\psi$. When only the central data points are used in the regression, the combined data across 1/3-octave bands tend toward a fit to Equation 3.5 with n = 2.09, independent of source distance and room. Though the width of the related confidence intervals and the RMS errors increase with increasing $\psi$, this is likely due to the fact that the best fitting points (those close to $\gamma_w = 1$) have been eliminated from the fit. However, due to the small differences between the values of Equation 3.5 near $\gamma_w = 1$, the points at that end of the curve would return a small RMS error for nearly any value of n. It must also be considered that with increasing $\psi$, the number of data points used in the regression decreases, thus increasing the width of the confidence interval. Even though the RMS error of the fit tends to increase with $\psi$, the resulting values of n should perhaps be considered to be more reliable than those found when $\psi = 0$ (Table 3.7).

[Figure 3.16 plot panels, headed "10B, 5.0 m" and "RR, 5.0 m": envelope coherence $\gamma_e$ (vertical axis, 0.76 to 1.00) versus waveform coherence $\gamma_w$ (horizontal axis, 0 to 1), one row of panels per value of $\psi$.]

Figure 3.16. Plots of the combined data at a source distance of 5.0 m in room 10B and in the reverberant room. Each panel shows the data remaining after points with waveform coherence $\gamma_w < \psi$ and those with $\gamma_w > (1 - \psi)$ have been removed. The best fit of Equation 3.5 to the remaining data is shown as a solid line and the power parameter n is given in each panel.

                      Room 10B                       Reverberant Room
Distance    ψ         n               RMS Error      n               RMS Error
1.0 m       0         2.17 ± 0.020    0.0024         2.11 ± 0.035    0.0072
            0.1       2.17 ± 0.037    0.0036         2.11 ± 0.039    0.0081
            0.2       2.09 ± 0.15     0.0065         2.10 ± 0.043    0.0085
            0.25      -               -              2.09 ± 0.047    0.0088
3.0 m       0         2.09 ± 0.028    0.0061         2.07 ± 0.060    0.0088
            0.1       2.09 ± 0.031    0.0068         2.07 ± 0.068    0.0097
            0.2       2.08 ± 0.033    0.0070         2.07 ± 0.075    0.0099
            0.25      2.08 ± 0.036    0.0072         2.09 ± 0.089    0.011
5.0 m       0         2.07 ± 0.042    0.0079         2.07 ± 0.068    0.0092
            0.1       2.07 ± 0.047    0.0088         2.07 ± 0.077    0.010
            0.2       2.07 ± 0.050    0.0089         2.09 ± 0.090    0.010
            0.25      2.09 ± 0.048    0.0082         2.08 ± 0.10     0.011

Table 3.8. 95% confidence intervals about the mean of the power parameter n for different source distances in room 10B and the reverberant room. The parameter n was found by nonlinear regression of Equation 3.5 to data combined across 1/3-octave bands after points with waveform coherences $\gamma_w < \psi$ or $\gamma_w > (1 - \psi)$ were removed. Thus, these values of n and the related RMS errors are determined only by data with abscissae near 0.5. Increasing values of $\psi$ remove increasingly many points near the boundary values of $\gamma_w$. In the case of a 1.0 m source distance, most data points are grouped on the high end of the best-fit curve (see Figure 3.14), and a cutoff of $\psi = 0.25$ left too few points in the data set to perform a regression.
Of course, there is a limit to which $\psi$ can be increased before important and meaningful data are eliminated, but the exact value of this limit is not clear. The largest value of cutoff used here, $\psi = 0.25$, only allows points with $\gamma_w > 0.25$ and $\gamma_w < 0.75$, the middle 50% of the allowable range, to be used in the regression to Equation 3.5. This is perhaps the greatest value of $\psi$ that would return a reliable estimate of n.

3.5 Conclusions

The relationship between envelope and waveform interaural coherences measured in real rooms agrees remarkably well with computer simulations that calculate coherences from Gaussian white noise pairs. This close agreement suggests that this relationship is robust to the acoustic environment (reverberation time, source distance, et cetera). Both within 1/3-octave bands and across bands, Equation 3.5 provides a reliable relationship between waveform and envelope coherences:

$$\gamma_e = \frac{\pi}{4} + \left(1 - \frac{\pi}{4}\right)\gamma_w^{\,n}$$

Although the power parameter n may change across 1/3-octave bands for a given room and source distance, the changes tend to be small and unpredictable for individual measurements in any specific band. On average, n seems to decrease with increasing source distance. By combining data across bands for a given room and source distance, a reasonably good value of n can be found for any room, given the source distance. Moreover, this value can be applied to any particular band reasonably well. Although it may at first seem that the value of n for any given room depends on the reverberant characteristics of that room, this is not supported by current observations, which instead suggest that there is little difference between rooms with vastly different reverberation times.

On average it appears that the value of n both within and across bands tends to decrease to some minimum threshold around 2.07. However, further analysis puts this into doubt.
The exact value of n found by nonlinear regression of the envelope coherence versus waveform coherence data is very sensitive to small variations in the data. This sensitivity arises because the exact value of n does not have a large impact on the shape of the curves, so curves of similar shape can have quite different values of n (see Figure 3.15). This also implies that the exact value of n may not be important for describing the behavior of the relationship between envelope and waveform coherences. Instead, a "ballpark" estimate may suffice in most cases. Nevertheless, data reduction techniques provide a basis for determining the most reliable possible value of n. Specifically, results show that fitting data within a limited range of coherences between $\gamma_w = 0$ and $\gamma_w = 1$ (i.e. "trimming" the data) is effective.

Calculations of coherences resulting from trimming data seem to confirm the notion that the relationship between waveform and envelope coherence is insensitive to or independent of source distance and the reverberation characteristics of the room. Although the values of n found for source distances of 0.5 m initially seem to violate this ideal behavior, at such small source distances, all of the values of $\gamma_w$ are relatively large. It has been argued that data with only large values of $\gamma_w$ will tend to give an unreliable value of n using regression to Equation 3.5 because of the very small difference between curves with quite different values of n near $\gamma_w = 1$. It is suggested that a reliable equation describing the relationship between waveform coherence $\gamma_w$ and envelope coherence $\gamma_e$ in any room and for any source distance is:

$$\gamma_e = \frac{\pi}{4} + \left(1 - \frac{\pi}{4}\right)\gamma_w^{\,2.11} \tag{3.6}$$

The envelope coherences computed from Equation 3.6 are not exact. In order to quantify the expected deviation of envelope coherences from Equation 3.6, waveform-coherence/envelope-coherence pairs were combined across all frequencies into one large set.
Three such sets were constructed: one for each of the two tested rooms, and one for the computer simulation data. The data in each set were binned by the value of their waveform coherence into bins of width 0.1, thus creating ten bins across the range of waveform coherences. Within each bin, two methods of measuring error were employed: (1) calculation of the mean absolute difference between the measured envelope coherences and envelope coherences predicted by Equation 3.6; and (2) calculation of the absolute difference from Equation 3.6 for which 95% of the data deviate less than the specified amount (i.e. the "95% bound").

The absolute errors for the rooms turned out to be equal to or slightly less than the errors in the simulated coherences for both methods of error estimation. Thus, a conservative estimate of the absolute error in Equation 3.6 is given by the errors measured in the computer simulation of coherences. The absolute errors for both the Mean and 95% Bound methods are shown as a function of the waveform coherence, $\gamma_w$, in Figure 3.17. This figure shows third-order polynomial function fits to the absolute errors. The mean absolute error E can then be described as a function of the waveform coherence by:

$$E = 0.011 + 0.0083\,\gamma_w - 0.031\,\gamma_w^{2} + 0.013\,\gamma_w^{3} \tag{3.7}$$

The 95% bound is approximately three times as large as the mean absolute error. A user of Equation 3.6 may expect the difference between envelope coherences calculated from Equation 3.6 and the actual envelope coherences to be approximately $\pm E$ on average, and may expect envelope coherences to deviate no more than $\pm 3E$ from Equation 3.6.

[Figure 3.17 plot: absolute error (vertical axis) versus waveform coherence $\gamma_w$ (horizontal axis, 0 to 1).]

Figure 3.17. The 95% Bound curve (closed circles) gives the absolute error in envelope coherence for which 95% of the simulated coherences deviate by that amount or less from the coherence predicted by Equation 3.6. The Mean curve (open circles) gives the mean absolute error relative to Equation 3.6 for simulated envelope coherences. Both curves are third-order polynomials fitted to the absolute errors of the binned data.

It is both interesting and important to realize the remarkable agreement between the apparent coherence relationship in rooms as measured through KEMAR and that determined for simulated noises. For pairs of Gaussian noises with coherences spanning the range from 0 to 1, it was found that n = 2.11 fits the data well in every 1/3-octave band. This is essentially identical to the relationship between envelope and waveform coherences in rooms and suggests that the value of the power parameter n = 2.1 is indeed a good estimate. Furthermore, since n does not vary very much across frequency for the simulated noises, additional credence may be given to the notion that n is essentially the same for any 1/3-octave band in real rooms.

CHAPTER 4

Measurement of Binaural Properties in Rooms as a Function of Azimuth

4.1 Experiment 9: Measurements of Binaural Properties as a Function of Horizontal Angle in Anechoic Conditions

Experiment 8 showed that interaural coherences both in the waveform and the envelope may be calculated for signals recorded by KEMAR in a real room. The experiments of the present chapter measured other binaural properties, specifically interaural time differences (ITDs) and interaural level differences (ILDs). As these are perhaps the two most important binaural cues in sound source localization for sources in the horizontal plane [48], it is worth carefully measuring them and comparing the results to physical predictions. As the properties of a room may interfere with estimations of ILD and ITD in part due to reverberation, initial measurements were made in anechoic conditions.
ILD and ITD vary primarily with the angle at which the sound source is located relative to the head, and so the incident angle of the sound is the independent parameter in this experiment. At the same time, coherences in both the waveform and envelope were measured as a function of angle.

The simplest model of the head approximates it as a solid, rigid sphere with antipodal ears. In this model, the head itself will have a small effect on the cross-correlation function, though its orientation (angle relative to the location of the sound source) will shift the location of the peak in time. Thus, the shape of the cross-correlation function is expected to mimic that of the auto-correlation function of a noise, $a(\tau)$, shifted in time such that its peak occurs for a lag equal to the ITD. For flat-spectrum noise, this auto-correlation function is [47]:

$$a(\tau) = A\,\frac{\sin\left(\Delta\omega\,\tau/2\right)}{\Delta\omega\,\tau/2}\,\cos\left(\bar{\omega}\,\tau\right) \tag{4.1}$$

In the above equation, $\Delta\omega$ is the bandwidth, and $\bar{\omega}$ is the center frequency of the noise band. The autocorrelation can be seen to be a cosine function scaled by the total noise power and enveloped by a sinc function. This gives an idea of its shape. For 1/3-octave bands of flat-spectrum noise, $\Delta\omega \approx 0.23\,\bar{\omega}$, and for octave bands of noise, $\Delta\omega \approx 0.7\,\bar{\omega}$. The coherence derived from Equation 4.1 (the peak value) would always be 1, and the ITD (the location of the peak) would vary with the orientation of the head. Hartmann [47] cites particularly favorable cases for how well this approximation works with signals measured through KEMAR, oriented at 45° relative to the direction of incident sound, using a band of noise approximately an octave wide, from 504 Hz to 1000 Hz. In that example, the peaks of Equation 4.1 and the cross-correlation calculated from the signals at the KEMAR ears agree quite well.
The coherences measured in the following experiment are expected to differ somewhat from those predicted by Equation 4.1 because flat-spectrum noise is not being used, and the geometry of the KEMAR head is more complicated than that of a simple rigid sphere as it is intended to mimic the shape of an average adult's head and torso.

A method of predicting ITDs comes from a theory of the diffraction of harmonic plane waves by a rigid sphere. Rschevkin [78] shows that the total pressure on the surface, $p_t$, normalized by the incident free field plane wave pressure, $p_0$, is given by:

$$\frac{p_t}{p_0} = \left(\frac{1}{ka}\right)^{2} \sum_{m=0}^{\infty} \frac{i^{\,m+1}\,P_m(\sin\theta)\,(2m+1)}{j_m'(ka) - i\,n_m'(ka)} \tag{4.2}$$

Here, $P_m$ are Legendre polynomials of degree m, $j_m$ are spherical Bessel functions of order m, and $n_m$ are spherical Neumann functions (Bessel functions of the second kind) of order m; primes denote derivatives with respect to the argument. The parameter a is the radius of the sphere or the equivalent head radius, and $k = 2\pi f/c$ is the acoustic wavenumber, where f is the frequency of the incident wave and c is the speed of sound.

Kuhn [62] derives from this a low-frequency approximation for the interaural phase difference (IPD) between antipodal ears (ears on exactly opposite sides of the head, each 90° from the nose) on a rigid spherical head of radius a when exposed to low-frequency plane waves at an incident angle $\theta$ relative to the median sagittal plane:

$$\mathrm{IPD} = 2\tan^{-1}\left(\tfrac{3}{2}\,ka\sin\theta\right)$$

This approximation assumes that $(ka)^2 \ll 1$. The IPD and ITD are related simply by $\mathrm{ITD} = \mathrm{IPD}/(kc)$, and so the ITD in this limit is:

$$\mathrm{ITD} = \frac{2}{kc}\tan^{-1}\left(\tfrac{3}{2}\,ka\sin\theta\right) \tag{4.3}$$

From this, in the low-frequency limit where $\tfrac{3}{2}\,ka\sin\theta$ is small, the first order series expansion of the arctangent function in Equation 4.3 can be taken, giving:

$$\mathrm{ITD} = \frac{3a}{c}\sin\theta \tag{4.4}$$

This approximation, with a sinusoidal shape for the ITD as it varies over the azimuthal angle of the source, describes the low-frequency ITDs quite well (Figure 4.3).
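The two low-frequency expressions can be compared numerically. The sketch below assumes Equation 4.3 reads $\mathrm{ITD} = \frac{2}{kc}\tan^{-1}\left(\tfrac{3}{2}ka\sin\theta\right)$ and Equation 4.4 reads $\mathrm{ITD} = \frac{3a}{c}\sin\theta$. The head radius of 8.75 cm and the sound speed of 343 m/s are illustrative defaults chosen for this example, not values taken from the measurements, and the function names are hypothetical.

```python
import math


def itd_arctan(theta_deg, freq_hz, a=0.0875, c=343.0):
    """Assumed Equation 4.3: (2 / (k c)) * atan(1.5 * k * a * sin(theta)), in seconds."""
    k = 2.0 * math.pi * freq_hz / c          # acoustic wavenumber
    theta = math.radians(theta_deg)
    return (2.0 / (k * c)) * math.atan(1.5 * k * a * math.sin(theta))


def itd_first_order(theta_deg, a=0.0875, c=343.0):
    """Assumed Equation 4.4: (3 a / c) * sin(theta), in seconds."""
    return 3.0 * a / c * math.sin(math.radians(theta_deg))


if __name__ == "__main__":
    for angle in (0, 15, 30, 45, 60, 75, 90):
        full = itd_arctan(angle, 100.0) * 1e6      # microseconds
        approx = itd_first_order(angle) * 1e6
        print(f"{angle:3d} deg  arctan form {full:7.1f} us  first order {approx:7.1f} us")
```

At 100 Hz the two forms differ by only a few percent even at 90°, and the gap grows as frequency rises and the small-argument assumption weakens.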
For high frequencies ($ka > 10$), Kuhn further derives Woodworth's high-frequency approximation for the ITD [94]:

$$\mathrm{ITD} = \frac{a}{c}\left(\sin\theta + \theta\right) \tag{4.5}$$

This equation suggests a more triangular shape for the ITDs at higher frequencies (which will later be demonstrated by Figure 4.4).

4.1.1 Methods

The same Mackie loudspeaker used in previous experiments was used as a sound source for MLS noise. The loudspeaker and KEMAR were placed in an anechoic chamber, 3.0 m wide by 4.3 m long by 2.4 m high, positioned 1.8 m apart. Three sound-absorbing foam wedges were placed on the seat of a sturdy cushioned chair such that they formed a ledge, atop which the loudspeaker was placed. In this way, the center of the loudspeaker was at the ear height of the KEMAR and the loudspeaker was acoustically isolated from the chair.

With the KEMAR facing the loudspeaker, only minor binaural differences were expected. This was known as the 0° condition. Other angles were tested by rotating the KEMAR on its base, leaving the loudspeaker in place. The full set of angles tested was: $\theta$ = {0, ±15, ±30, ±45, ±60, ±75, ±90, ±120, ±150, 180} degrees. Positive angles corresponded to clockwise rotations of the KEMAR (such that the loudspeaker was to the left), and negative angles corresponded to counterclockwise rotations (such that the loudspeaker was to the right). This convention is depicted in Figure 4.1. Correspondingly, positive ITDs were defined as those for which the left ear led the right, and vice versa. Positive ILDs were defined as those for which the sound level in the left ear was greater than that in the right ear, and vice versa for negative ILDs.

Figure 4.1. A schematic view of the alignment of the KEMAR head relative to the fixed loudspeaker and some of the angles $\theta$, indicating the direction of the KEMAR nose. When the KEMAR is rotated in place and the loudspeaker remains stationary, the angles of incidence are positive for clockwise rotations and negative for counterclockwise rotations.

In order to align the KEMAR to specific angles relative to the loudspeaker, a protractor was mounted on the top of the KEMAR head such that the 0° mark of the protractor lay along the KEMAR median axis. The KEMAR was aligned to 0° by projecting a laser beam down the 0° mark of the protractor and adjusting the angle of the KEMAR until the laser beam struck the center of the loudspeaker. Then, the laser beam was projected along each of the other angles $\theta$ while keeping the KEMAR in the 0° position. Marks were made on the walls of the chamber where the laser beam landed for each angle $\theta$. The protractor was then removed. To align the KEMAR to any given angle $\theta$, a laser beam was projected along the KEMAR median axis and the KEMAR was rotated until the laser beam struck the appropriate mark on the wall of the chamber.

For a single trial, an order-18 MLS was generated with random initial input and played through the loudspeaker. Recordings through the left and right ears of the KEMAR were made at a sample rate of approximately 200 kHz. Thus, the signal was about 1.3 seconds long. Measurements of interaural waveform coherence, ITD, and ILD were made (Equations 2.2, 2.4, 2.5, and 2.6) in twenty-two 1/3-octave bands between 80 and 16000 Hz. Each trial was repeated five times consecutively for each angle, each time with a different MLS of the same order, and averages and standard deviations were computed across trials for each quantity measured at each angle.

The entire experimental procedure was repeated, but with the KEMAR and loudspeaker at different places in the same anechoic room. The relative separation between the KEMAR and loudspeaker remained unchanged, but their positions within the room were changed. This change in position is depicted in Figure 4.2.
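The signal length quoted above follows from the MLS period of $2^{18} - 1 = 262143$ samples played at roughly 200 kHz. A minimal sketch of generating such a sequence with a linear feedback shift register is given below. The function names and the single-tap recurrence are assumptions made for illustration; the feedback polynomial actually used for the measurements is not stated in the text, and the order-18 tap shown in the comment is only a commonly tabulated choice.

```python
import random


def random_state(order, rng=random):
    """Draw a random non-zero initial register state (hypothetical helper)."""
    while True:
        state = [rng.getrandbits(1) for _ in range(order)]
        if any(state):
            return state


def mls(order, tap, state=None):
    """Return one period (2**order - 1 bits) of an LFSR sequence.

    Uses the recurrence a[k] = a[k - order] XOR a[k - order + tap], which is
    maximal-length only if x**order + x**tap + 1 is a primitive polynomial.
    """
    if state is None:
        state = [1] + [0] * (order - 1)
    if len(state) != order or not any(state):
        raise ValueError("state must hold `order` bits and be non-zero")
    seq = list(state)
    length = 2 ** order - 1
    while len(seq) < length:
        k = len(seq)
        seq.append(seq[k - order] ^ seq[k - order + tap])
    return seq[:length]


def duration_s(order, sample_rate_hz):
    """Duration of one MLS period at the given sample rate."""
    return (2 ** order - 1) / sample_rate_hz


if __name__ == "__main__":
    # Order 4 with tap 1 (x^4 + x + 1) is small enough to inspect by eye.
    print(mls(4, 1))
    # Order 18 at 200 kHz; tap 7 (x^18 + x^7 + 1) is an assumed polynomial.
    bits = mls(18, 7, random_state(18))
    print(len(bits), "samples,", round(duration_s(18, 200_000.0), 3), "s")
```

Mapping the bits to ±1 before playback gives the flat-spectrum, two-level signal usually meant by MLS noise; drawing a different random initial state for each trial only shifts the starting point within the same periodic sequence.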
Such a shift of position in an anechoic environment should not change the results of the measurements made in that environment, and that is exactly what this repetition of the experiment was meant to check. Any significant differences between the results obtained at the two positions could be attributed to imperfections in the anechoic environment. Similarly, any anomalies that appeared in both sets of measurements could not be blamed on imperfections in the anechoic environment.

Figure 4.2. A depiction of the locations of the KEMAR and loudspeaker relative to the surroundings of the anechoic room in Positions 1 and 2. In each position, the relative separation of the KEMAR and loudspeaker remains the same, though their positions within the room change.

4.1.2 ITD Results

ITD was estimated as the lag at which the peak of the cross-correlation function occurred in 1/3-octave bands. Plots of ITD across angle within each frequency band are shown in Figures 4.3 and 4.4.

More apparent in these figures than in their companion plots of coherence data is the effect of "period errors." In measuring the binaural coherence and ITD, it is sometimes possible that the highest peak in the cross-correlation function occurs not at the "correct" delay, but one or more periods away (where the periodicity of the cross-correlation function in a given frequency band is understood to be close to that of a sine wave with a frequency equal to the center frequency of the band). These "alias points" were also noted by Hartmann et al. [51] when making similar measurements in a real room at a distance of 6 m. In the course of this experiment, the ITD and waveform coherence of period errors were noted and recorded along with the number of periods in error.
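To make the idea of a period error concrete, the sketch below lists the lags at which alias peaks can appear for a given band, taking the spacing between peaks to be one period of the band's center frequency. The function names are hypothetical, and the ±1 ms lag window is an assumption taken from the lag limit mentioned later in this section.

```python
def alias_lags(itd_s, center_hz, max_lag_s=1e-3):
    """List (n, lag) pairs for peaks at itd + n periods inside the lag window."""
    period = 1.0 / center_hz
    n_max = int((max_lag_s + abs(itd_s)) / period) + 1
    lags = []
    for n in range(-n_max, n_max + 1):
        lag = itd_s + n * period
        if abs(lag) <= max_lag_s:
            lags.append((n, lag))
    return lags


def periods_in_error(measured_itd_s, reference_itd_s, center_hz):
    """Number of whole periods separating a measured peak from a reference ITD."""
    return round((measured_itd_s - reference_itd_s) * center_hz)


if __name__ == "__main__":
    for fc in (500.0, 4000.0, 16000.0):
        n_candidates = len(alias_lags(0.3e-3, fc))
        print(f"{fc:7.0f} Hz band: {n_candidates} candidate peak(s) within 1 ms")
```

The count of candidate peaks grows in proportion to the center frequency, which is one way to see why a wrong peak is more easily selected in the high-frequency bands than in the low ones.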
Correct peaks could be identified by locating the peak in the cross-correlation function that was nearest to the ITD measured in the surrounding bands. This, along with the expectations that period errors are unlikely in lower frequency bands (since few periods fall within the ±1 ms limit in lag) and that the ITD decreases somewhat with increasing frequency, allowed the correct peak to be determined. The location and height of the correct peak then replaced the incorrect ITD and waveform coherence in the data set. The period errors are shown in Figures 4.3, 4.4, 4.11, and 4.12 as filled-in counterparts to the open symbols.

Period errors tended to be more frequent at higher frequencies. At both positions, period errors most frequently occurred in the 16 kHz band. Errors of two periods occurred two times at each position: in position 1 at −120° in the 8000 Hz band and at 90° in the 12.5 kHz band, and in position 2 at 45° and 90° in the 16 kHz band. No error of more than two periods was observed within the parameter space of this experiment.

The ITDs are, as Kuhn predicts (Equation 4.3), sinusoidal in shape at low frequencies. At higher frequencies, the pattern of ITDs takes on a more triangular shape, as predicted by Woodworth's formula (Equation 4.5). Both Equation 4.3 and Equation 4.5 were fitted to the ITD data in order to arrive at an estimate of the equivalent head radius of KEMAR. In these fits, the head radius $a$ is a parameter that is allowed to vary with frequency. The result is that Kuhn's low-frequency approximation predicts a head radius of about 8.0 cm given the ITD data in the lowest frequency bands, and the Woodworth formula predicts a nearly equivalent head radius in the highest frequency bands. The trends are shown in Figure 4.5. These fits are least-squares regressions over the full 360° range shown in Figures 4.3 and 4.4.

Figure 4.3. ITD in milliseconds (ms) as a function of angle in 1/3-octave bands, plotted in the same format as Figure 4.11. ITDs for the remaining higher frequency bands are shown in Figure 4.4.

Figure 4.4. ITD in milliseconds (ms) as a function of angle in 1/3-octave bands, plotted in the same format as Figure 4.12. Closed symbols represent original data before corrections for period errors were made. ITDs for lower frequency bands are shown in Figure 4.3.

Figure 4.5. Head radii $a$ (in cm) as a function of frequency, calculated by fitting Kuhn's low-frequency approximation (Equation 4.3) and the Woodworth formula (Equation 4.5) to ITD data in 1/3-octave bands. The results of fitting the Woodworth equation are shown by closed triangles and the results of the low-frequency approximation are represented by open circles. The low-frequency approximation predicts a head radius of about 8.0 cm in the low-frequency bands, and this is in close agreement with the predictions of the Woodworth formula in high-frequency bands.

Previous studies in which the Woodworth formula was shown to agree well with measured ITDs include those of Algazi et al. [2], Duda et al. [32], and Feddersen et al. [37]. However, Feddersen et al. took measurements with microphones on the sides of heads, just in front of the ear canals of human listeners. Duda et al. took measurements on opposite sides of a rigid sphere, though human ears are not truly antipodal. In neither case were measurements taken at the exact location of the ear canals, nor did those measurements include the effects of pinnae or ear canals on the signals. Algazi et al. took measurements of human beings, but did not compare those to measurements in KEMAR or any other head simulator.
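Because Equation 4.5 is linear in the head radius, the least-squares fit described above has a closed form. The sketch below illustrates this for a single frequency band. It is not the fitting code used for Figure 4.5: the function names are hypothetical, the speed of sound is an assumed 343 m/s, and the way rear angles are handled (folding them onto the front quadrant, as a symmetric sphere model would suggest) is an assumption, since the text does not say how Equation 4.5 was extended beyond ±90°.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed

# Angle set from the Methods section, in degrees.
ANGLES_DEG = [0, 15, -15, 30, -30, 45, -45, 60, -60, 75, -75,
              90, -90, 120, -120, 150, -150, 180]


def woodworth_shape(theta_deg):
    """Angular factor (sin(theta) + theta) of Equation 4.5.

    Assumption: angles behind the interaural axis are folded to the front
    (theta -> 180 deg - theta), and the sign follows the sign of theta.
    """
    sign = 1.0 if theta_deg >= 0 else -1.0
    t = math.radians(abs(theta_deg))
    if t > math.pi / 2:
        t = math.pi - t
    return sign * (math.sin(t) + t)


def fit_head_radius(angles_deg, itds_s, c=SPEED_OF_SOUND):
    """Least-squares radius a minimizing sum((ITD_i - (a / c) * g_i)**2)."""
    g = [woodworth_shape(th) for th in angles_deg]
    numerator = sum(gi * y for gi, y in zip(g, itds_s))
    denominator = sum(gi * gi for gi in g)
    return c * numerator / denominator


if __name__ == "__main__":
    # Synthetic ITDs for a hypothetical 8.1 cm sphere; the fit should recover it.
    a_true = 0.081
    itds = [a_true / SPEED_OF_SOUND * woodworth_shape(th) for th in ANGLES_DEG]
    print(f"fitted radius: {fit_head_radius(ANGLES_DEG, itds) * 100:.2f} cm")
```

Running the same fit separately on the measured ITDs of each 1/3-octave band would give one radius per band, which is the kind of frequency-dependent estimate plotted in Figure 4.5.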
The Woodworth formula itself was derived from a model of the head as a rigid sphere with antipodal ears, and so it is perhaps not surprising that the measurements of Duda et al. conform to it so well. It should also not be surprising, however, that some inconsistencies arise between it and the current data, which were measured through the ears of KEMAR, pinnae and ear canals included.

Burkhard and Sachs [24] quote the ear-to-ear distance of KEMAR to be 15.2 cm, which gives a head radius of 7.6 cm. Measurements of various dimensions on and about the head were made by Algazi et al. [3] on 43 human subjects. The average, plus and minus one standard deviation, of the ear-to-ear head width across all 43 subjects was 14.5 ± 0.95 cm. The head depth from forehead to the back of the head was 21 ± 1.2 cm, and the circumference of the head was 57 ± 2.5 cm. These three measurements yield three different estimators of the correct equivalent head radius: 7.25 cm, 10.5 cm, and 9.1 cm, respectively. Algazi et al. in another study compute the average optimal effective head radius of 25 different human listeners to be 8.7 cm [2]. All of these measurements indicate head radii close to, though usually larger than, the effective radius of the KEMAR head.

An alternative procedure for fitting the ITD data may come from varying not the head radius, but the contribution of the linear term $\theta$ in Equation 4.5. This would be a fit of the ITD data to an equation of the form:

$$\mathrm{ITD} = \frac{a}{c}\left[\sin\theta + b(f)\,\theta\right] \tag{4.6}$$

In Equation 4.6, the head radius $a$ is fixed, and the contribution of the linear term, $b$, is allowed to vary with frequency. Using the estimate of the head radius from Algazi et al., the average of parameter $b$ for frequency bands at and above 2500 Hz is found to be 0.25. The variation across frequency is shown in Figure 4.6.

Figure 4.6. The linear parameter $b$ as a function of frequency, calculated by fitting Equation 4.6 to ITD data for frequencies at and above 2500 Hz. This parameter describes the relative contribution of the linear term $\theta$ in Woodworth's equation (Equation 4.5) when the head radius is fixed to be 7.25 cm.

Figure 4.7 shows data plotted from one particular high-frequency band (centered at 8000 Hz), and plots of the best fitting equations. The solid line shows the best fit of Equation 4.5 to this data, which is achieved for $a$ = 8.10 cm. The dashed line is the best fit of Equation 4.6, which is achieved for $b$ = 1.22 when the head radius is set to $a$ = 7.25 cm. Woodworth's equations (Equations 4.5 and 4.6) follow the overall trend of the data, though they do not capture the triangular shape of the ITDs at high frequencies particularly well.

Figure 4.7. ITD data from the 8000 Hz 1/3-octave band is plotted across angle as open circles. A fit of Equation 4.5 to this data, shown by the solid line, is made when the fitting parameter (the head radius) is $a$ = 8.2 cm. The dashed line shows a fit to this data of Equation 4.6, in which the head radius is set to $a$ = 7.25 cm, and the fitting parameter is found to be $b$ = 1.22.

4.1.3 ILD Results

Plots of ILD across angles for all frequency bands at both positions are shown in Figures 4.8 and 4.9. At low frequencies, head shadow is expected to be minimal because long wavelengths are diffracted around the head. That Figure 4.8 does not reflect this at the lowest frequencies is an indication that the anechoic room is not truly anechoic at those frequencies. This supposition is supported by the aberrant behavior of the ITDs at very low frequencies (see Figure 4.3).
The anechoic room used for these measurements, which is lined with sound-absorbing foam wedges of approximately 0.89 m in length, is not truly anechoic at these low frequencies. The room can be expected to be anechoic for wavelengths shorter than four times the length of the absorbing wedges (3.56 m) [10], which corresponds to a frequency of approximately 96 Hz. Further discussion of binaural properties in the anechoic condition and comparisons of further results to the anechoic environment will be understood to exclude the measurements at these four lowest frequency bands (those below 160 Hz) unless otherwise noted.

At 0° and 180°, the ILD is expected to be zero. Here, the average ILD, plus and minus one standard deviation, across all frequency bands (not including the lowest three) at 0° was 0.3 ± 0.83 dB at position 1 and 0.3 ± 0.91 dB at position 2. Any difference in the ILD from zero at 0°, found to be roughly equal at both positions, is likely the fault of slightly unequal gains in the two recording channels. At 180°, the average ILD was found to be 1.0 ± 0.83 dB at position 1 and 1.0 ± 0.42 dB at position 2. Differences in ILD between 0° and 180° are likely due to asymmetries in the KEMAR, but these differences are explored in greater detail in a later section.

Dips in the magnitude of the ILD function were found at 90° in many of the higher frequency bands. This is the acoustical equivalent of Fresnel's "bright spot." Its existence was derived by Poisson as a result of a theory of the wave nature of light put forth by Fresnel and later experimentally verified by Arago [53]. A rigid spherical object will diffract light in such a way that a bright spot will appear on the side of the object directly opposite the light source. A similar effect is expected of acoustical plane waves incident on a rigid spherical diffracting object.
A "bright" acoustical spot at the far ear for an angle of 90° will result in a lower ILD for wavelengths short enough to be appreciably diffracted by the head. This can be seen in frequency bands as low as 1250 Hz as a flattening of the ILD curve at 90° such that the curve deviates from the sinusoidal shape seen in lower frequency bands.

Figure 4.8. ILD in decibels (dB) as a function of angle in 1/3-octave bands, plotted in the same format as Figure 4.11. ILDs for the remaining higher frequency bands are shown in Figure 4.9. In each panel, the vertical scale changes, as shown by axes labels on the left and right sides of the figure.

Figure 4.9. ILD in decibels (dB) as a function of angle in 1/3-octave bands, plotted in the same format as Figure 4.12. ILDs for lower frequency bands are shown in Figure 4.8. In each panel, the vertical scale changes, as shown by axes labels on the left and right sides of the figure.
By 2500 Hz, there is a clear dip in the magnitude of the ILD at 90° relative to the angles around it.

4.1.4 Coherence Results

The coherences measured in the anechoic environment are plotted as a function of frequency for several values of the incident angle in Figure 4.10. Most notable about these plots is the fact that low-frequency coherences are small at certain incident angles, particularly in the range of 30° to 90°. This trend is remarkably similar when measured at each location, suggesting that it is not an effect of the room itself.

Figure 4.10. Coherence $\gamma$ as a function of frequency for positive angles in 30° increments. The top panel shows measurements made at Position 1, and the bottom panel shows measurements made at Position 2 in the anechoic environment.

A more interesting view of what is happening is given by plotting the coherence as a function of incident angle for various center frequencies. Results for coherence in 1/3-octave bands from 80 to 1000 Hz at each position are shown in Figure 4.11, and those for bands from 1250 to 16000 Hz are shown in Figure 4.12. There was not much difference between coherence measurements made at position 1 and those made at position 2. The greatest difference between positions was for −15°, where on average, plus and minus one standard deviation, the coherences were 0.023 ± 0.031 greater at position 2 than at position 1. Pooling all pairs of data together from position 1 and position 2, the paired differences show that the coherences are greater at position 2 than at position 1 by only 0.005 ± 0.036 (paired t-test, p = 0.001). Though the data indicate a slightly higher set of coherences at position 2 than at position 1 on average, the absolute percent difference of 0.5% is minimal. This difference is likely unimportant in the sense of comparing coherences at two different places in a room.

Figure 4.11. Coherences as a function of angle in 1/3-octave bands for each of two different positions in the same anechoic room. Four frequency bands are shown per plot, with results for position 1 on the left and for position 2 on the right. Shown in this plot are coherences for frequency bands from 80 to 1000 Hz. Filled-in symbols represent original data before corrections for period errors were made. Coherences for the remaining higher frequency bands are shown in Figure 4.12.

Figure 4.12. Coherences as a function of angle in 1/3-octave bands for each of two different positions in the same anechoic room. Four frequency bands are shown per plot, with results for position 1 on the left and for position 2 on the right. Shown in this plot are coherences for frequency bands from 1250 to 16000 Hz. Filled-in symbols represent original data before corrections for period errors were made. Coherences for the lower frequency bands are shown in Figure 4.11.

Of particular interest in the plots of coherence across angle is the trend for coherences to dip down sharply, with minima near ±45°. This unexpected effect was found consistently in the range of 200 to 3150 Hz. With increasing frequency, the difference between perfect coherence (1) and the coherence measured at 45° steadily decreased from an average difference of 0.360 in the 160 Hz band to 0.025 in the 3150 Hz band. The difference in waveform coherence between 0° and 45° decreases roughly as a power function in frequency of the form:

$$\Delta\gamma = \frac{19.97}{f^{0.816}} \tag{4.7}$$

A plot of these differences and the best-fit line corresponding to Equation 4.7 are shown in Figure 4.13. The cause of the anomalous behavior in coherence was explored further in follow-up experiments.

Figure 4.13. In open circles, a plot of average difference in coherence between those measured at 0° and 45° across 1/3-octave frequency bands from 160 to 3150 Hz in anechoic conditions. The trend line and its accompanying equation represent a nonlinear least-squares regression to the data.

Incoherence due to Signal Degradation

One possible cause for unusually low coherence is decay of the signal before it reaches one of the ears. This may be a result of diffraction by the head leading to partially destructive interference at one ear.
Incoherence caused by signal decay should correspond to an anomalous ILD. For instance, in the 4000 Hz band at −120°, there is a sharp dip in coherence as seen in Figure 4.12; this is matched by a similarly out-of-place increase in the magnitude of the ILD at that angle in that frequency band. Though less pronounced, this same coincidence can be seen for the 2500 Hz band at 120°. It is important to note that an anomalous ILD may be correlated with incoherence, but is not its cause.

The drops in coherence for frequency bands through 3150 Hz at angles of ±45° cannot be universally explained simply by an unusual degradation of the signal in one of the ears, because a corresponding increase in the magnitude of the ILD is not often seen. There are several other places at higher frequencies where this disconnect between unusually low coherence and normal ILD takes place, such as in the 5000 and 8000 Hz bands at −120° and the 16000 Hz band around 60°, but never in so systematic a way across a range of frequencies as for 45°. Since an anomaly in the overall level of one of the signals is not the source of the drop in coherence at 45°, it must be that one signal is otherwise changed relative to the other, as if by filtering.

At frequencies above 4000 Hz, filtering effects of the pinna may become important [52], as they affect the spectrum of the sound in a manner often thought to aid localization in the median sagittal plane (MSP). Asymmetrical spectral effects of the pinnae may in certain instances result in incoherence in the frequency bands affected. However, the spectral effects of the pinnae are not generally present at frequencies as low as those where the systematic dips in coherence were seen in the present study (around 45°).
Incoherence due to Spectral Properties of Signals

Signals can also be made incoherent by means of additive noise, but that is unlikely to be the cause of the dips in coherence seen in measurements made with KEMAR in the current experiments (see Figures 4.11 and 4.12). The interaural incoherence at and around ±45° may be due instead to differences within bands in either the time or the amplitude spectrum of the sounds in the left and right ears.

When either ITDs or ILDs are frequency-dependent within a noise band, the interaural coherence will be imperfect (less than 1) as a result [49]. In general, interaural incoherence may be the result of a combination of frequency-dependent ILDs and frequency-dependent ITDs within bands. In this experiment, it is possible to separate out the individual contributions of these two factors.

To see the effect of the ILDs within bands alone, the signals recorded in the KEMAR ears were altered such that the IPD (and thus the ITD) between the ears was set to zero for each spectral component. To do this, the signals in the left and right ears were both Fourier transformed, and the phase spectrum of the signal in one ear was altered, component by component, to match the phase spectrum in the other ear. The left and right ear signals then had identical phase spectra, but the ILDs (which result from differences in the amplitude spectra) were unchanged. The signals were then inverse Fourier transformed, the cross-correlation between the signals was computed, and the new "equal-ITD coherence" was found. Any incoherence found could then be attributed entirely to differences in the ILD spectra of the two signals. A similar procedure was performed in which the original signals were made to have equal amplitude spectra while maintaining their different phase spectra. Then an "equal-ILD coherence" due only to differences in ITDs between the signals could be found. These coherences were computed in the 500 Hz 1/3-octave band for five different measurements at each of three angles: $\theta$ = {−45°, 0°, +45°}.

Figure 4.14. Coherences in a 500 Hz 1/3-octave band measured in KEMAR for sound at incident angles of $\theta$ = {−45°, 0°, +45°}. Symbols represent averages over five measurements and error bars extend one standard deviation in each direction. Square symbols represent the "true" interaural coherence, measured between unaltered signals recorded in the left and right ears. Circles represent coherences when the ILD was made to be constant within the frequency band, and thus are representative of the effect of varying ITDs within the band. Triangles represent coherences when the ITD was made to be constant within the band, and thus are representative of the effect of varying ILDs within the band.

The results of the calculations investigating the role of frequency-dependent ILDs and ITDs in the 500 Hz band are shown in Figure 4.14. When the ITDs within the 500 Hz band were set to be equal across the entire band (triangle symbols), the coherence at ±45° increased substantially compared to the true coherence. However, when the ILDs (but not the ITDs) were set to be equal across the entire band (circles), the coherence stayed much closer to the true coherence and thus remained low. Thus, the difference in phase spectra within a 1/3-octave band is the primary source of incoherence.

The ITD spectrum within a band is computed by first finding the phase spectra of the signals from each ear filtered into that band, taking the difference between the phase spectra, and dividing each component of the phase difference by its angular frequency. The ITD spectra for the bands with center frequencies of 500 Hz and 10 kHz are illustrated by Figure 4.15. In a high-frequency band such as 10 kHz, where the coherence across angles is relatively flat, the ITD spectrum at −45°, 0°, and +45° looks qualitatively flat, indicating a consistent ITD across the entire band. This is also true in the 500 Hz band at 0°, though there is more jaggedness than in the 10 kHz band. However, at ±45° in the 500 Hz band, a great deal of variation in the ITDs across the band is quite evident. Curiously, this only occurs when the loudspeaker is in the anterior half of the horizontal plane; a symmetric effect in the posterior half of the horizontal plane (which would occur at ±135°) is not observed. This implies that such an effect could not be predicted by a simple theory of acoustical scattering from a rigid spherical head in which the interaural axis is exactly perpendicular to the MSP (antipodal).

Figure 4.15. The ITD spectra in the 10000 Hz-centered band (left plots) and the 500 Hz-centered band (right plots). The three sets of plots show the variation in the ITDs within the indicated 1/3-octave frequency band when the KEMAR is oriented −45°, 0°, and +45° relative to the direction of the sound source. At ±45°, the large variability of the ITDs in the 500 Hz band may lead to significant incoherence.

Incoherence Due To Dispersion

The observed behavior in the interaural coherences near 45° incident sound direction has been attributed mainly to the phase spectrum of the signals measured in the ears of KEMAR. The phase spectrum can affect coherence if the ITDs are frequency-dependent within the 1/3-octave bands in which they are calculated. Over narrow bands, systematic dispersion, such as that found in the spherical head model, can be approximated as being linear. Here the effect of a linear dependence of the ITD on frequency is investigated. In this section, a mathematical framework is constructed for estimating the effect of such a linear dependence on interaural coherence, and the results are compared to the measurements made earlier in this experiment.

Start by considering the signals in the left and right ears, $x_L(t)$ and $x_R(t)$ respectively. With an appropriately fine frequency spacing, any such signal can be represented as a sum of cosines with a phase and amplitude for each frequency component:

$$x_L(t) = \sum_{n=1}^{\infty} A_{L,n}\cos(\omega_n t + \phi_{L,n})$$

$$x_R(t) = \sum_{m=1}^{\infty} A_{R,m}\cos(\omega_m t + \phi_{R,m})$$

The cross-correlation of these signals is given by Equation 2.3. If the signals are long, then the frequency spacing is quite close and we can replace the numerator of Equation 2.3 by an integral over time:

$$C_{x_L x_R}(\tau) = \frac{\int_0^{\infty} dt \sum_{n=1}^{\infty}\sum_{m=1}^{\infty} A_{L,n}A_{R,m}\cos(\omega_n t + \phi_{L,n})\cos\!\left(\omega_m (t+\tau) + \phi_{R,m}\right)}{\sqrt{P_L P_R}}$$

The integral that appears in the numerator is greatly simplified by the mutual orthogonality of the cosine functions for different frequencies. Taking this into account, with some manipulation, this can be simplified to an integrated form:

$$C_{x_L x_R}(\tau) = \frac{\sum_{n=1}^{\infty} A_{L,n}A_{R,n}\cos(\omega_n \tau + \phi_{R,n} - \phi_{L,n})}{\sqrt{P_L P_R}} \tag{4.8}$$

Now, if it is assumed that the ILD is constant and the 1/3-octave band is flat, then the amplitudes $A$ in the numerator of Equation 4.8 cancel with the calculations of power in the denominator. Of course, this experiment uses gammatone-shaped bands, but this calculation will suffice for the purpose of exploring the effect of dispersion on coherence in a slightly simpler case.
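The effect of a frequency-dependent ITD on the sum in Equation 4.8 can be illustrated numerically. The sketch below assumes a flat band with equal component amplitudes (so that the normalized sum is simply an average of cosines), takes the phase difference of each component to be its angular frequency times the ITD at that frequency, and searches for the peak within ±1 ms of lag. The function names, the 58 Hz half-bandwidth used for the 500 Hz band, and the 0.3 ms linear ITD variation are illustrative assumptions, not values from the measurements.

```python
import math


def band_cross_correlation(lag_s, center_hz, half_bw_hz, itd_of_freq,
                           n_components=101):
    """Average of cos(w * lag + phi_R - phi_L) over a flat band.

    Assumes phi_L - phi_R = w * ITD(f) for each component, so each term
    reduces to cos(w * (lag - ITD(f))).
    """
    total = 0.0
    for i in range(n_components):
        f = center_hz - half_bw_hz + 2.0 * half_bw_hz * i / (n_components - 1)
        omega = 2.0 * math.pi * f
        total += math.cos(omega * (lag_s - itd_of_freq(f)))
    return total / n_components


def peak_coherence(center_hz, half_bw_hz, itd_of_freq,
                   max_lag_s=1e-3, step_s=1e-5):
    """Largest cross-correlation value on a lag grid spanning +/- max_lag_s."""
    n = int(round(max_lag_s / step_s))
    return max(
        band_cross_correlation(k * step_s, center_hz, half_bw_hz, itd_of_freq)
        for k in range(-n, n + 1)
    )


if __name__ == "__main__":
    fc, hbw = 500.0, 58.0  # Hz; roughly a 1/3-octave band around 500 Hz

    def constant(f):
        return 0.27e-3  # same ITD at every frequency in the band

    def linear(f):
        return 0.27e-3 + 0.3e-3 * (f - fc) / hbw  # ITD varies across the band

    print("constant ITD:", round(peak_coherence(fc, hbw, constant), 3))
    print("linear ITD:  ", round(peak_coherence(fc, hbw, linear), 3))
```

With a constant ITD the peak reaches essentially 1, while the linearly varying ITD leaves the peak noticeably below 1 even though the amplitudes in the two ears are identical, which is the qualitative behavior the dispersion argument relies on.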
The remaining cross-correlation equation is simply:

$$c_{x_L x_R}(\tau) = \sum_{n=1}^{\infty} \cos(\omega_n \tau + \phi_{R,n} - \phi_{L,n})$$

Finally, the analytic cross-correlation function can be easily calculated if the sum over frequencies is turned into an integral over the bandwidth. Having filtered the signals into a frequency band with center frequency $\omega_0$ and bandwidth $\Delta\omega$, and defining $\delta = \Delta\omega/2$ such that $\delta$ is half the bandwidth, we can write the cross-correlation in an analytic form:

$$c_{x_L x_R}(\tau) = \frac{1}{2\delta}\int_{\omega_0-\delta}^{\omega_0+\delta} \cos(\omega\tau + \phi_R - \phi_L)\,d\omega \tag{4.9}$$

Now consider the possible behavior of the phase difference in Equation 4.9. For ITDs that are constant over a frequency band, the relationship between the IPD, $\phi_L - \phi_R$, and the ITD, $T$, is simply:

$$\phi_L - \phi_R = \omega T \tag{4.10}$$

As it is, this leads to a cross-correlation function in the form of a cosine function enveloped by a sinc function, centered with a peak at the ITD, $T$:

$$c_{x_L x_R}(\tau) = \frac{\sin[\delta(\tau - T)]}{\delta(\tau - T)}\cos[\omega_0(\tau - T)] \tag{4.11}$$

This cross-correlation function is shown in Figure 4.16 for a center frequency of 1000 Hz, a 1/3-octave bandwidth, and an ITD of 0.27 ms.

Now, assume that the ITD is not constant, but varies linearly across the frequency band. Let the ITD at the center frequency $\omega = \omega_0$ be $T = T_0$. The ITD varies linearly from $T = T_0 - \xi$ to $T = T_0 + \xi$ over the frequency range $\omega \in [\omega_0 - \delta, \omega_0 + \delta]$. This dispersion relation is depicted in Figure 4.17. The ITD can be written in this case as:

$$T = T_0 + \xi(\omega - \omega_0)/\delta$$

The ITD at the center frequency of the band is $T_0$. The parameter $\xi$ characterizes the dispersion, and equals the difference between the ITD and $T_0$ at $\omega = \omega_0 \pm \delta$. Using Equations 4.9 and 4.10, the cross-correlation function becomes:

$$c_{x_L x_R}(\tau) = \frac{1}{2\delta}\int_{\omega_0-\delta}^{\omega_0+\delta} \cos\left[\omega(\tau - T_0) - \xi\,\frac{\omega - \omega_0}{\delta}\,\omega\right]d\omega \tag{4.12}$$

Figure 4.16. A plot of the cross-correlation function as described by Equation 4.11 with a center frequency of 1000 Hz, a 1/3-octave bandwidth, and an ITD of 0.27 ms. The horizontal axis is the lag $\tau$ in ms. The dashed line shows the sinc function enveloping the cosine.

Figure 4.17. A linearly varying ITD over some finite frequency band with center frequency $\omega_0$ and bandwidth $2\delta$. The parameter $\xi$ describes the maximum difference between the ITD at any given frequency and the average ITD, $T_0$.

A solution to this integral is tractable with some manipulation. Grouping terms in orders of $\omega$, the argument of the cosine function is a quadratic. Completing the square gives:

$$\cos\left[\omega(\tau - T_0) - \frac{\xi}{\delta}\omega^2 + \frac{\xi}{\delta}\omega_0\omega\right] = \cos\left[\frac{\xi}{\delta}(\omega - \nu)^2 - \frac{\xi\nu^2}{\delta}\right]$$

where $\nu = \frac{\delta}{2\xi}\left(\tau - T_0 + \frac{\xi\omega_0}{\delta}\right)$ is used for brevity. Trigonometric expansion of the difference term in the cosine function leads to:

$$c_{x_L x_R}(\tau) = \frac{1}{2\delta}\int_{\omega_0-\delta}^{\omega_0+\delta}\cos\left[\frac{\xi}{\delta}(\omega-\nu)^2\right]\cos\left(\frac{\xi\nu^2}{\delta}\right)d\omega + \frac{1}{2\delta}\int_{\omega_0-\delta}^{\omega_0+\delta}\sin\left[\frac{\xi}{\delta}(\omega-\nu)^2\right]\sin\left(\frac{\xi\nu^2}{\delta}\right)d\omega \tag{4.13}$$

Equation 4.13 involves integrals of the form of the Fresnel cosine and Fresnel sine integrals. These are, as defined by Abramowitz and Stegun [1]:

$$C(z) = \int_0^z \cos\left(\frac{\pi}{2}u^2\right)du \tag{4.14}$$

$$S(z) = \int_0^z \sin\left(\frac{\pi}{2}u^2\right)du \tag{4.15}$$

The series expansions of the Fresnel integrals are:

$$C(z) = \sum_{n=0}^{\infty}\frac{(-1)^n(\pi/2)^{2n}}{(2n)!\,(4n+1)}\,z^{4n+1} \tag{4.16}$$

$$S(z) = \sum_{n=0}^{\infty}\frac{(-1)^n(\pi/2)^{2n+1}}{(2n+1)!\,(4n+3)}\,z^{4n+3} \tag{4.17}$$

The Fresnel integrals are plotted in Figure 4.18.
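As a quick consistency check on the definitions and series quoted above, the following minimal Python sketch evaluates the truncated series of Equations 4.16 and 4.17 and compares them with a direct midpoint-rule evaluation of Equations 4.14 and 4.15. The function names, the number of series terms, and the number of integration steps are illustrative choices, not part of the original analysis.

```python
import math

def fresnel_series(z, terms=40):
    """Truncated power series for C(z) and S(z) (Equations 4.16 and 4.17)."""
    half_pi = math.pi / 2.0
    c = 0.0
    s = 0.0
    for n in range(terms):
        sign = -1.0 if n % 2 else 1.0
        c += sign * half_pi ** (2 * n) * z ** (4 * n + 1) / (
            math.factorial(2 * n) * (4 * n + 1))
        s += sign * half_pi ** (2 * n + 1) * z ** (4 * n + 3) / (
            math.factorial(2 * n + 1) * (4 * n + 3))
    return c, s

def fresnel_midpoint(z, steps=20000):
    """Midpoint-rule evaluation of the defining integrals (Equations 4.14 and 4.15)."""
    h = z / steps
    c = 0.0
    s = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        c += math.cos(0.5 * math.pi * u * u)
        s += math.sin(0.5 * math.pi * u * u)
    return c * h, s * h

# Compare the two evaluations at z = 1.
c_series, s_series = fresnel_series(1.0)
c_quad, s_quad = fresnel_midpoint(1.0)
print(c_series, c_quad)
print(s_series, s_quad)
```

The series converges quickly for small arguments but suffers from cancellation as |z| grows, so a sketch like this is best restricted to modest |z|.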
The Fresnel integrals have the limits:

$$\lim_{z\to\pm\infty}\{C(z), S(z)\} = \pm\frac{1}{2}$$

Figure 4.18. Plots of the Fresnel integrals $C(z)$ (solid line) and $S(z)$ (dashed line).

Using these gives an equation for the cross-correlation function in terms of the Fresnel integrals:

$$c_{x_L x_R}(\tau) = \frac{1}{2}\sqrt{\frac{\pi}{2\xi\delta}}\left\{\cos\left[\frac{\xi\nu^2}{\delta}\right]\left(C\left[\sqrt{\frac{2\xi}{\pi\delta}}(\omega_0+\delta-\nu)\right] - C\left[\sqrt{\frac{2\xi}{\pi\delta}}(\omega_0-\delta-\nu)\right]\right) + \sin\left[\frac{\xi\nu^2}{\delta}\right]\left(S\left[\sqrt{\frac{2\xi}{\pi\delta}}(\omega_0+\delta-\nu)\right] - S\left[\sqrt{\frac{2\xi}{\pi\delta}}(\omega_0-\delta-\nu)\right]\right)\right\} \tag{4.18}$$

Finding the ITD and waveform coherence from Equation 4.18 requires finding its first derivative with respect to the delay $\tau$, $\frac{\partial c}{\partial\tau} = \frac{\delta}{2\xi}\frac{\partial c}{\partial\nu}$, setting it equal to zero, and solving for $\tau$. Unfortunately, the result does not lead to an algebraic function of $\tau$. Further progress with Equation 4.18 can be made with numerical methods to solve for the estimated ITD, $T$ (the value of $\tau$ which maximizes Equation 4.18), and the interaural waveform coherence, $\gamma$ (the maximum value of Equation 4.18).

Note that Equation 4.18 involves the ITD, $T_0$. The actual value of ITD measured by this method will not necessarily be equal to $T_0$ due to the dispersion, but will be shifted by some amount $\Delta T = T - T_0$. Numerical simulation of the cross-correlation requires definition of a center frequency $\omega_0$, a bandwidth $2\delta$, the ITD $T_0$, and the dispersion parameter $\xi$. For a few important center frequencies, a 1/3-octave bandwidth $2\delta = (2^{1/3} - 1)\,\omega_0$, and an ITD equal to that measured in KEMAR at 45°, the dependence of the ITD shift and coherence on $\xi$ for values of $\xi$ up to 0.1 ms is shown in Figure 4.19. This simulation is performed via numerical computation of Equation 4.18.
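The kind of numerical evaluation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the computation used for the figures: it integrates the band-limited cross-correlation with a linearly dispersive ITD directly by the midpoint rule instead of going through the Fresnel-integral form, it assumes a half-bandwidth of beta times the center frequency over two with beta = 2^(1/3) − 1, and the center frequency, ITD, dispersion value, and function names are hypothetical.

```python
import math

def xcorr(tau, f0, beta, t0, xi, n=2000):
    """Midpoint-rule estimate of the dispersive cross-correlation at lag tau (s)."""
    w0 = 2.0 * math.pi * f0
    delta = 0.5 * beta * w0                 # assumed half-bandwidth, rad/s
    h = 2.0 * delta / n
    total = 0.0
    for k in range(n):
        w = w0 - delta + (k + 0.5) * h
        itd = t0 + xi * (w - w0) / delta    # ITD varies linearly across the band
        total += math.cos(w * (tau - itd))
    return total * h / (2.0 * delta)

def peak(f0, beta, t0, xi, half_width=2.0e-4, iters=60):
    """Ternary search for the lag that maximizes xcorr within t0 +/- half_width."""
    lo, hi = t0 - half_width, t0 + half_width
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if xcorr(m1, f0, beta, t0, xi) < xcorr(m2, f0, beta, t0, xi):
            lo = m1
        else:
            hi = m2
    tau = 0.5 * (lo + hi)
    return tau, xcorr(tau, f0, beta, t0, xi)

f0 = 160.0                       # band center frequency in Hz (illustrative)
beta = 2.0 ** (1.0 / 3.0) - 1.0  # assumed 1/3-octave bandwidth parameter
t0 = 0.4e-3                      # hypothetical mean ITD in seconds
xi = 0.05e-3                     # dispersion parameter in seconds

tau_peak, gamma = peak(f0, beta, t0, xi)
shift = tau_peak - t0            # ITD shift caused by the dispersion

# With no dispersion the result should reduce to the sinc-times-cosine form.
lag = t0 + 1.0e-3
w0 = 2.0 * math.pi * f0
delta = 0.5 * beta * w0
analytic = math.sin(delta * 1.0e-3) / (delta * 1.0e-3) * math.cos(w0 * 1.0e-3)
numeric = xcorr(lag, f0, beta, t0, 0.0)
print(shift / xi, gamma, analytic - numeric)
```

The search window of ±0.2 ms is much narrower than one period of the center frequency in this example, so the function is unimodal inside it; a higher band would need a correspondingly narrower window.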
In frequency bands up to and including the 3150-Hz band, the trend for $\Delta T$ to increase and for $\gamma$ to decrease with increasing $\xi$ follows a common form. Simple equations may be fitted to these trends:

$$\Delta T = a\,\xi \tag{4.19}$$

$$\gamma = 1 - b\,\xi^{m} \tag{4.20}$$

The best fit values for the parameters $a$, $b$, and $m$ are shown in Figure 4.19 for the frequency bands plotted there. The trend in these parameters as a function of frequency is depicted in Figure 4.20. The tendency for $a$ to decrease with increasing frequency indicates that the ITD shift $\Delta T$ decreases with increasing frequency for any given dispersion parameter $\xi$, but the magnitude of this shift is rather small. The power parameter $m$ in the coherence function generally decreases with increasing frequency, though over a very small range. The scaling parameter $b$, however, increases by two orders of magnitude between 160 and 3150 Hz, suggesting that linear dispersion can have a significant effect on coherence, especially at higher frequencies.

At frequencies higher than about 3150 Hz, the behavior of the ITD shift and coherence seems to diverge from Equations 4.19 and 4.20. In fact, since the functions in Equation 4.18 rely on the product of the dispersion parameter $\xi$ and the center frequency $\omega_0$ (a dimensionless factor $\omega_0\xi$), Equations 4.19 and 4.20 will fail to describe the behavior of $\Delta T$ and of $\gamma$ as either $\omega_0$ or $\xi$ becomes large.

Figure 4.19. The ITD shift $\Delta T$ and the coherence $\gamma$ for various bands with center frequencies indicated on the vertical axis and flat 1/3-octave bandwidths as a function of the dispersion parameter $\xi$. Data points for various values of $\xi$ are shown in open symbols (circles for $\Delta T$ and diamonds for $\gamma$), and best fitting curves to Equations 4.19 and 4.20 are plotted in solid lines. As either $\xi$ or center frequency increases, these equations do not hold, and so no such fit lines were plotted for the highest frequency band. In principle, only one plot is necessary because $\gamma$ is a function of $\delta\xi$ (see Equation 4.23).

Figure 4.20. Best fit parameters $a$, $b$, and $m$ across band center frequency. These parameters fit ITD shift data and coherence data to equations of the form of Equations 4.19 and 4.20.
Consider a calculation of the coherence, $\gamma = c_{x_L x_R}(\tau = T)$. First, the half-bandwidth $\delta$ depends on how wide the bands are chosen to be, but in general we may write:

$$\delta = \frac{\beta\omega_0}{2}$$

The parameter $\beta$ determines the bandwidth. For instance, $\beta = 2^{1/3} - 1$ would give 1/3-octave bandwidths. For small $\omega_0\xi$, a simplification to Equation 4.18 may be made by noting that at low frequencies, $T - T_0 \approx a\xi$ (Equation 4.19). Then the parameter $\nu$ becomes:

$$\nu = \frac{\delta}{2\xi}\left(a\xi + \frac{\xi\omega_0}{\delta}\right) = \omega_0\left(\frac{a\beta}{4} + \frac{1}{2}\right)$$

The argument of the trigonometric functions in Equation 4.18 is then:

$$\frac{\xi\nu^2}{\delta} = \frac{2\omega_0\xi}{\beta}\left(\frac{a\beta}{4} + \frac{1}{2}\right)^2$$

If the frequency is small, then the function $\sin(\xi\nu^2/\delta)$ will be much smaller than $\cos(\xi\nu^2/\delta)$ since the product $\omega_0\xi$ is small. In order for the sine function to be less than 5% of the cosine function, the factor $\omega_0\xi$ must be smaller than some limit that depends on $a$ and $\beta$. For 1/3-octave bandwidths and using $a = 0.086$, this limit is $\omega_0\xi < 0.0254$. For a dispersion parameter of $\xi = 0.1$ ms, this would allow a maximum center frequency of only 40.4 Hz. This seems an unnecessarily strict condition to place on the usage of this approximation, since a glance at Figure 4.19 suggests good agreement with this approximation through at least the 800-Hz band.

Assuming small $\omega_0\xi$, the terms involving the sine function are ignored. Now, the remaining Fresnel integrals may be series expanded to first order via Equation 4.16 ($C(z) \approx z$). After some elementary simplification, what is left is a low-frequency approximation for $\gamma$:

$$\gamma = \cos\left[\frac{2\omega_0\xi}{\beta}\left(\frac{a\beta}{4} + \frac{1}{2}\right)^2\right] \tag{4.21}$$

Expanding this as a series for small values of the product $\omega_0\xi$ yields:

$$\gamma \approx 1 - \frac{2\omega_0^2\xi^2}{\beta^2}\left(\frac{a\beta}{4} + \frac{1}{2}\right)^4 \tag{4.22}$$

This equation has the same form as Equation 4.20 with $m = 2.0$, the approximate value which best fits frequency bands up to about 1000 Hz (see Figure 4.20). The coefficients on $\xi^2$ are reasonably close in this approximation to those calculated by fitting Equation 4.20.
For instance, given a center frequency of 160 Hz, $\beta = 0.26$ (a 1/3-octave band), and $a = 0.0862$, the coherence becomes:

$$\gamma = 1 - 1.954\times10^{6}\,\xi^2$$

The coefficient of $\xi^2$ has units of inverse-seconds squared. This is close to the value $b = 1.656\times10^{6}$ that was found from fitting Equation 4.20 to the data generated from the complete solution, Equation 4.18. Alternately, this may be expressed as a function of the bandwidth $\delta$ times $\xi$, in which case the multiplicative coefficient is unitless:

$$\gamma \approx 1 - \frac{1}{2}(\delta\xi)^2\left(\frac{a}{4} + \frac{1}{2\beta}\right)^4 \tag{4.23}$$

Plugging in the aforementioned values of $\beta$ and $a$, this becomes:

$$\gamma = 1 - 7.15\,(\delta\xi)^2$$

A derivation of the ITD for small $\omega_0\xi$ involves setting the derivative of Equation 4.13 or Equation 4.18 with respect to $\tau$ equal to zero and solving for $\tau$ (or $\tau - T_0$). Efforts to this end have thus far proved ineffective. If simplifications are made to the extent that $\Delta T$ is linear in $\xi$, the coefficients are largely inaccurate compared to the values that led to Equation 4.20 and Figure 4.20. The difficulty likely lies in the oscillatory nature of the functions involved.

Without delving into the details of the derivation, the relationship between $\Delta T$ and $\xi$ for small values of $\omega_0\xi$ best fits a relationship of the form:

$$\Delta T = \frac{\beta}{3}\,\xi \tag{4.24}$$

For a 1/3-octave bandwidth, $\beta/3 = 0.0867$. This is very close to the value found when using the complete solution (Equation 4.18), $a \approx 0.086$. Returning to the computer model using Equation 4.18, it is possible to find the coefficient $a$ from Equation 4.19 as a function of $\beta$. This relationship is shown in Figure 4.21. Though the exact solution gives $a = 0.3076\beta$ instead of $a = \beta/3$ (predicted in the limit of small $\omega_0\xi$), the two are quite close.

Figure 4.21. The parameter $a$ relating $\Delta T$ to the dispersion parameter $\xi$ as a function of the bandwidth parameter $\beta$. An approximate relationship between $a$ and $\beta$ is given by Equation 4.24 to be $a = \beta/3$.
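Several of the numbers quoted in this low-frequency discussion can be reproduced with plain arithmetic. The short Python sketch below assumes a half-bandwidth of beta times the center frequency over two, takes the argument of the trigonometric functions to be (2·ω0·ξ/β)·(a·β/4 + 1/2)², and reads the 5% criterion as tan(argument) < 0.05; the variable names are illustrative only.

```python
import math

beta = 0.26      # 1/3-octave bandwidth parameter, 2**(1/3) - 1 rounded
xi = 1.0e-4      # dispersion parameter of 0.1 ms, in seconds

# Limit on w0 * xi for the sine term to stay below 5% of the cosine term.
a_limit = 0.086
arg_per_w0xi = 2.0 / beta * (a_limit * beta / 4.0 + 0.5) ** 2
w0xi_limit = math.atan(0.05) / arg_per_w0xi
f_max = w0xi_limit / xi / (2.0 * math.pi)     # highest allowed center frequency, Hz

# Coefficient of xi**2 in the small-argument expansion for the 160-Hz band.
a_160 = 0.0862
w0 = 2.0 * math.pi * 160.0
coeff_xi2 = 2.0 * w0 ** 2 / beta ** 2 * (a_160 * beta / 4.0 + 0.5) ** 4   # s^-2

# Small-argument estimate of the ITD-shift coefficient.
a_small = beta / 3.0

print(w0xi_limit, f_max, coeff_xi2, a_small)
```

Under these assumptions the script gives values that round to 0.0254, 40.4 Hz, 1.954 × 10⁶ s⁻², and 0.0867.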
Thus, useful equations have been found to predict the behavior of the coherence and the measured ITD in the case of linear ITD dispersion. The telltale sign of the presence of linear dispersion is not in the ITD: even if the average ITD $T_0$ is known, the difference between it and the calculated ITD $T$ is very small according to Equation 4.24, even for a calculation within broad bandwidths. Coherence within bands is a far more obvious indicator of linear dispersion. For even a moderate dispersion, the resulting incoherence is noticeable, and becomes more extensive with increasing frequency. Linear dispersion cannot describe the behavior of the interaural coherence measured in KEMAR at angles near 45°, however, unless the dispersion parameter $\xi$ decreases with increasing frequency. Certainly it is possible that $\xi$ is a function of both frequency and incident angle, but it is unlikely to be the only factor in the incoherence seen in KEMAR. Even then, the linear dispersion observed in the ITD spectrum of measurements made in KEMAR is rather small. In the 500-Hz band, for instance, the linear dispersion is approximately 0.1 ms, which would result in a coherence of approximately 0.985, far larger than the coherence actually observed. The noisiness of the ITD spectrum is also a likely contributor (see Figure 4.15), and may in fact be the dominant factor in determining coherence.

4.1.5 Incoherence due to Pseudorandom Dispersion

A glance at Figure 4.15 suggests a noisy ITD spectrum at lower frequencies, which may contribute to incoherence along with the apparent linear (or approximately linear) dispersion present in the ITD spectrum. Thus, a new model is proposed for the cross-correlation function. In this model, the ITD is a Gaussian random variable with mean $T_0$ equal to the average ITD across the band, and standard deviation $\sigma$.
$$\mathrm{ITD} \sim N[T_0, \sigma]$$

Alternatively, the ITD may be written as:

$$\mathrm{ITD} = T_0 + \eta$$

where $\eta$ is normally distributed about zero with standard deviation $\sigma$. Beginning with Equation 4.9, the IPD can be written as:

$$\phi_L - \phi_R = \omega(T_0 + \eta)$$

Because $\eta$ is distributed symmetrically about zero, its sign is immaterial, and the cross-correlation function then becomes:

$$c_{x_L x_R}(\tau) = \frac{1}{2\delta}\int_{\omega_0-\delta}^{\omega_0+\delta}\cos[\omega(\tau - T_0) + \omega\eta]\,d\omega \tag{4.25}$$

A computer simulation of Equation 4.25 is evaluated for values of the standard deviation on the range $\sigma \in [0, 50\ \mu\mathrm{s}]$ in 1 μs steps. This is accomplished by computing a Riemann sum. For each frequency component, the random additional phase $\omega\eta$ uses a different value of $\eta$ drawn from $N[0, \sigma]$. These random added phases are held constant as the cross-correlation function is calculated for different lags $\tau$. Lags are computed between −2.0 ms and 2.0 ms in steps of 10 μs. For each value of $\sigma$, the maximum value of the cross-correlation function (the coherence $\gamma$) and the difference between the measured and mean ITD ($T - T_0$) are recorded. For each value of $\sigma$, this process was repeated ten times and average coherences and ITDs were computed.

The results of the computation of cross-correlations with pseudorandom phases are depicted for a few frequencies in Figure 4.22. The coherences behave in a way simply related to the center frequency of the band and the noise variance $\sigma^2$:

$$\gamma = \exp\left[-\tfrac{1}{2}\omega_0^2\sigma^2\right] \tag{4.26}$$

Figure 4.22. Coherence in various 1/3-octave bands as a function of the standard deviation $\sigma$ (in μs) of the noise added to the ITD of the signals. Best fitting curves corresponding to Equation 4.26 are drawn with the data.

In fact, this result can be derived from Equation 4.25. First, the cosine function is expanded to give:

$$c_{x_L x_R}(\tau) = \frac{1}{2\delta}\int_{\omega_0-\delta}^{\omega_0+\delta}\left\{\cos[\omega(\tau - T_0)]\cos[\omega\eta] - \sin[\omega(\tau - T_0)]\sin[\omega\eta]\right\}d\omega$$

Now, the coherence occurs for $\gamma = c_{x_L x_R}(\tau = T_0)$. Thus, the sine terms vanish entirely, and what is left is:

$$\gamma = \frac{1}{2\delta}\int_{\omega_0-\delta}^{\omega_0+\delta}\cos[\omega\eta]\,d\omega$$

In this form, the integral cannot be evaluated directly, but statistical methods allow an evaluation of the ensemble average of the coherence, $\langle\gamma\rangle$. In fact, this is what the results of the numerical calculation of Equation 4.25 represent.

$$\langle\gamma\rangle = \frac{1}{2\delta}\int_{\omega_0-\delta}^{\omega_0+\delta}\langle\cos[\omega\eta]\rangle\,d\omega \tag{4.27}$$

The ensemble average of $\cos[\omega\eta]$ is simple if we note that the product $\omega\eta$ is, for any given value of $\omega$, a normally distributed random variable with mean 0 and standard deviation $\sigma\omega$. If no value of $\omega$ carries any more or less weight than any other value, as is true here, then $\omega$ can be replaced in the ensemble average with the average value of $\omega$, namely $\omega_0$. Thus, the ensemble average can be equivalently written as:

$$\langle\cos[\omega\eta]\rangle = \langle\cos[\omega_0\eta]\rangle$$

The ensemble average of the cosine function can be written as an integral involving the probability density function (PDF) $f(\eta)$ of the normally distributed random variable $\eta$:

$$\langle\cos[\omega_0\eta]\rangle = \int_{-\infty}^{\infty} f(\eta)\cos(\omega_0\eta)\,d\eta \tag{4.28}$$

This can be compared to the characteristic function of a normally distributed random variable with mean zero, evaluated at $\omega_0$, $\chi_\eta(\omega_0)$:

$$\chi_\eta(\omega_0) = E\left[e^{i\omega_0\eta}\right] = \int_{-\infty}^{\infty} f(\eta)\,e^{i\omega_0\eta}\,d\eta \tag{4.29}$$

Now, since the PDF of a normally distributed random variable with mean 0 is a purely even function, the odd (sine) components of the exponential term vanish in the integration, leaving only the even (cosine) terms. This leaves:

$$\chi_\eta(\omega_0) = \int_{-\infty}^{\infty} f(\eta)\cos(\omega_0\eta)\,d\eta \tag{4.30}$$

Fortunately, this is exactly the integral of Equation 4.28, and so the ensemble average coherence $\langle\gamma\rangle$ is exactly equal to this characteristic function. The characteristic function of a normally distributed random variable is well understood.
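The relationship between the noisy-ITD model and the exponential form of Equation 4.26 can be illustrated with a small Monte Carlo sketch in Python. It is a simplified stand-in for the simulation described above rather than a reproduction of it: the cross-correlation is evaluated only at the lag equal to the mean ITD (as in the derivation) instead of searching over lags, the band is taken as flat with an assumed half-bandwidth of beta times the center frequency over two, and the band, noise level, trial counts, and function name are illustrative choices.

```python
import math
import random

def mean_coherence(f0, beta, sigma, n_freq=200, n_trials=200, seed=0):
    """Ensemble-average cross-correlation at the mean ITD with Gaussian ITD noise."""
    rng = random.Random(seed)
    w0 = 2.0 * math.pi * f0
    delta = 0.5 * beta * w0                      # assumed half-bandwidth, rad/s
    step = 2.0 * delta / n_freq
    total = 0.0
    for _ in range(n_trials):
        acc = 0.0
        for k in range(n_freq):
            w = w0 - delta + (k + 0.5) * step
            eta = rng.gauss(0.0, sigma)          # independent ITD error per component
            acc += math.cos(w * eta)
        total += acc / n_freq                    # Riemann-sum average over the band
    return total / n_trials

f0 = 500.0                        # band center frequency in Hz (illustrative)
beta = 2.0 ** (1.0 / 3.0) - 1.0   # assumed 1/3-octave bandwidth parameter
sigma = 1.0e-4                    # ITD noise standard deviation in seconds

simulated = mean_coherence(f0, beta, sigma)
predicted = math.exp(-0.5 * (2.0 * math.pi * f0 * sigma) ** 2)
print(simulated, predicted)
```

Fixing the seed keeps the comparison repeatable; increasing the number of trials or frequency components reduces the scatter of the simulated average about the exponential prediction.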
For a normally distributed random variable with mean 0 and standard deviation $\sigma$, the characteristic function evaluated at $x$ is:

$$\chi_\eta(x) = \exp\left[-\tfrac{1}{2}\sigma^2x^2\right]$$

The ensemble average coherence is therefore:

$$\langle\gamma\rangle = \exp\left[-\tfrac{1}{2}\omega_0^2\sigma^2\right] \tag{4.31}$$

This is exactly the form of the coherence found by fitting curves to the numerical data (Equation 4.26).

It should be noted that $\sigma$ as it has been defined here is the standard deviation of the noise inherent in the phase difference between two signals, and not the noise inherent in the phase of either signal independently. However, if the noise in the phase of each signal is independent and identically distributed, then the variance in each signal, $\sigma_s^2$, is:

$$\sigma_s^2 = \sigma^2/2$$

Then the coherence may be related to the variance in the noise inherent to the ITD spectrum of each signal by:

$$\gamma = \exp\left[-\omega_0^2\sigma_s^2\right]$$

The incoherence noticed in KEMAR for incident angles near 45° is likely a combination of linear dispersion and a noisy ITD spectrum. It is apparent that even a modest standard deviation in the ITD spectrum can lead to significant incoherence. Thus, it is likely the variance, and not the slope, of the ITD spectra that leads to significant incoherence. Consider for instance the 500-Hz band shown in Figure 4.15. At 45°, the standard deviation in the ITD is $\sigma = 0.27$ ms. Plugging this into Equation 4.31 gives $\langle\gamma\rangle = 0.7061$, which is very close to the coherence arrived at via the cross-correlation function, $\gamma = 0.7288$.

4.1.6 Comparison of KEMAR Ears

It is important to note the possibility of absolute differences in the behavior or response of the left and right ears of the KEMAR. In the previous results of this experiment, such differences were not noticeable because those measurements were only concerned with how the binaural differences changed as the angle between the direction of incident sound and the facing direction of the KEMAR changed.
In later studies concerning room acoustics, however, several measurements will be taken using a head-on (0°) geometry.

At an angle of 0°, an ideal model of the head where the ears are an equal angular displacement from the center of the face gives zero ITD and zero ILD. In an ideal anechoic environment where there is no random incidence of reflected sound on the ears, coherence is expected to be 1 in all 1/3-octave bands. The same results are expected for an angle of 180° since the ears would again be equally separated from the sound source. However, asymmetry between the ears may lead to deviations from these ideals.

Differences in measurements at 0° and 180° may be due either to external or internal effects. Internal effects include asymmetries in the location of the KEMAR ears relative to the center of the face; differences in the height, shape, or canal length of the ears; and differences in the response of the electronic components at the ears of the KEMAR or in other electronic components involved in the transmission and processing of the received sound. External effects are mainly those of the auditory environment itself, such as reflections and reverberation. To identify and separate external and internal effects, measurements of coherence, ITD, and ILD in 1/3-octave bands taken at both 0° and 180° were compared. Theoretically, both the ITD and ILD should be zero when both ears are equidistant from the sound source in an anechoic environment. Any measured values that are different from zero may be attributed to either internal or external effects. If such a difference occurs but changes sign between 0° and 180°, then this difference is likely an external effect, while those that keep the same sign under a 180° rotation are likely internal.

ITDs in 1/3-octave bands from 80 to 16,000 Hz at 0° and 180° for both positions 1 and 2 are shown in Figure 4.23.
Measurements at 0° are shown in circles, and measurements at 180° are shown in triangles. Open symbols are for measurements taken at position 1 and closed symbols are for measurements taken at position 2. Error bars smaller than the size of the points are invisible. The ITD is nearly zero, as expected, everywhere except at frequencies at or below 250 Hz, where the ITD is negative at both 0° and 180°. Errors in ITD at low frequencies may be due to the breadth of the peak in the cross-correlation functions at low frequencies (see Figure 2.2), leading to errors in estimation of the location of the peak. This also accounts for the variance seen in ITD measurements at low frequencies, whereas there is almost no variation in ITD estimations in higher frequency bands.

Since the ITDs at low frequencies are negative for measurements taken at both 0° and 180° at either position, this suggests an internal effect. One possibility is that a capacitor in the preamplifier is responsible for causing a phase shift in low-frequency inputs, though it is unusual for such a thing to happen for frequencies as high as 200 Hz, as is observed here. If internal effects are indeed responsible, then there may also be an effect noticeable in the measurement of coherence across frequency bands. Such a plot is shown in Figure 4.24. Another possible explanation lies in the previously noted imperfection of the anechoic environment, resulting in some reflections at low frequencies. However, external effects are expected to change sign between 0° and 180°, and no such sign change is noticed here.

Figure 4.23. Measurements of the ITD taken at 0° (circles) and 180° (triangles) in an anechoic environment in 24 1/3-octave bands from 80 to 16000 Hz. Measurements were taken at two different positions in the anechoic room, each with the same separation between KEMAR and loudspeaker. Measurements taken at position 1 are indicated by open symbols and are shown in the top panel. Measurements taken at position 2 are indicated by filled symbols and are shown in the bottom panel.

Figure 4.24. Measurements of the interaural coherence taken at 0° (circles) and 180° (triangles) in an anechoic environment in 24 1/3-octave bands from 80 to 16000 Hz. Measurements were taken at two different positions in the anechoic room, each with the same separation between KEMAR and loudspeaker. Measurements taken at position 1 are shown in the top panel, and measurements taken at position 2 are shown in the bottom panel.

In Figure 4.24, a drop in coherence can be seen for frequencies at or below 125 Hz at both positions 1 and 2. This is likely due to the imperfection of the anechoic environment at low frequencies. The deviation in measured coherences from the expected value of 1 is more limited in the range of frequencies over which it occurs than the imperfections seen in ITD. Thus, it seems that the deviations in ITD at frequencies below 125 Hz are likely a combination of room (external) effects and internal effects. If so, then a similar phenomenon is expected when ITDs are measured in other locations due to the internal effect. It should be noted, however, that the front-back differences suggest that ITDs agree with ideal predictions at all but the lowest frequency band of 80 Hz. Deviations in coherence at high frequencies in the anechoic room are likely an external effect caused by reflections off of objects in the anechoic room which are not padded. At 10000 Hz, the wavelength of sound in air is about 3.5 cm, and even small reflective objects may produce some scattering.
Measurements of the ILD at 0° and 180° are shown in Figure 4.25 in a manner similar to that of Figure 4.24. First, it should be noted that most ILDs are above 0 dB, usually near 1 dB. This consistent difference is likely due to a mismatch in gain between the left and right channels of the preamplifier (an internal effect). This mismatch seems to have been consistent over measurements made at both positions 1 and 2, as evidenced by good agreement of measurements made at 0° and at 180° at both positions.

There are significant ILDs in the measurements. These are especially noticeable at high frequencies, but the magnitudes of the ILDs observed are relatively small compared to the range of naturally occurring ILDs at these frequencies (see for instance Figure 4.9). In the band centered at 16,000 Hz, an ILD of about −2 dB is observed for an angle of 0° and an ILD of more than 2 dB at 180°. This indicates excellent symmetry about the setup angle at high frequencies, where sensitivity to small errors in orientation is expected to be greatest. Given the sign change, this seems to be an external effect of the room and may point to another frequency region where the behavior of the anechoic room is not ideal. Support for this notion comes from the high-frequency coherences (see Figure 4.24), which drop below 1 at high frequencies.

Figure 4.25. Measurements of the ILD taken at 0° (circles) and 180° (triangles) in an anechoic environment in twenty-four 1/3-octave bands from 80 to 16000 Hz. Measurements were taken at two different positions in the anechoic room, each with the same separation between KEMAR and loudspeaker. Measurements taken at position 1 are indicated by open symbols of either type and are located in the top panel. Measurements taken at position 2 are indicated by filled symbols and are located in the bottom panel.

Some reverberation in the room may occur for high-frequency bands, where reflections off of even small or thin non-absorbing objects in the room may occur. This would then lead to a decrease in interaural coherence and an ILD. A patch panel near one corner of the room, though curtained with sound-absorbing foam, is a likely source of such reflections. However, it seems unusual for the effect to persist in the same manner when the KEMAR and Mackie were moved from one position to another. Instead, this effect at high frequencies might result from asymmetries in the pinnae of the left and right ears.

The observed interaural differences are unexpected in the ideal model of a rigid, spherical head with symmetrically located ears. At 0° and 180°, the sound source is located in the MSP. Though measurable, these binaural cues in the median sagittal plane are small, and are not thought to be particularly useful in aiding sound localization [70].

4.2 Further Experiments on the Coherence at 45°

Thus far, the degraded coherences measured in KEMAR near angles of ±45° have gone largely unexplored. In this section, several experiments are presented which attempt to determine the cause of these incoherences and their possible effect on human listeners. Experiments are also presented which attempted to measure this effect in listeners.
4.2.1 Variations on Methods of Coherence Measurement in KEMAR

A set of further measurements was made with the KEMAR in anechoic conditions to further investigate the nature of the drops in coherence at ±45°. Under test conditions identical to those previously used in Experiment 9, several more measurements were made at angles of 45°, 0°, and −45°. For each set of measurements, the experimental setup was varied in one of the following ways:

• A. KEMAR wearing external microphones near ears
• B. KEMAR with both pinnae removed
• C. KEMAR with both pinnae removed and skull smoothed
• D. KEMAR with only left pinna removed
• E. KEMAR with only right pinna removed
• F. JVC foam dummy head
• G. KEMAR as normal for this experiment, with both pinnae
• H. KEMAR without torso

Methods

In variation A, KEMAR was fitted with two small microphones placed about 1 cm in front of the KEMAR ear canals on each side of its head. Recordings were made through those microphones instead of through the KEMAR ears. This effectively eliminated the effects of the pinna, ear canal, and internal electronics of the KEMAR itself.

In variation B, recordings were made through the KEMAR ears, but both pinnae were removed. This left a recess on each side of the KEMAR head in which the KEMAR internal microphones sat. This eliminated the contribution of both the KEMAR pinnae and ear canals but not the internal recording equipment.

In variation C, the recesses left on each side of the KEMAR head when the pinnae were removed were filled by pieces of cardboard. The cardboard pieces each had a hole cut into them such that sound could get through to the KEMAR microphones. In this way, the KEMAR was fitted with ear canals while still lacking pinnae. This effectively eliminated the indentations on each side of the KEMAR head, leaving a smooth surface with a hole for the ear canal.

In variations D and E, only one pinna was removed at a time. Nothing was inserted to take the place of the missing pinna.
Measurements at all three angles of interest were made with only the left pinna missing, and again with only the right pinna missing.

In variation F, the KEMAR was not used at all; instead, a JVC foam dummy head designed to make binaural recordings was used. The JVC head was fitted with a headphone-like apparatus with microphones on the outside where the ear canals of the dummy head would be. The JVC head was attached to a microphone stand and its height off the ground was adjusted so that the height of the JVC microphones was the same as that of the KEMAR ears.

In variation H, the KEMAR head was removed from its torso and situated on a microphone stand. The height of the stand was adjusted such that the KEMAR ears were at the same height as they would have been with the torso still attached. Both pinnae were in place, and recordings were made through the KEMAR ears. In this way, the effect of the torso could be measured.

Results

The results of the coherence measurements made at 45°, 0°, and −45° for all seven variations of the experimental setup are shown in Figure 4.26. At an angle of 0°, the coherence does not differ much across different situations, except for some minor differences at very low and very high frequencies.

At angles of ±45°, however, there is a clear effect in certain variations. At −45°, coherence is significantly worse at frequencies up to about 1250 Hz in variations D (only right pinna present) and G (both pinnae present). At 45°, coherence is degraded for variations E (only left pinna present) and G. No other variation changes the coherence so significantly at ±45° compared to 0° as do variations D, E, and G.

In variations D and E, the coherence is only degraded when the pinna that is present is near to the sound source. For instance, when the KEMAR is turned to −45°, the loudspeaker is to its right, and the right ear is closer to the sound source than the left ear. If the right pinna is present, then the coherence is less than it would be if the right pinna were absent, regardless of the presence of the left pinna. A similar point can be made about the left pinna in the 45° configuration. The unequivocal result is that the drops in coherence only occur when the pinna of the ear nearer the sound source is present.

Figure 4.26. Coherences measured in 1/3-octave bands in an anechoic environment using several variations of the experimental apparatus to record the signals in both ears. Each variation is labeled with a different letter and is plotted separately with different symbols. The variations are A: measurements through headphone-mounted microphones placed on KEMAR; B: recordings through KEMAR ears but with pinnae removed; C: as variation B, but with cardboard "ear canals;" D: as variation B, but with only the left pinna removed; E: as variation B, but with only the right pinna removed; F: recordings through microphones mounted on a JVC foam dummy head; G: recordings made through KEMAR normally with both pinnae present.

Results of Removing the Torso

The effect of removing the torso can be seen by comparing variations G and H as shown in Figure 4.27. The presence of the torso had a statistically significant effect on the coherence (in a one-sided, paired t-test, t = 5.37, df = 37, p < 0.001), but it is clearly not the dominant effect. The mean improvement in coherence, plus and minus one standard deviation, at ±45° when the torso is removed is 0.03 ± 0.033.
Even if only the frequency bands below 1 kHz are considered, the mean improvement in coherence when the torso is taken away is 0.05 ± 0.035. The shoulders and torso have been noted to affect the measurement of head-related transfer functions (HRTFs) at frequencies as low as 1 kHz [25]. Low-frequency effects of the shoulders and torso have been previously noted to provide important elevation cues [5] (as do the pinnae). However, the effect observed here would point to a low-frequency effect of the torso in the horizontal plane, albeit a small one.

The degradation of interaural coherence at ±45° seems to be primarily a low-frequency pinna effect. The effect on the ITDs within frequency bands shown in Figure 4.15 is then due to the pinna nearer the sound source, and the opposite pinna has little or no effect on the coherence. However, no explanation has yet been given for why angles around ±45° are so special in this respect, and why a similar effect does not occur at 135°. In the geometry of the pinnae, the angle of 45° does not seem to have any special place. The angle between the pinnae and the side of the head, for instance, is on average 24.1° for the average adult, and 22° for KEMAR [24].

Figure 4.27. Coherences measured in 1/3-octave bands in an anechoic environment normally through KEMAR with pinnae in place (variation G, + symbols), and through a KEMAR head detached from the KEMAR torso (variation H, up-arrow symbols). Panels show incident angles of −45°, 0°, and 45°.

The main effect of the pinnae is traditionally thought to be to aid in sound localization in the median sagittal plane.
This is accomplished by means of the comb filtering that occurs when reflections of sound off the pinna interfere with the direct sound incident on the pinna [7]. These effects are thought to be particularly important above about 6 kHz (corresponding to a wavelength of about 5.7 cm), and this makes sense given that the length of the KEMAR ear is reported to be 5.89 cm by Burkhard and Sachs [24]. At a low frequency of, say, 200 Hz, the wavelength of the sound (1.72 m) is 29 times the length of the pinna, and so scattering by the pinna at low frequencies must be very small. At even lower frequencies, any scattering by the pinnae must be even smaller. An effect is observed, however, which is greater at lower frequencies and yet is attributed to the pinnae. It seems implausible then that simple acoustical scattering by the pinnae is responsible for the degraded coherences observed at certain angles of incidence.

4.2.2 Localizability of Noise Bands at 45° by Human Listeners

Poor interaural coherence potentially leads to poor localization of sounds via ITD cues. However, coherence is generally more important in the ITD localization of higher frequency noises (where ILD cues are not present), and becomes less important for lower frequencies. Constan [29] reports measurements of the threshold coherences necessary to correctly localize shifts in ITD of high-pass filtered noises. He observes that higher frequency cutoffs require greater coherences (nearer to 1) in order for a particular magnitude of ITD to be detectable. The drops in coherence measured previously in this experiment for KEMAR recordings at a 45° angle of incidence are most dramatic at low frequencies, and therefore may not be as important a factor in human localization of sounds in those frequency bands.

Angle    γ         ITD (ms)
−45°     0.6865    −0.2816
45°      0.7303    0.2400

Table 4.1. Coherence and ITD of 500-Hz noise bands recorded through KEMAR at incident angles of ±45° in an anechoic environment.
The depth of the dips in coherence suggests, however, that even low-frequency localization tasks may become difficult for sounds with those coherences. This experiment measured the effect of noises with coherences as measured through KEMAR on the localizability of sounds by human listeners.

Methods

To determine the significance of the drops in coherence at 45°, the KEMAR recordings made at ±45° were used in a discrimination task designed to determine the ability of real listeners to localize coherent and incoherent noise bands. Two sets of recordings were made through KEMAR at −45° and +45°. These noises were filtered into a 1/3-octave band centered at 500 Hz. The coherences of the noises in this frequency band as well as the measured ITDs are shown in Table 4.1.

An "incoherent" noise presentation was made to listeners by presenting them with the signals exactly as measured through the KEMAR ears for ±45° incident angles. This was referred to as a "standard incoherent noise pair." A coherent noise presentation was made by presenting the same signal (that recorded in either the left or the right channel) to both ears, and thus the signals in this presentation had an interaural coherence of exactly 1. In the case of coherent noise presentations, the signals were circularly shifted so as to match the ITD, and scaled so as to match the ILD, of the incoherent presentations at the same angle. This was then referred to as a "standard coherent noise pair."

The standard coherent and standard incoherent noises were perceptually shifted in space by altering the ITD between the left and right ear signals. Since ITD is thought to be the dominant cue for binaural sound localization in the region of 500 Hz, only the ITD was altered. Also, the relative magnitude of ILDs was small in the 500 Hz band, so a small shift in incident angle, which corresponds to a significant change in ITD, corresponds as well to a very small change in ILD.
Thus, it was unnecessary to alter the ILDs.

The standard noises were subjected to a set of delays (in number of samples) of {0, ±3, ±6, ±9, ±12, ±15}. These noises were recorded at, and were to be played back at, a sample rate of 195,312.5 Hz, so that every three samples of shift represents a change in ITD of 15.4 µs and a corresponding change in perceived angle of, coincidentally, approximately 3°. Noises were shifted such that negative delays corresponded to a perceptual shift to the left and positive delays corresponded to a shift to the right. In this way, a set of shifted noises was created.

Listeners were presented with a task in which two noises were presented, one after the other, with a 500 ms pause between them. The first noise of the stimulus was a standard noise and the second was a shifted noise. Listeners were asked to indicate whether they heard the second noise to the left or to the right of the first. The full delay set was presented four times in random order, giving 44 noise pairs. This constituted a single run.

All noise pairs in a single run corresponded to only one angle, and were either all coherent or all incoherent. Thus, there were four main conditions: 45° coherent, 45° incoherent, −45° coherent, and −45° incoherent. Listeners were given a training period of several runs of both coherent and incoherent noises at each angle until their performance at both tasks no longer improved. This process of learning how to do the task was especially important in the case of incoherent noises. Listeners were then put through three runs of each condition for which their discrimination responses were recorded.

Four listeners, all of them males between the ages of 21 and 27, participated as subjects. All four had normal hearing, and were inexperienced in listening experiments of this type.
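The delay set described above, and its conversion to ITD, can be sketched as follows. Only the sample rate and the delay values come from the text. The function names are hypothetical, and the choice to apply the shift by circularly delaying the left-ear signal (so that a positive delay lets the right ear lead) is an assumption made for illustration.

```python
SAMPLE_RATE_HZ = 195312.5  # sample rate stated in the text


def delay_set(step=3, max_delay=15):
    """Delays in samples: 0, ±3, ±6, ..., ±15 (eleven values)."""
    return list(range(-max_delay, max_delay + 1, step))


def samples_to_itd_us(delay_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert a delay in samples to an ITD change in microseconds."""
    return delay_samples / sample_rate_hz * 1e6


def circular_shift(signal, delay_samples):
    """Rotate a list so that each sample occurs `delay_samples` later (wrapping around)."""
    n = len(signal)
    k = delay_samples % n
    return signal[-k:] + signal[:-k] if k else list(signal)


def shift_pair(left, right, delay_samples):
    """Assumed convention: delay the left ear so that positive delays favor the right."""
    return circular_shift(left, delay_samples), list(right)
```

Under these assumptions a three-sample step corresponds to 15.36 µs, which matches the 15.4 µs quoted above, and four repetitions of the eleven delays give the 44 noise pairs of a single run.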
Stimuli were presented at a level of 70 dB SPL through Etymotic ER-7 in-ear headphones. The headphones were placed in the ear canals of the listeners, so that the only pinna effects present were those of the KEMAR.

The percentage of times a listener responded that they heard the second noise to the right of the first was calculated for each value of the delay. A response rate at or above 75% was taken to be the threshold for correct discrimination of positive delays, and a response rate at or below 25% was taken to be the threshold for negative delays. By finding the smallest delay for which threshold discrimination was reached, a just-noticeable difference (JND) in shift was measured for each listener in each condition. Larger JNDs indicate greater difficulty in localizing sounds.

Results

The results for all four listeners are shown in Figures 4.28 and 4.29. The JNDs for shifts to the left are the delays at which curves cross the 75% line, and the JNDs for shifts to the right are the delays at which curves cross the 25% line. Open circles represent incoherent noise, and filled-in circles represent coherent noise.

Listeners unanimously described the coherent noise bands as compact and easy to localize, whereas the incoherent noises were quite diffuse and difficult to localize. Listeners were able to perform well in the coherent localization task almost immediately, but required several runs with the incoherent noise before they learned to localize the incoherent noises at all. Following training, listeners described various methods that they had developed in the learning process to help them localize the incoherent noises, such as listening to the "center" of the noises or focusing on only the highest frequency parts of the noise. No listeners reported that such special techniques were necessary for the localization of the coherent noise.

Figure 4.28. Percentage of "right" responses as a function of delay (in samples) for noises at an angle of 45° for four different listeners. Responses corresponding to coherent noises are plotted with filled circles. Responses corresponding to incoherent noises are plotted with open circles. Thresholds for discrimination are plotted as solid horizontal grid lines. A shift of one sample corresponds to a perceptual shift of approximately one degree.

Figure 4.29. Percentage of "right" responses as a function of delay (in samples) for noises at an angle of −45° for four different listeners. Responses corresponding to coherent noises are plotted with filled circles. Responses corresponding to incoherent noises are plotted with open circles. Thresholds for discrimination are plotted as solid horizontal grid lines. A shift of one sample corresponds to a perceptual shift of approximately one degree.

The JNDs for each listener in each condition are summarized in Table 4.2. The left side of the table contains the JNDs for sounds moving to the left, and the right side of the table shows JNDs for sounds moving to the right. In all cases, the JND for coherent noises is less than that for incoherent noises. In two cases, for incoherent noises moving to the left at 45° and moving to the right at −45°, listener E did not reach the threshold of discrimination within a shift of 15°. The average difference between the magnitude of the JNDs of incoherent and coherent noises in the same conditions across listeners was 5°, or an ITD of approximately 26 µs. The JNDs for incoherent noises in each condition were significantly greater in magnitude than their coherent counterparts (in a paired, one-sided t-test, t = −6.88, df = 13, p < 0.001).
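The threshold rule from the Methods can be sketched as a small routine that walks outward from zero delay and reports where the response curve first reaches the 75% or 25% line. Linear interpolation between tested delays is an assumption here, suggested only by the fact that the tabulated JNDs are not all multiples of three samples. The function names and the response rates used below are hypothetical, not data from this experiment.

```python
def crossing(points, threshold, rising):
    """Delay at which the curve first reaches `threshold`, or None if it never does.

    `points` is a list of (delay, rate) pairs ordered outward from zero delay.
    """
    def reached(rate):
        return rate >= threshold if rising else rate <= threshold

    if reached(points[0][1]):
        return float(points[0][0])
    for (d0, r0), (d1, r1) in zip(points, points[1:]):
        if reached(r1):
            # Linear interpolation between the two bracketing delays.
            return d0 + (threshold - r0) * (d1 - d0) / (r1 - r0)
    return None


def jnds(delays, right_rates, upper=0.75, lower=0.25):
    """Return (JND for negative delays, JND for positive delays) in samples."""
    pairs = sorted(zip(delays, right_rates))
    positive = [p for p in pairs if p[0] >= 0]
    negative = [p for p in reversed(pairs) if p[0] <= 0]
    return (crossing(negative, lower, rising=False),
            crossing(positive, upper, rising=True))


# Hypothetical proportions of "right" responses at each delay.
example_delays = [-15, -12, -9, -6, -3, 0, 3, 6, 9, 12, 15]
example_rates = [0.0, 0.0, 0.0, 0.125, 0.375, 0.5, 0.625, 0.875, 1.0, 1.0, 1.0]
example_jnds = jnds(example_delays, example_rates)
```

A result of `None` corresponds to the case reported for listener E, where the threshold was not reached within the tested range of delays.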
4.2.3 Measurements in KEMAR and Human Listeners with Probe Microphones

A different method of measuring the interaural coherence in KEMAR, which could be repeated in real listeners, would be to make recordings of signals through in-ear probe microphones. These microphones receive sound at the ends of long plastic tubes which can be snaked into the ear canal of the subject. Using such a technique on KEMAR would serve to confirm the drops in coherence seen in Experiment 9 for angles near ±45°. More importantly, using such a technique on real listeners would determine whether or not this phenomenon was specific to KEMAR, or was actually present in human listeners.

            Left JND                          Right JND
            Coherent       Incoherent         Coherent         Incoherent
Listener    45°     −45°   45°      −45°      45°      −45°    45°      −45°
E           7°      6°     > 15°    9°        −3°      −3°     −15°     < −15°
N           3°      2°     5°       9°        −2°      −2°     −8°      −5°
Z           4°      2°     6°       9°        −3°      −2°     −9°      −8°
L           5°      5°     10°      10°       −6°      −5°     −8°      −9°
Avg.        4.8°    3.8°   > 9.0°   9.3°      −3.5°    −3.0°   −10°     < −9.3°

Table 4.2. Just-noticeable differences (JNDs) for coherent and incoherent noises moving to the left or to the right. JNDs for sounds moving to the left are on the left side of the table and JNDs for sounds moving to the right are on the right side of the table. One degree on the table corresponds to an ITD of 5.1 µs. Incoherent sounds at 45° had an interaural coherence of 0.7303 and those at −45° had a coherence of 0.6865. These JNDs tend to be significantly larger for the incoherent noises than for coherent noises, indicating that listeners had greater difficulty localizing incoherent noises under otherwise identical conditions.

Methods

Three listeners from the experiment on the localization of noise bands recorded through KEMAR participated in this experiment (E, N, and Z). Listeners were seated on a rotating stool in the anechoic room used in previous experiments, 3.05 m from the loudspeaker.
Small paper markers on the walls of the room at approximately eye level indicated where the listeners should fixate in order to face in the −45° and +45° directions, respectively. Listeners were fitted with a Velcro headband. Attached to the headband near the ears were the connectors for ER-7C probe microphones. A thin plastic tube was snaked from the connectors into the ear canal of each listener. Sound recorded at the ends of the tubes was fed to the external preamplifier. No special care was taken to ensure that the position of the microphones was identical in each listener, except to make certain that the ends of the tubes were well inside the ear canal. Though this may be expected to lead to variance across listeners, the point of this experiment was not to make identical measurements in different listeners, but to look for a large effect on coherence in human listeners at ±45° when compared to 0°, and so the present setup and methodology were sufficient.

Measurements were made through the probe microphones for listeners facing each of three positions: −45°, 0°, and +45°. At each position, listeners were exposed to three different MLS noises of order 18 at a level of 85 dB SPL. Recordings were made at a sample rate of 195,312.5 Hz. These conditions were identical to those used to test KEMAR in previous experiments in the anechoic environment.

Listeners were instructed to sit upright, fixate directly on the paper marker on the wall, and not move in between noise bursts. Once three noises were measured at one position, listeners were instructed to gently rotate themselves on the stool until they were facing the next position, and another three noises were presented and recorded. Listeners were instructed to begin by looking at the −45° marker, then directly at the center of the loudspeaker (0°), then at the +45° marker. No special measures were taken to ensure that the orientation of the listeners' heads was exact beyond what has been described already.
However, the drop in coherence observed in KEMAR was reasonably broad, and so it should not be necessary for a listener to be oriented such that their interaural axis is exactly 45° to the incident sound.

An identical set of measurements was made using the probe microphones in the KEMAR.

Results

The interaural coherences were measured in 1/3-octave bands between 200 and 10,000 Hz. Measurements in frequency bands lower than 200 Hz were unreliable because of the poor response of the probe microphones at those frequencies. The results of these measurements for each listener are shown in Figure 4.30. The average coherence measured for each listener in each band is plotted, with error bars spanning one standard deviation in each direction.

Figure 4.30. Coherences measured in three human listeners and a KEMAR at incident angles of sound of ±45° and 0°. Averages and standard deviations are calculated over three measurements made at each angle. Error bars span one standard deviation in each direction. Error bars smaller than the size of the data points are invisible.

The results show that coherences as measured in human listeners are also somewhat degraded at angles of ±45° compared to coherences at 0°, but not as much as in KEMAR. There is significant variation in the coherences measured at ±45° across listeners, but this is not surprising given that the probe microphone tubes were not placed in any precise way in the ear canals of the listeners. To reiterate, the precise numbers are not as important as the observation of the overall effect in this study.

Some greater care was taken with probe microphone measurements in KEMAR than was taken in human listeners. In the case of KEMAR, the probe microphones were inserted as far in as possible, until they nearly touched the transducer "ear canals." In this way, a measurement as similar as possible to those measured through KEMAR itself was made.
A plot of the coherence across 1/3-octave bands at 45° measured both through KEMAR and with probe microphones inserted into KEMAR is shown in Figure 4.31. The probe microphone measurements yield almost identical results to those made through the KEMAR electronics, and the two have a Pearson product-moment correlation coefficient of r = 0.986. Measurements in KEMAR with probe microphones thus confirm measurements made through the KEMAR ears.

The degradation of coherence around ±45° in human listeners is evident in Figure 4.30, but to a far lesser extent than in measurements made in KEMAR. Comparing coherences for the bands from 200 to 1000 Hz, where the differences between KEMAR and human listeners are most evident, the coherences measured in KEMAR were significantly less than those measured in any of the listeners (in several one-sided paired t-tests between listener and KEMAR data, t < −7.53, df = 7, p < 0.001). In the 500 Hz band at +45°, for instance, the average coherence across all three trials, plus and minus one standard deviation, measured in KEMAR was 0.799 ± 0.077. In the same band, the average coherence across all three human listeners was 0.94 ± 0.056. Though the drop in coherence measured here for human listeners is much smaller than that measured in KEMAR, such coherence drops should be nonetheless detectable [29]. This may still imply that there is an effect on the ability of listeners to localize sounds incident at or around ±45°.

Figure 4.31. Coherences measured at 45° from recordings made through KEMAR Zwislocki couplers and electronics (+ symbols) and through ER-7 probe microphones (open squares).

4.2.4 Localization of noises with coherence as measured in human listeners

The sounds presented to listeners in previous parts of this experiment were noises filtered into 1/3-octave bands.
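Because the stimuli are 1/3-octave bands, it may help to make the width of such a band concrete. The sketch below assumes base-2 band edges, placed one sixth of an octave below and above the center frequency; the function name is hypothetical, and the exact filter definition used for the stimuli is not restated in this section.

```python
def third_octave_band(center_hz):
    """Return (lower edge, upper edge, bandwidth) in Hz for a base-2 1/3-octave band."""
    lower = center_hz * 2 ** (-1 / 6)
    upper = center_hz * 2 ** (1 / 6)
    return lower, upper, upper - lower


lower_500, upper_500, bandwidth_500 = third_octave_band(500.0)
```

For a 500 Hz center frequency this gives edges near 445 Hz and 561 Hz and a bandwidth of roughly 116 Hz, of the same order as the 100 Hz bandwidth used in the incoherence-detection studies cited in this section.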
Previous experiments regarding the detectability of incoherence indicate that human listeners can be quite sensitive to even a modest amount of incoherence. At a center frequency of 500 Hz and a bandwidth of 100 Hz, listeners can discriminate perfectly coherent noises from noises with a coherence as high as, on average, 0.99 [29]. The 1/3-octave bandwidth about 500 Hz as used in this experiment is approximately 115 Hz. Gabriel and Colburn [42] note that discrimination of interaural incoherence compared to a reference condition of coherence 1 is best at or below bandwidths of 115 Hz.

Human listeners were shown to experience a far less dramatic drop in coherence for incident angles near ±45° than those measured in KEMAR. However, the average coherence measured in listeners at ±45° was 0.94, easily within the threshold of incoherence detection at the 500 Hz band [29, 42]. Previously, listeners were presented signals as recorded through KEMAR filtered into a 1/3-octave band centered at 500 Hz, which had an average interaural coherence of 0.71. It is likely that noises with coherences equal to those actually measured in listeners are far easier to localize than those previously presented to listeners.

            Coherence γ
Listener    −45°      0°        +45°
E           0.9763    0.9960    0.9899
N           0.9363    0.9868    0.9000
Z           0.9337    0.9972    0.9524
KEMAR       0.7971    0.9942    0.7988

Table 4.3. Interaural coherences in the 500-Hz band from signals measured with probe microphones in KEMAR and human listeners E, N, and Z.

In this experiment, noises were constructed in a 500 Hz band identical to that used in previous experiments but with an interaural coherence equal to that measured in human listeners. These noises were presented to listeners in a manner identical to that of the previous JND localization experiment.
In this way, the ability of listeners to localize noises with coherences as measured in their own ears was explored and compared to previous results for the localization of perfectly coherent noises.

Methods

A localizability experiment was performed in exactly the same manner as the previous localizability experiment for human listeners. Stimuli were made from MLS signals, altered to have the same phase and amplitude spectra as those signals measured in each listener's ears using probe microphones. The coherences of the noises presented to each listener were then exactly the same as what was previously measured. These coherences in the 500 Hz band are summarized for each listener and for KEMAR in Table 4.3.

Results

The results for all four listeners are shown in Figures 4.32 and 4.33. The JNDs for shifts to the left are the delays at which curves cross the 75% line, and the JNDs for shifts to the right are the delays at which curves cross the 25% line. Open circles represent incoherent noise, and filled-in circles represent coherent noise.

All listeners described this task as being quite easy compared to the task where the noises had coherences as measured in KEMAR. The JNDs for each listener in each condition are summarized in Table 4.4. The left side of the table contains the JNDs for sounds moving to the left, and the right side of the table shows JNDs for sounds moving to the right. There was no significant difference between JNDs for coherent and incoherent noises in this case (in a two-sided, paired t-test, t = 0.00, df = 11, p = 1.000).

Figure 4.32. Percentage of "right" responses as a function of delay (in samples) for noises at an angle of 45° for four different listeners. Noises presented to each listener had interaural coherences equal to that measured in that particular listener's ears in the 500 Hz 1/3-octave band. Responses corresponding to coherent noises are plotted with filled circles. Responses corresponding to incoherent noises are plotted with open circles. Thresholds for discrimination are plotted as solid horizontal grid lines. A shift of one sample corresponds to a perceptual shift of approximately one degree.

Figure 4.33. Percentage of "right" responses as a function of delay (in samples) for noises at an angle of −45° for four different listeners. Noises presented to each listener had interaural coherences equal to that measured in that particular listener's ears in the 500 Hz 1/3-octave band. Responses corresponding to coherent noises are plotted with filled circles. Responses corresponding to incoherent noises are plotted with open circles. Thresholds for discrimination are plotted as solid horizontal grid lines. A shift of one sample corresponds to a perceptual shift of approximately one degree.

            Left JND                          Right JND
            Coherent       Incoherent         Coherent         Incoherent
Listener    45°     −45°   45°      −45°      45°      −45°    45°      −45°
E           5°      4°     6°       4°        −6°      −4°     −6°      −4°
N           4°      2°     4°       2°        −2°      −4°     −2°      −4°
Z           2°      3°     2°       1°        −5°      −5°     −5°      −4°
Avg.        3.7°    3.7°   4.0°     2.3°      −4.3°    −4.3°   −4.3°    −4.0°

Table 4.4. Just-noticeable differences (JNDs) for coherent and incoherent noises moving to the left or to the right. JNDs for sounds moving to the left are on the left side of the table and JNDs for sounds moving to the right are on the right side of the table. Incoherent sounds at 45° and −45° had coherences for each listener identical to that measured in that listener with probe microphones. The specific coherences for each listener can be found in Table 4.3. There is no significant difference in JNDs for the coherent and incoherent noises.

That the JNDs for each listener are not different when listeners are presented with coherent noise versus when the noise has characteristics as measured in the listeners' ears may indicate a familiarity of each listener with their own acoustics.
However, it is just as likely that the coherences measured in listeners' ears were so high that, for a 500 Hz center frequency, noises were just as easily localizable as if the coherences had been equal to 1, even though listeners reported being able to detect the presence of incoherence. In any case, though the effect on interaural coherence with sound source location seen in KEMAR is present in human listeners, the magnitude of the effect is much smaller and likely does not adversely affect horizontal localization.

4.3 Experiment 10: Measurements of Binaural Parameters and Room Characteristics in a Highly Reverberant Environment

Measurements made on KEMAR in anechoic conditions form a baseline for making further measurements in a real room. The patterns and data collected in anechoic conditions reflect the effects of the geometry of the KEMAR head, binaural asymmetries, and other such parameters relating to the experimental equipment. Thus they provide a basis for measurements made in a reverberant environment such that the effects of the environment and the effects of the equipment might be separated.

In the previous experiment, binaural parameters were measured for a KEMAR in an anechoic environment. While such measurements provide a useful set of baseline measurements, most environments in which listening takes place are not nearly anechoic. Listeners experience the effect of their auditory environment, such as reflections off of walls, and cope with these effects to varying degrees of success. For instance, perception of reflections of impulsive sounds within rooms is largely suppressed by the auditory system. Listeners learn to ignore these reflections and do not perceive them as separate auditory images (echoes) unless the delay in arrival time between the direct sound and a reflection is longer than about 10 ms for clicks, and 50 ms for speech [46].
The current experiment examines the influence of rooms on binaural parameters and how this relates to the reverberant properties of the room.

4.3.1 Methods

A KEMAR was placed in a reverberant chamber with dimensions 7.67 m by 6.35 m by 3.58 m high (the same reverberation room as used by Hartmann, et al. [51]). An MLS signal was played using the same equipment as in previous experiments, with the loudspeaker situated 3.05 m from the KEMAR, at approximately ear level.

Measurements were taken through the KEMAR of five different MLS stimuli, all of order 19. Such long signals were necessary since it was expected that the room would, in some bands, have reverberation times on the order of two to three seconds. However, this also greatly increased the processing time required and so the sampling rate was reduced to 97,656.25 Hz. This resulted in signals 5.37 s long. This sampling rate was sufficient for measuring frequency bands through 10 kHz. The Schroeder frequency of this room (Equation 2.8) was expected to be between 200 Hz and 300 Hz.

To eliminate the influence of standing waves in the room, measurements were taken at six different positions in the reverberant room. Each position placed the loudspeaker and KEMAR in a different place in the room, but both were always separated by 3.05 m with the loudspeaker and KEMAR "facing" each other. Five measurements, each with a different MLS of order 19, were taken at each position.

In this way, measurements of the waveform coherence γ, ITD, and ILD were taken, as well as measurements of RT60 in twenty-two 1/3-octave bands from 80 Hz to 10000 Hz. A unique IR may be calculated from the signal in each ear of the KEMAR for each MLS, and an RT60 in each 1/3-octave band may be calculated for each IR. Thus, two estimates of RT60 were made in each 1/3-octave band for each MLS played.
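The signal length quoted above follows from the MLS period of 2^N − 1 samples for order N. The sketch below checks that arithmetic and, for illustration only, generates one period of a small MLS with a Fibonacci linear-feedback shift register. The register layout, the function names, and the order-4 tap choice (4, 3) are assumptions made for the example; the taps used for the order-18 and order-19 sequences in these experiments are not given in this section.

```python
def mls_length(order):
    """Number of samples in one period of an MLS of the given order."""
    return 2 ** order - 1


def mls_duration_s(order, sample_rate_hz):
    """Duration in seconds of one MLS period at the given sample rate."""
    return mls_length(order) / sample_rate_hz


def lfsr_mls(order, taps):
    """One period of a ±1 sequence from a Fibonacci LFSR.

    `taps` are 1-based register stages whose XOR forms the feedback bit.
    The output is maximum-length only if the taps describe a primitive polynomial.
    """
    state = [1] * order
    out = []
    for _ in range(mls_length(order)):
        out.append(1 if state[-1] else -1)
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return out


def circular_autocorrelation(seq, lag):
    """Unnormalized circular autocorrelation of `seq` at the given lag."""
    n = len(seq)
    return sum(seq[i] * seq[(i + lag) % n] for i in range(n))


small_mls = lfsr_mls(4, (4, 3))             # 15 samples, illustrative only
order19_s = mls_duration_s(19, 97656.25)    # about 5.37 s
rt60_estimates_per_band = 6 * 5 * 2         # positions x MLS stimuli x ears
```

The nearly impulse-like circular autocorrelation of such a sequence (equal to its length at zero lag and −1 at every other lag) is the property that allows an impulse response to be recovered by cross-correlating the recorded signal with the stimulus.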
At each of the six positions tested there were then 10 measurements of RT60 per center frequency, leading to a total of 60 measurements of RT60 in each 1/3-octave band. Averages and standard deviations for RT60 in each band were calculated across these 60 measurements.

4.3.2 Coherence Results

Coherences in each 1/3-octave band were averaged across all six positions in the reverberant room for which data were taken. The average coherences in each band are shown in Figure 4.34. Error bars extend to plus and minus one standard deviation. As was done for measurements in the anechoic environment, coherences were corrected for period errors (peaks in coherence were expected to occur for lags near 0 ms). The differences in coherence between the correct peak and the coherence in the case of a period error were, on average (plus and minus one standard deviation), −0.009 ± 0.0097. Because the differences between the heights of the accepted peaks and the period error peaks of the cross-correlation functions are so small, the expected peak is selected rather than the period error when measuring ITD and coherence. It is reasonable to expect that the auditory system is able to perform a similar operation by comparing the ITD in each critical band to that measured in lower critical bands to ensure it has found the "correct" peak.

The shape of this graph is quite similar to that of measurements made by Hartmann, et al. [51] in the same room. For frequencies up to about 5000 Hz, the pattern of coherences is in fact extremely similar. At frequencies above 5000 Hz, measurements made here and those made by Hartmann, et al. are quite different, with coherences measured here higher on average by 0.184. It may be argued that measurements made in the present study are more accurate at higher frequencies because the sampling rate used in the present study (about 100 kHz) was twice that used by Hartmann, et al. (50 kHz).
Higher sampling rates yield better temporal resolution in the cross-correlation function, and thus more accurate measurements of coherence.

Figure 4.34. Average coherences across all measurements in 1/3-octave bands made in a reverberant room at a distance of 3.05 m from the sound source. Error bars extend one standard deviation in each direction.

The variances in the data measured by Hartmann, et al. were somewhat greater than the variances measured here, although the difference did not reach the 0.05 level of significance (two-sample, one-sided t-test, t = 1.57, df = 36, p = 0.063). Greater variances in the earlier study were likely due to the types of noises used. Hartmann, et al. made use of a sequence of equal-amplitude random-phase noises, each filtered into one of several 1/3-octave bands until 20 noises were created with center frequencies ranging from 142 Hz to 9000 Hz. The noises were played in five-second bursts, one after the other, and separate recordings were made for each noise band. In the present study, a single noise with a flat amplitude spectrum (i.e. an MLS) was used and filtered a posteriori into 1/3-octave bands. This simultaneous measurement of all frequency bands, as well as the similarity in the properties of different MLS of the same order, likely led to the smaller variance within frequency bands seen in the present study.

Unlike measurements made in the anechoic environment (Figures 4.11 and 4.12), there is noticeable variation in the measurements of coherence across trials in the reverberant space. The largest standard deviations tend to occur near a center frequency of 500 Hz, where the standard deviation in coherence is 0.094. Coherences tended to be larger and more consistent for bands centered at 250 Hz and below than for those above 250 Hz.
Standard deviations for coherences in these low-frequency bands were significantly smaller than standard deviations in higher frequency bands (one-sided two-sample t-test, t = −5.16, df = 16, p < 0.001). The reasons for this will be further elaborated in the following section.

The cause of very low coherences in certain bands is likely variance in the ITD spectrum within bands, much as it was for recordings made through KEMAR ears in anechoic conditions for incident sound source angles near ±45°. To see this, consider the ITD spectrum for one particular trial in which the waveform coherence in the 1000-Hz band was found to be only γ = 0.2453. This is plotted in Figure 4.35. In that band, the mean ITD across individual components within the band was found to be 12.7 µs, very close to the expected value of 0, and the standard deviation was 251 µs. Plugging this standard deviation and the band center frequency (ω₀ = 2π × 1000 rad/s) into Equation 4.31 gives a coherence of ⟨γ⟩ = 0.2883, which is close to the coherence found via the peak of the cross-correlation function. Thus, it seems that the variance in the ITD spectrum is responsible for poor coherence in rooms.

Figure 4.35. The ITD spectrum of the 1/3-octave band centered at 1000 Hz, recorded through KEMAR in the highly reverberant environment. The interaural waveform coherence within this band is γ = 0.2453. The standard deviation in ITD across the band is σ = 251 µs, which gives a coherence of ⟨γ⟩ = 0.2883 when Equation 4.31 is employed.

4.3.3 Reverberation Times

Reverberation times measured in the reverberant room are shown in Figure 4.36. Average reverberation times in each 1/3-octave band are calculated for all measurements of RT60 in each band, including those from each ear of the KEMAR. Error bars span one standard deviation in each direction. Reverberation times tended to increase with increasing frequency until peaking at 1.25 kHz with a reverberation time of 2.43 ± 0.042 s. For frequencies above 1.25 kHz, RT60 decreased monotonically. These reverberation times agree quite well with those reported for the same room by Hartmann, et al. [51].

Figure 4.36. 60-dB reverberation times, averaged across all measurements from both ears of a KEMAR in 1/3-octave bands, made in a reverberant room. Error bars extend one standard deviation in each direction and indicate excellent agreement across different positions in the room and between the two ears.

In general, measurements of RT60 were quite consistent across different positions in the room, as evidenced by the small size of the error bars. Standard deviations were significantly larger for center frequencies near or below the Schroeder frequency of the room (one-sided two-sample t-test, t = 2.86, df = 7, p = 0.012). This tendency in the size of standard deviations is exactly opposite that for coherence measurements.

Presumably, the prominence of individual standing waves in the room (thought to occur for frequencies below the Schroeder frequency of the room) leads to uncertainty in RT60 measurements across positions in the room, as different positions in the room will occur at different points along those standing waves.
At the same time, each ear of the KEMAR is equally affected by standing waves common to both ears. This leads to a larger coherence and a smaller variation in coherence across measurements at low frequencies.

It should also be noted that the difference in RT60 between ears was not statistically significant (one-way ANOVA, F(1, 262) < 0.01, p = 0.951). This gives some assurance that the reverberation times heard by each ear were nearly the same in each 1/3-octave band for each measurement. Thus, the practice of combining measurements of RT60 from both ears seems to be well justified.

4.3.4 ITD Results

The measured ITD, based on the temporal location of cross-correlation peaks corrected for period errors, is shown in Figure 4.37 as open circles. ITDs were measured in each 1/3-octave band five times at each of six positions, for 30 total measurements in each band. Each measurement was individually corrected for period errors. Averages were computed across all 30 measurements in each band, and standard deviations were computed across the averages at each position. Error bars span one standard deviation in each direction.

The only average ITDs that did not fall within one standard deviation of 0 ms were those at 125 Hz and below, where the ITDs were less than 0 ms. This is an expected result, given the similar pattern of negative ITDs seen at these frequencies in anechoic conditions (Figure 4.23).

Period errors were understandably frequent in the reverberant room. At each position, six or seven period errors were noticed in various 1/3-octave bands. The lowest frequency band for which period errors were found was at 1 kHz. Curiously, no period errors were found at any of the six positions tested or in any trial for the band with a center frequency of 4 kHz, and there is also a local peak in the coherence at this center frequency (see Figure 4.34).

Figure 4.37. ITDs measured between two ears of a KEMAR in a reverberant room. Data points indicate average measured values, with error bars spanning one standard deviation in each direction. Averages are calculated across 30 measurements, made at six different positions in the reverberant room, and standard deviations are calculated across average values from each position. At each position, the KEMAR and sound source were separated by 3.05 m with the KEMAR facing the sound source. Small dots indicate the ITD of period errors that occurred in at least one trial. Dashed lines indicate the theoretical ITD for one, two, and three period errors respectively as a function of frequency.

Each occurrence of a period error is plotted in Figure 4.37 as a small dot. Dashed lines indicate a multiple of the theoretical period (in ms) as a function of center frequency, and thus trace curves along which period errors should occur. The observed period errors conform well to the dashed lines, indicating good agreement with the prediction of where period errors should fall.

These measurements are similar to those reported by Hartmann, et al. [51]. In that paper, a similar figure is presented for ITD measurements taken in a room of similar volume but shorter RT60 than the reverberation room presented here in connection with Figure 4.37. Both in that paper and in the observations made in the present study, wide variations in ITD were observed in frequency bands up to about 1.25 kHz. Above that frequency, ITD cues become consistent across all trials and are nearly zero (the expected ITD in any band when the incident sound is at 0°). Though ITDs measured in most bands are within one standard deviation of 0 ms, the variances seem to become suddenly quite small once a single period error comes within the physiological range for ITDs of ±1 ms.
The variance in ITD at low frequencies may lead to a difficulty for listeners in localizing low-frequency sounds in reverberant environments, since ITD cues are thought to be quite important in low-frequency localization.

Frequency bands below 1 kHz are not subject to period errors because the calculation of the cross-correlation function that leads to a measurement of coherence and ITD is limited here to a maximum lag of ±1 ms. Period errors at these frequencies would fall outside of this range, and thus the measured ITD and coherence in those bands are taken only from the peak in the cross-correlation function within the ±1 ms range. It is interesting then to repeat the calculation of ITD using the same method and the same recorded signals, but with a maximum lag of ±2 ms. In some cases, the largest peaks may occur outside the ±1 ms range. If this is so, then period errors may be accounted for in the 500–800 Hz bands as well. The result of performing this calculation with a maximum lag of ±2 ms is shown in Figure 4.38. Period errors were found in both the 630-Hz and 800-Hz bands. The result of correcting these period errors is a lower variance in the ITD measurements in those bands.

Figure 4.38. ITDs measured between two ears of a KEMAR in a reverberant room, presented in a manner identical to that of Figure 4.37. ITDs are calculated by finding the lag of the peak in a cross-correlation function between the signals recorded in the left and right ears, with a maximum lag of ±2 ms.

It may also be that the true peaks in low-frequency bands in rooms are large and lie outside the maximum lag of the cross-correlation function, such that the peaks captured here are actually the period errors. In other words, ITDs greater than 1 ms may occur naturally in rooms.
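The role of the maximum lag described above can be illustrated with a minimal sketch. It assumes a true ITD of 0 ms (sound incident at 0°), so that period errors fall at integer multiples of the band period 1/fc, as the dashed curves in Figure 4.37 do. The function names and the list of nominal band centers are illustrative and are not taken from the original analysis code.

```python
def period_error_lags_ms(center_freq_hz: float, max_lag_ms: float) -> list:
    """Lags (ms) at which period errors can appear within +/- max_lag_ms.

    Assumes the true ITD is 0 ms, so candidate period errors sit at
    +/- n periods of the band center frequency.
    """
    period_ms = 1000.0 / center_freq_hz
    tolerance_ms = 1e-9  # lets a period exactly equal to the limit count
    lags = []
    n = 1
    while n * period_ms <= max_lag_ms + tolerance_ms:
        lags.extend([-n * period_ms, n * period_ms])
        n += 1
    return sorted(lags)


def bands_with_possible_period_errors(band_centers_hz, max_lag_ms):
    """Band centers for which at least one period error fits in the lag window."""
    return [fc for fc in band_centers_hz
            if period_error_lags_ms(fc, max_lag_ms)]


# Nominal 1/3-octave band centers around the region of interest.
BAND_CENTERS_HZ = [400, 500, 630, 800, 1000, 1250, 1600, 2000]

for max_lag_ms in (1.0, 2.0):
    bands = bands_with_possible_period_errors(BAND_CENTERS_HZ, max_lag_ms)
    print(f"max lag = {max_lag_ms} ms: lowest band with period errors = {bands[0]} Hz")
```

With a ±1 ms window the lowest band that can show a period error is the 1-kHz band. Widening the window to ±2 ms admits the 500-Hz, 630-Hz, and 800-Hz bands, which is consistent with the bands discussed in the text.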
Certainly the human auditory system is capable of localizing sounds with ITDs well outside of the "physiological limit" of ±1 ms [79, 18], and such ITDs may occur in rooms where reflections occur.

Figure 4.39. ILDs measured between two ears of a KEMAR in a reverberant room. Data points indicate average measured values, with error bars spanning one standard deviation in each direction. Averages are calculated across 30 measurements, made at six different positions in the reverberant room, and standard deviations are calculated across average values from each position. At each position, the KEMAR and sound source were separated by 3.05 m with the KEMAR facing the sound source.

4.3.5 ILD Results

The 1/3-octave ILDs measured in the reverberant room are shown in Figure 4.39. As with measurements of ITD in this room, averages are calculated across five measurements at each of six positions in the room, for a total of 30 measurements. Standard deviations were taken across the averages found at each of the six positions in the room. Error bars span one standard deviation in each direction.

Certain broad features of the ILD across center frequencies are similar to those seen in Figure 4.25 for ILDs measured in anechoic conditions. At low frequencies through about 400 Hz, there is an average positive ILD of about 0.5 dB, likely corresponding to an imbalance in gain between the left and right channels at the output of the preamplifier. The dip in ILD at and above 8 kHz is similar to the anechoic condition where the KEMAR was oriented at 0° relative to the direction of incident sound, and so this effect is likely internal in nature.

Figure 4.40. ILDs measured in a reverberant room at six different positions. The data points for each position are labeled with different symbols. Error bars extending one standard deviation in each direction are all smaller than the size of the data points, and so are not shown.

The variance in ILD is quite noticeable across different positions in the room, as shown in Figure 4.39. However, ILDs across all five trials (each trial with a different MLS of order 19) at each position were very consistent within each 1/3-octave band. Figure 4.40 shows the average ILDs measured at each position in each band. Averages and standard deviations were calculated across all five trials in each band for a given position. Error bars extend one standard deviation in each direction, but in almost all cases are smaller than the size of the data points and are thus invisible. There is very little variance across trials at each position, but there is relatively little agreement across positions.

Despite the agreement in ILD across trials at a single position, the variation in ILDs across different positions in the room makes the ILD a poor binaural cue for localization in this condition. Ihlefeld and Shinn-Cunningham note that the mean ILD, calculated over several short-time measurements at a fixed location in a reverberant environment, is unreliable for determining sound source laterality, though some information on source location can be gathered by examining how the short-term ILDs vary over time [57]. Even if a listener were trained to associate certain ILD patterns with specific sound source locations while standing at one position in the room, those patterns of ILD would change significantly when the listener moved to another position in the room, invalidating the listener's previous associations.
This is not surprising, since the sound energy is expected to be diffuse (independent of sound source location) in a reverberant space once the source is some minimal distance from the listener (in the room used in the current study, this distance was found to be about 3 m). However, it seems that significant ITD cues are still available (Figure 4.37) if period errors can be ignored. At very high frequencies, as the room becomes more anechoic, ILD cues again become useful to those who are able to hear them [50].

4.4 Experiment 11: Measurements of Binaural Parameters and Room Characteristics in a Normal Room

Thus far, binaural parameters have been measured in both anechoic conditions (Experiment 9) and highly reverberant conditions (Experiment 10). It is then left to measure binaural parameters and room properties in a normal room, one with properties more like what is typically experienced by most people on a day-to-day basis. It is expected that the effects of reverberation as seen in the highly reverberant environment will be seen to a far lesser extent in such a room. Similarly, a departure from the ideal conditions of the anechoic environment is expected to lead to binaural coherences less than those seen in that environment, though greater than those found in the highly reverberant environment.

A simple theory of reverberation in rooms is that a listener hears a sound composed of a "direct" and a "reverberant" component. Direct sound comes to a listener's ears from the sound source via a straight-line path between the two and diminishes in intensity as the inverse square of the distance between the source and listener. This is a good description of listening in an anechoic environment, where the only sound that reaches the listener is that which comes directly from the sound source. The reverberant component of the sound heard by a listener in a room comes from the sound reflected off the various reflecting surfaces (walls, floor, etc.).
Since there are many paths for reverberant sound to take when traveling from the source to the listener, the timing and level of reverberant sounds will be chaotic, and will be different in each ear. The reverberant component of the sound interferes with the direct sound and reduces the binaural coherence γ of the overall sound. This is easily seen in the coherences measured in the reverberant environment (Figure 4.34), where the coherence is quite low for most frequencies.

A simple model of binaural coherence in rooms assumes that the direct sound contributes to coherence, while the reverberant sound, which is assumed to have random direction of incidence and random phase over time, subtracts from coherence. The "total" sound is simply the sum of the direct and the reverberant sound. Let x represent a signal, and let subscripts D and R represent the direct and reverberant components respectively. When necessary, subscripts r and l will be added to indicate the right and left ear respectively. The cross-correlation function between signals arriving at the left and right ear may then be written as:

\[ c(\tau) = \frac{1}{T}\int_0^T \left[x_{l,D}(t) + x_{l,R}(t)\right]\left[x_{r,D}(t+\tau) + x_{r,R}(t+\tau)\right]\,dt \]

Figure 4.46. Average ITDs (open circles) plus and minus one standard deviation, and sizes of the standard deviations (closed symbols), measured in room 103. ITDs are measured in an 800-Hz band with a bandwidth as indicated on the abscissa as a fraction of an octave. ITDs approach the expected value of 0 ms for sounds incident at 0°, and the variances across measurements decrease as the bandwidth is increased.

... such errors are usually isolated to a certain frequency band. Assuming the auditory system makes many measurements of ITD in 1/3-octave bands, period errors could easily be detected and corrected for in a room like 103.
In cases where a period error appears sporadically within a band, the errant measurement of ITD is easily ruled out by comparing it to other ITD estimates within the same critical band (majority rules). In the case of a consistent period error within some band, the period error would be detectable as an ITD that is much different from the ITDs measured in surrounding critical bands. Such discrimination would become increasingly difficult at high frequencies, where the absolute difference between the true ITD and a period error becomes small. This method of discrimination depends on period errors occurring inconsistently within a critical band, or at least not occurring in many surrounding bands. Such behavior was observed in room 103 and could be expected in similar auditory environments. In a highly reverberant environment, like that measured in Experiment 10, period errors are so pervasive that this model for correction of period errors might fail or become far less reliable. This would certainly lead to difficulty or confusion in sound localization, as is expected in a highly reverberant environment.

4.4.5 ILD Results

The ILDs measured for room 103 in 1/3-octave bands are shown in Figure 4.47. Methods of calculation were identical to those used to calculate ILDs in the reverberant room (Experiment 10). Average ILD values were calculated across all 30 trials (five at each position), and error bars extend one standard deviation in each direction.

The variance in ILD, like that seen in the reverberant room (Figure 4.39), is almost entirely across different positions in the room, and is very small across trials measured at any particular position. The sizes of the variances are not significantly different in room 103 from what was found in the reverberant room (two-sided two-sample t-test, t = 0.68, df = 40, p = 0.499).

Figure 4.47. ILDs measured between two ears of a KEMAR in room 103. The presentation of data is the same as that used in Figure 4.39. Data points indicate average measured values, with error bars spanning one standard deviation in each direction. This represents the average of the points shown in Figure 4.48. The trend in ILDs is similar to that found in the reverberant environment.

Figure 4.48. ILDs measured in room 103 at six different positions. This figure is akin to Figure 4.40 for ILDs measured in the reverberant room. The data points for each position are labeled with different symbols. Error bars smaller than the size of the data points are not shown.

The trend in ILDs across frequency is similar for both room 103 and the reverberant room. Certain features, such as a dip in ILD at 8 and 10 kHz, a local peak at 3.15 kHz, and slightly positive ILDs in the lowest frequency bands, are common to both the measurements made in room 103 and those made in the reverberant room. In fact, the two sets of measurements have a modest Pearson product-moment correlation coefficient of r = 0.70. Also, as for the reverberant room, the pattern of ILDs was quite different across positions in the room, as shown in Figure 4.48.
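For reference, the Pearson product-moment correlation coefficient quoted above can be computed as in the following sketch. The ILD values used here are hypothetical placeholders chosen only to exercise the function. They are not the measured data, and the printed value is not expected to reproduce r = 0.70.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov_xy / math.sqrt(var_x * var_y)


# Hypothetical band-averaged ILDs (dB) for two rooms, one value per band.
# These numbers are illustrative only and are not taken from the experiments.
ild_room_a_db = [0.5, 0.4, 0.6, 0.1, -0.2, 0.8, 0.3, -0.9]
ild_room_b_db = [0.3, 0.5, 0.2, 0.0, 0.1, 0.6, -0.1, -0.7]

r = pearson_r(ild_room_a_db, ild_room_b_db)
print(f"r = {r:.2f}")
```

In the analysis described in the text, the two input sequences would be the band-averaged ILDs from room 103 and from the reverberant room, paired by 1/3-octave band.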
The correlation between ILDs measured in two very different environments (room 103 and the reverberation room) may indicate an effect that contributes to the overall pattern of ILDs only in the presence of significant reverberant energy. For instance, the dip in ILD at 8 kHz, which appeared at all positions in both rooms, seems to suggest a systematic internal effect, but such a dip does not appear in the ILD at 8 kHz measured in the anechoic environment (Figure 4.25). The same can be said for certain other features of the pattern of ILDs, such as the peak in ILD at the 3.15-kHz band. These features are absent in the ILDs measured in anechoic conditions, while other broad features, such as the tendency for ILDs to be slightly positive at low frequencies, are evident in both reverberant and anechoic conditions.

4.5 Summary and Conclusions

In the measurement of several binaural properties in KEMAR as a function of the incident angle of the sound, several deviations from theoretical expectations were noticed. In Experiment 9, ITD results agreed rather well with the theoretical framework set forth by Rschevkin [78] and Kuhn [62] (Equations 4.2, 4.4, and 4.5). The general shape of the curves as a function of incident angle (sinusoidal at low frequencies, becoming increasingly triangular with increasing frequency) obeys these equations, and attempts to fit actual ITD data to these equations yield head radii that decrease as a function of frequency from 8.74 cm at 2.5 kHz to 7.8 cm at 20 kHz.

Results of coherence measurements as a function of incident angle in the anechoic room were surprising. The measured coherence was found to drop for incident angles at and around ±45°. The extent of this effect was found to be greatest at low frequencies, and decreased until it was no longer noticeable at and above frequencies of about 3.15 kHz. This effect was found to be due to the pinna of the ear nearest the sound source.
The pinna introduces both jitter and linear dispersion in the ITD spectrum, though it is predominantly the former which leads to incoherence. The interaural coherence due to a noise in the ITD spectrum with mean zero and variance σ² is given by Equation 4.26 to be:

\[ \gamma = \exp\left[-\tfrac{1}{2}\,\omega_0^2\,\sigma^2\right] \]

This incoherence is also observed in measurements taken in the ear canals of human listeners, though the degree of incoherence is significantly less than in KEMAR. Human listeners seem unperturbed by their own inherent coherence properties. In lateralization experiments, listeners showed no difficulty in detecting the direction of shift (to the left or to the right) of sounds that mimicked the coherence characteristics measured in their own ears. When given a similar task using noise recorded in KEMAR, listeners found the task quite difficult, but were still able to make some use of ITD cues to detect the direction of the shift. This suggests a familiarity with one's own coherence characteristics. Since the extent of the incoherence seems to depend on the incident angle of the sound source, it may also provide a cue for localization.

In rooms, measurements made at several locations with a "head-on" arrangement (0°) show good consistency across frequency bands and positions with respect to ITD estimation. The ITD is estimated by locating the lag of a peak in the cross-correlation function of the signals in the left and right ears. The cross-correlation function is limited in calculation to some maximum lag, τ_m. Selection of this maximum lag has consequences for the resulting estimation of the ITD because of the existence of period errors. Period errors can only be observed in frequency bands where 1/f_c < τ_m, where f_c is the center frequency of the band. In those bands, the ITD estimation can be corrected when period errors occur. In bands for which 1/f_c > τ_m, period errors cannot be detected.
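The relation between ITD jitter and coherence used in this chapter, γ = exp(−½ω₀²σ²), can be checked numerically against the 1000-Hz example reported for the reverberant room (σ = 251 µs, giving ⟨γ⟩ = 0.2883). The following is a minimal sketch. It assumes that ω₀ is the angular center frequency of the band, 2πf_c, and the function name is illustrative.

```python
import math


def coherence_from_itd_jitter(center_freq_hz: float, itd_std_s: float) -> float:
    """Expected coherence for zero-mean ITD noise with standard deviation itd_std_s.

    Implements gamma = exp(-0.5 * (omega_0 * sigma)**2), with omega_0 taken
    as 2 * pi * center frequency (an assumption of this sketch).
    """
    omega_0 = 2.0 * math.pi * center_freq_hz
    return math.exp(-0.5 * (omega_0 * itd_std_s) ** 2)


ITD_STD_S = 251e-6  # standard deviation of ITD across the 1000-Hz band

gamma_1000_hz = coherence_from_itd_jitter(1000.0, ITD_STD_S)
gamma_250_hz = coherence_from_itd_jitter(250.0, ITD_STD_S)

print(f"1000-Hz band: gamma = {gamma_1000_hz:.4f}")
print(f" 250-Hz band: gamma = {gamma_250_hz:.4f}")
```

The 1000-Hz result agrees with the reported value to four decimal places. The same jitter applied to a 250-Hz band leaves the coherence above 0.9, which illustrates why a given amount of ITD variance degrades coherence far less at low frequencies.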
Undetected period errors in such bands lead to a larger degree of variance in the resulting average ITD measurements.

The question of how coherences come to be so low in certain frequency bands in rooms seems to be answered by the variance in the ITD spectra. Thus, incoherence found for certain incident angles of sound in anechoic conditions, which is due to the pinnae, is similar in nature to incoherence measured in rooms for sound sources straight ahead of the listener. The expected coherence in both situations obeys Equation 4.31. Linear dispersion in the ITD has a negligible effect on the coherence. The amount of variance in the ITD spectrum required to cause a noticeable drop in coherence is greater at lower frequencies (since the coherence is a function of ω₀σ), and this may be in part the cause of the very high coherences observed in the lowest frequency bands even in reverberant conditions.

APPENDIX A

Test for Interaural Differences

The methods used in the first chapter of this dissertation had the goal of eliminating interaural differences by confining the sources to the front-back dimension. However, interaural differences cannot be completely eliminated, no matter how accurately the experimental system is aligned. Nevertheless, it is thought that the small interaural differences that exist in a geometry such as is used in the experiments on informational masking presented here have no perceptual importance [70].

To test for the perceptual effect of interaural differences, which might arise because the listener is inadvertently misaligned or because of individual anatomical asymmetry, a test experiment was run in which misalignments were deliberately introduced such that the ears were not equidistant from each loudspeaker. The test effectively used a dramatic misalignment, equivalent to a 5° rotation of the listener.
Accordingly, one ear was effectively 65 µs closer to the front loudspeaker and 65 µs farther away from the back loudspeaker, for a total difference in arrival time of 130 µs. The same difference, but with opposite sign, occurred in the other ear.

Seven listeners, including B and N from the previous experiments, were presented with CRM stimuli of the same type used in Experiment 1 (target plus two speech distracters) through Sennheiser HD 414 headphones. The task was identical to that of previous experiments. The distracters were passed through delay-and-add filters before they were combined with the target speech. Two experimental scenarios were tested: one in which the delay between the maskers was 870 µs in the left ear and 1130 µs in the right (the "different delays" scenario), and a second in which the delay between the maskers was 1000 µs in both ears (the "same delays" scenario).

A reference delay of 1000 µs was chosen because that delay is a typical value beyond the range at which release from energetic masking occurs (Experiment 2). For 1000 µs there is substantial release (but not the greatest release) from informational masking. These delay scenarios were tested at SNRs of −4 dB, 0 dB, and +4 dB, for a total of 2 × 3 = 6 conditions.

Differences in performance between the scenarios were examined within and across listeners for each SNR. The most relevant SNR is 0 dB, at which Experiment 1 was performed. At 0 dB, the average listener performed marginally better with different delays than with same delays. The mean improvement in percent correct, plus and minus one standard deviation, was 7 ± 16%. This was not a significant improvement (one-sided Wilcoxon signed-rank test, N = 5, W+ = 3.0, p = 0.140). There were also differences in performance across listeners: three listeners performed slightly better with different delays, two performed slightly better with same delays, and two showed no difference in performance.
For the lower SNR (−4 dB), there was a greater improvement in performance under the different-delays condition, but the difference in performance did not reach the 0.05 level of significance (N = 5, W+ = 1.5, p = 0.069). For the higher SNR (+4 dB), the differences in performance between the same-delays and different-delays conditions were entirely negligible (N = 6, W+ = 11.0, p = 0.583). Note here that a worst-case scenario (a 5° rotation of the listener) has been assumed. In reality, the error in alignment was almost certainly smaller. Thus, ruling out differences in the delay-and-add filtering between the two ears as a possible source of unmasking seems justified.

APPENDIX B

Separated-Source Presentation with Speech Distracters

The positive delay conditions of Experiment 1, particularly the conditions for which τ > 1 ms, can be expected to elicit a precedence effect shift in the perceived location of the distracters. The precedence effect should shift the perceived location of the distracters from front to back. The masking release caused by this perceptual shift is hypothesized to be comparable to the release that would be obtained if there were an actual physical shift in the location of the distracters. Several experiments have measured the release from both energetic masking and informational masking when the distracters are moved from the front, where they were collocated with the speech target, to the back [40, 56, 75, 99].

To test this hypothesis using the geometry of the previous experiments and the CRM stimuli, the experiment of Freyman et al. [40] was repeated. Three conditions were tested: (1) the Front-Only baseline, as in Experiment 1, with an SNR of 0 dB; (2) the Front-Only baseline, as in Experiment 1, with an SNR of +4 dB; (3) a separated-source test with the target presented from the front loudspeaker and the speech distracters presented from the back loudspeaker at an SNR of 0 dB.
The first two conditions were the FO and FO+4 dB conditions. The third condition was new.

Four listeners were tested, including listeners B and N from Experiment 1 and two inexperienced listeners, A and E. Listeners sat quietly and faced the front loudspeaker without moving their heads throughout each run. Listeners were positioned such that their ears were halfway between the front and back loudspeakers. Each run consisted of five practice trials followed by 30 trials for which data were collected and scored as in Experiment 1. Each listener went through five runs of each condition.

The results of the experiment are presented in Table B.1. Performance in the separated-source condition was slightly worse than that in the FO+4 dB condition, indicating a release from masking somewhat less than 4 dB as a result of spatially separating the target and distracters. In detail, the average percent correct performance across listeners in the separated condition was 68 ± 7.4%, an increase over the FO baseline performance of 30 ± 7.4%. This increase in performance in the F-B condition corresponds to an average release of 3.5 dB. This is comparable in magnitude to the release seen in Experiment 1 between the FO baseline performance and the best performance at any other delay for the average listener (this occurred at τ = 2 ms, where the average release was 3.5 dB). The release from masking seen in this separated-source experiment is larger than that seen at almost all the delays in Experiment 1. However, the comparison is not quite fair because the observed release in Experiment 1 is reduced by the fact that there is twice as much masker power in Experiment 1.

In the end, the hypothesis that the effect of a perceptual shift should be comparable to the effect of an actual physical shift in the location of the distracters is supported by the comparison of the separated-source experiment and Experiment 1.
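The 3.5 dB release quoted above can be reproduced from the per-listener percentages in Table B.1. The sketch below assumes that percent correct rises linearly with SNR between the FO (0 dB) and FO+4 dB baselines, so that each listener's F-B score maps onto an equivalent SNR gain. The text does not state how the release was computed, so both the linear interpolation and the helper name `release_db` are assumptions.

```python
# Percent correct per listener from Table B.1: (FO, FO+4 dB, F-B).
scores = {
    "A": (26.0, 71.0, 75.0),
    "B": (34.0, 82.0, 66.0),
    "E": (22.0, 75.0, 58.0),
    "N": (38.0, 73.0, 75.0),
}

def release_db(fo, fo_plus4, fb, step_db=4.0):
    """Equivalent SNR gain (dB) of the F-B score, assuming percent correct is
    linear in SNR between the FO and FO+4 dB baselines (an assumption)."""
    return step_db * (fb - fo) / (fo_plus4 - fo)

# Release for each listener, then the average across listeners.
releases = {name: release_db(*vals) for name, vals in scores.items()}
mean_release = sum(releases.values()) / len(releases)
print(f"{mean_release:.1f} dB")  # prints 3.5 dB
```

Under this assumption, listeners A and N come out slightly above 4 dB because their F-B scores exceed their FO+4 dB scores, so for them the calculation is an extrapolation rather than an interpolation.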
However, the symmetry of the results in Experiment 1, whereby performance was similar for both positive and negative delays, indicates that the release from masking caused by the perceived relocation or delocation of the distracters involves more than just the localization precedence effect.

Listener   (a) FO        (b) FO+4 dB   (c) F-B
A          26 ± 8.3%     71 ± 5.0%     75 ± 8.4%
B          34 ± 8.6%     82 ± 4.5%     66 ± 7.8%
E          22 ± 9.0%     75 ± 3.8%     58 ± 6.9%
N          38 ± 9.0%     73 ± 4.7%     75 ± 8.3%
Avg        30 ± 7.4%     75 ± 4.8%     68 ± 7.4%

Table B.1. Separated-source Experiment: Percentage of correct responses (plus or minus one standard deviation) for each listener by condition. (a) Front-Only baseline condition. (b) Front-Only condition with +4 dB target level relative to a single distracter. (c) Target in front and distracters in back. Averages and standard deviations for each listener are calculated across five runs. Averages and standard deviations across listeners are shown in the bottom row.

APPENDIX C

Wiener filtering

One common method of reducing noise in a signal is Wiener filtering. As opposed to traditional noise filters, which attempt to remove specific parts of the spectrum of a noisy signal attributed to the noise alone, the Wiener filter takes a statistical approach to filtering, and requires some knowledge of the spectral properties of the signal and additive noise. Both processes are assumed to be linear and stationary. Although a physically realizable Wiener filter must be causal, in the following experiments Wiener filtering is performed only a posteriori, and so a non-causal filter is used. The Wiener filter is designed using a minimum mean square error criterion. Excellent introductions to the Wiener filter can be found in several texts [85, 89].

For a signal, $x(t)$, corrupted by additive stationary white noise, $n(t)$, let $y(t)$ be the observed signal such that $y(t) = x(t) + n(t)$. The output of a filter $w(t)$ on that signal and noise combination is:

$$\hat{x}(t) = w(t) \otimes [x(t) + n(t)]$$

The symbol $\otimes$ represents the convolution operation. The estimator $\hat{x}(t)$ will be related to $y(t)$ by a linear combination of a range of known values of $y(t)$ between times $t_0$ and $t_f$ via the filter $w$ by:

$$\hat{x}(t) = \int_{t_0}^{t_f} w(t-\varepsilon)\, y(\varepsilon)\, d\varepsilon$$

The orthogonality principle states that for linear mean-square error estimation, the error in the estimate, $\tilde{x}$, must be orthogonal to the observations, $y$. Noting that $\tilde{x}(t) = x(t) - \hat{x}(t)$, we then have as an expectation in time:

$$E\left\{\left[x(t) - \int_{t_0}^{t_f} w(t-\varepsilon)\, y(\varepsilon)\, d\varepsilon\right] y(\tau)\right\} = 0, \qquad t_0 \le \tau \le t_f$$

Expanding the argument of the expectation operation gives:

$$E\left\{x(t)\, y(\tau) - \int_{t_0}^{t_f} w(t-\varepsilon)\, y(\varepsilon)\, y(\tau)\, d\varepsilon\right\} = 0, \qquad t_0 \le \tau \le t_f$$

This can be written in terms of cross correlation functions. Let $R_{pq}(\alpha - \beta) = E[p(\alpha)\, q(\beta)]$ be the cross correlation between two jointly wide-sense stationary functions, $p$ and $q$. Then we can write the above equation as:

$$R_{xy}(t-\tau) - \int_{t_0}^{t_f} w(t-\varepsilon)\, R_{yy}(\varepsilon-\tau)\, d\varepsilon = 0, \qquad t_0 \le \tau \le t_f \tag{C.1}$$

This is often referred to as the Wiener-Hopf integral equation [89].

Let $W(s)$ be the Laplace transform of $w(t)$, and $\Phi_{xy}(s)$ be the Laplace transform of $R_{xy}(t)$. If the period of observation is very long, then the observations $y$ and the desired estimate $\hat{x}$ are jointly stationary as long as the filter is time-invariant. The solution to the Wiener-Hopf equation for a noncausal filter in Laplace space is then:

$$W(s) = \frac{\Phi_{xy}(s)}{\Phi_{yy}(s)} \tag{C.2}$$

In the case where the noise is uncorrelated with the input signal $x$, $\Phi_{xy}$ becomes simply $\Phi_{xx}$ and $\Phi_{yy}$ becomes $\Phi_{xx} + \Phi_{nn}$. Thus, Equation C.2 can be reduced to one involving only autocorrelations. If the noise is assumed to be white, then its autocorrelation function is simply a delta function (1 in Laplace space). Now, something must be known about the autocorrelation of the input signal, $x$. This is a far less stringent requirement than having to know the entire signal, however, and an approximate autocorrelation will still afford the Wiener filter quite a lot of filtering ability.
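For uncorrelated noise, the reduction of Equation C.2 described above gives a filter gain of Φxx / (Φxx + Φnn) at each frequency. The following is a minimal sketch of that gain evaluated per frequency bin. The spectra are made-up illustrative numbers, and `wiener_gain` is a hypothetical helper rather than the code used for the measurements in this dissertation.

```python
def wiener_gain(phi_xx, phi_nn):
    """Non-causal Wiener gain per frequency bin for noise uncorrelated with
    the signal: phi_xx / (phi_xx + phi_nn). Inputs are power spectra sampled
    at the same frequencies."""
    gains = []
    for sxx, snn in zip(phi_xx, phi_nn):
        total = sxx + snn
        # Guard against empty bins where both spectra are zero.
        gains.append(sxx / total if total > 0.0 else 0.0)
    return gains

# Hypothetical signal power spectrum (arbitrary units) and a flat noise
# spectrum, as expected for white noise.
signal_power = [9.0, 3.0, 1.0, 0.0]
noise_power = [1.0] * len(signal_power)
print(wiener_gain(signal_power, noise_power))  # prints [0.9, 0.75, 0.5, 0.0]
```

The gain approaches 1 where the signal dominates and falls to 0 where only noise is present, which is why an approximate signal spectrum is often enough to obtain useful noise reduction.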
The mean square error of the estimation, after utilizing the orthogonality principle, is:

$$E[\tilde{x}^2] = E\{[x(t) - \hat{x}(t)]\, x(t)\}$$

Again, in terms of cross correlation functions, this error becomes:

$$E[\tilde{x}^2] = R_{xx}(0) - \int_{t_0}^{t_f} w(t-\varepsilon)\, R_{xy}(t-\varepsilon)\, d\varepsilon \tag{C.3}$$

In these experiments, the Wiener filter was used to reduce the noise in the measured IR. Equation C.2 shows that the Wiener filter requires a priori knowledge of the autocorrelations of both the noise and the signal. The corrupting noise is usually assumed to be white, and its autocorrelation at zero lag is equal to its variance. The noise can be effectively isolated by examining the tail of the measured IR, where the IR is expected to be zero. Any deviation from zero in the IR at very late times, such as the last quarter of the IR, is then assumed to be due entirely to the noise. The Wiener filter was given the power of the signal in the tail of the IR measured by the RTN and then operated on the entire IR to arrive at a noise-reduced IR, $h_w$.

APPENDIX D

Guide to Acronyms

The following acronyms and symbols are commonly used in this dissertation, and so a friendly guide is provided for reference.

γ: interaural coherence
γ_e: interaural envelope coherence
ADD: added, delayed distracter
ASW: auditory source width
CC: cross-correlation
df: degrees of freedom
FFT: fast Fourier transform
FIR: finite impulse response
FO: front-only
HP: horizontal plane
HRTF: head-related transfer function
IIDC: integrated impulse decay curve
ILD: interaural level difference
IPD: interaural phase difference
IR: impulse response
ISO: International Organization for Standardization
ITD: interaural time difference
JND: just-noticeable difference
JVC: Victor Company of Japan
KEMAR: Knowles Electronics Manikin for Acoustic Research
LFSR: linear feedback shift register
LTI: linear, time-invariant
MLS: maximum length sequence
MSP: median sagittal plane
PDF: probability density function
RT60: 60-dB reverberation time
RTN: random telegraph noise
SNR: signal-to-noise ratio
SPL: sound pressure level
TF: transfer function

BIBLIOGRAPHY

[1] M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover, New York, ninth Dover printing, tenth GPO printing edition, 1964.
[2] V. R. Algazi, C. Avendano, and R. O. Duda. Estimation of a spherical-head model from anthropometry. J. Audio Eng. Soc., 49, 2001.
[3] V. R. Algazi, R. O. Duda, D. M. Thompson, and C. Avendano. The CIPIC HRTF database. IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, Oct. 2001.
[4] Y. Ando. Architectural Acoustics - Blending Sound Sources, Sound Fields, and Listeners. Springer-Verlag, New York, 1998.
[5] C. Avendano, V. R. Algazi, and R. O. Duda. A head-and-torso model for low-frequency binaural elevation effects. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 1999.
[6] U. Balakrishnan and R. L. Freyman. Speech detection in spatial and nonspatial speech maskers. J. Acoust. Soc. Am., 123:2680-2691, 2008.
[7] D. W. Batteau. The role of the pinna in human localization. Proc. Royal Soc. London. Series B, Biological Sciences, 168:158-180, 1967.
[8] S. Bedard, B. Champagne, and A. Stephenne. Effects of room reverberation on time-delay estimation performance. IEEE International Conference on Acoustics, Speech, and Signal Processing, Apr. 1994.
[9] L. L. Beranek. Acoustics. American Institute of Physics, New York, 1986.
[10] L. L. Beranek and H. P. Sleeper Jr. The design and construction of anechoic sound chambers. J. Acoust. Soc. Am., 18:140-150, 1946.
[11] L. Bernstein. Personal communication, Sept. 2007.
[12] L. R. Bernstein and C. Trahiotis. Why do transposed stimuli enhance binaural processing?: Interaural envelope correlation vs envelope normalized fourth moment.
[13] L. R. Bernstein and C. Trahiotis. The effects of randomizing values of interaural disparities on binaural detection and on discrimination of interaural correlation. J. Acoust. Soc. Am., 102:1113-1120, 1997.
[14] L. R. Bernstein, S. van de Par, and C. Trahiotis. The normalized interaural correlation: Accounting for NoSπ thresholds obtained with Gaussian and "low-noise" masking noise. J. Acoust. Soc. Am., 106:870-976, 1999.
[15] J. Blauert. Sound localization in the median plane. Acustica, 22:205-213, 1969/70.
[16] J. Blauert. Spatial Hearing. The MIT Press, Cambridge, Massachusetts, 1983.
[17] J. Blauert and W. Lindemann. Spatial mapping of intracranial auditory events for various degrees of interaural coherence. J. Acoust. Soc. Am., 79:806-813, 1986.
[18] H. C. Blodgett, W. A. Wilbanks, and L. A. Jeffress. Effect of large interaural time differences upon the judgment of sidedness. J. Acoust. Soc. Am., 28:639-643, 1956.
[19] R. S. Bolia, W. T. Nelson, M. A. Ericson, and B. D. Simpson. A speech corpus for multitalker communications research. J. Acoust. Soc. Am., 107:1065-1066, 2000.
[20] J. Borish and J. B. Angell. An efficient algorithm for measuring the impulse response using pseudorandom noise. J. Audio Eng. Soc., 31:478-487, 1983.
[21] A. W. Bronkhorst. The cocktail party phenomenon: A review of research on speech intelligibility in multiple-talker conditions. Acta Acust., 86:117-128, 2000.
[22] D. Brungart, B. Simpson, T. Darwin, C. Arbogast, and G. Kidd, Jr. Across-ear interference from parametrically degraded synthetic speech signals in a dichotic cocktail-party listening task. J. Acoust. Soc. Am., 117:292-304, 2005.
[23] D. Brungart, B. Simpson, and R. L. Freyman. Precedence-based speech segregation in a virtual auditory environment. J. Acoust. Soc. Am., 118:3241-3251, 2005.
[24] M. D. Burkhard and R. M. Sachs. Anthropometric manikin for acoustic research. J. Acoust. Soc. Am., 58:214-222, 1975.
[25] C. I. Cheng and G. H. Wakefield. Spatial frequency response surfaces (SFRS's): An alternative visualization and interpolation technique for head-related transfer functions (HRTF's). IEEE International Conference on Acoustics, Speech, and Signal Processing, 1999.
[26] E. C. Cherry. Some experiments on the recognition of speech with one and two ears. J. Acoust. Soc. Am., 25:975-979, 1953.
[27] I. Chun, B. Rafaely, and P. Joseph. Experimental investigation of spatial correlation in broadband reverberant sound fields. J. Acoust. Soc. Am., 113:1995-1998, 2003.
[28] M. Cohn and A. Lempel. On fast M-sequence transforms. IEEE Trans. Info. Theory, 23:135-137, 1977.
[29] Z. A. Constan. All Things Coherence. PhD thesis, Michigan State University, 2002.
[30] R. K. Cook, R. V. Waterhouse, R. D. Berendt, S. Edelman, and M. C. Thompson, Jr. Measurement of correlation coefficients in reverberant sound fields. J. Acoust. Soc. Am., 27:1072-1077, 1955.
[31] J. F. Culling, H. S. Colburn, and M. Spurchise. Interaural correlation sensitivity. J. Acoust. Soc. Am., 110:1020-1029, 2001.
[32] R. O. Duda and W. L. Martens. Range dependence on the response of a spherical head model. J. Acoust. Soc. Am., 104:3048-3058, 1998.
[33] C. Dunn and M. Hawksford. Distortion immunity of MLS-derived impulse response measurements. J. Audio Eng. Soc., 41:314-335, 1993.
[34] N. I. Durlach, K. J. Gabriel, H. S. Colburn, and C. Trahiotis. Interaural correlation discrimination: II. Relation to binaural unmasking. J. Acoust. Soc. Am., 79:1548-1557, 1986.
[35] N. I. Durlach, C. R. Mason, G. Kidd, Jr., T. L. Arbogast, H. S. Colburn, and B. G. Shinn-Cunningham. Note on informational masking (L). J. Acoust. Soc. Am., 113:2984-2987, 2003.
[36] A. Farina. Simultaneous measurement of impulse response and distortion with a swept-sine technique. 108th Convention of the Audio Engineering Society, 2000.
[37] W. E. Feddersen, T. T. Sandel, D. C. Teas, and L. A. Jeffress. Localization of high-frequency tones. J. Acoust. Soc. Am., 29:988-991, 1957.
[38] R. L. Freyman, U. Balakrishnan, and K. S. Helfer. Spatial release from informational masking in speech recognition. J. Acoust. Soc. Am., 109:2112-2122, 2001.
[39] R. L. Freyman, U. Balakrishnan, and K. S. Helfer. Effect of number of masking talkers and auditory priming on informational masking in speech recognition. J. Acoust. Soc. Am., 115:2246-2256, 2004.
[40] R. L. Freyman, K. S. Helfer, and U. Balakrishnan. Spatial and spectral factors in release from informational masking in speech recognition. Acta Acustica, 91:537-545, 2005.
[41] R. L. Freyman, K. S. Helfer, D. D. McCall, and R. K. Clifton. The role of perceived spatial separation in the unmasking of speech. J. Acoust. Soc. Am., 106:3578-3588, 1999.
[42] K. J. Gabriel and H. S. Colburn. Interaural correlation discrimination: I. Bandwidth and level dependence. J. Acoust. Soc. Am., 69:1394-1401, 1981.
[43] B. R. Glasberg and B. C. J. Moore. Derivation of auditory filter shapes from notched-noise data. Hear. Res., 47:102-138, 1990.
[44] S. W. Golomb. Shift register sequences. Holden-Day, San Francisco, 1967.
[45] D. W. Grantham and F. L. Wightman. Detectability of a pulsed tone in the presence of a masker with time-varying interaural correlation. J. Acoust. Soc. Am., 65:1509-1517, 1979.
[46] H. Haas. On the influence of a single echo on the intelligibility of speech. Acustica, 1:49-58, 1951.
[47] W. M. Hartmann. Signals, Sound, and Sensation. Springer-Verlag, New York, 1998.
[48] W. M. Hartmann. How we localize sound. Physics Today, pages 24-29, Nov. 1999.
[49] W. M. Hartmann. The cross-correlation function, Feb. 2007.
[50] W. M. Hartmann and B. Rakerd. Localization of sound in reverberant spaces. J. Acoust. Soc. Am., 105:1149, 1999.
[51] W. M. Hartmann, B. Rakerd, and A. Koller. Binaural coherence in rooms. Acta Acust., 91:451-462, 2005.
[52] J. Hebrank and D. Wright. Spectral cues used in the localization of sound sources on the median plane. J. Acoust. Soc. Am., 56:1829-1834, 1974.
[53] E. Hecht. Optics. Addison Wesley, Reading, Massachusetts, 4th edition, 2001.
[54] G. B. Henning. Detectability of interaural delay in high-frequency complex waveforms. J. Acoust. Soc. Am., 55:84-90, 1974.
[55] T. Hidaka, L. L. Beranek, and T. Okano. Interaural cross-correlation, lateral fraction, and low- and high-frequency sound levels as measures of acoustical quality in concert halls. J. Acoust. Soc. Am., 98:988-1007, 1995.
[56] I. J. Hirsh. Relation between localization and intelligibility. J. Acoust. Soc. Am., 22:196-200, 1950.
[57] A. Ihlefeld and B. G. Shinn-Cunningham. Effect of source location and listener location on ILD cues in a reverberant room. J. Acoust. Soc. Am., 115:2598, 2004.
[58] F. Jacobsen and T. Roisin. The coherence of reverberant sound fields. J. Acoust. Soc. Am., 108:204-210, 2000.
[59] L. A. Jeffress, H. C. Blodgett, and B. H. Deatherage. Effect of interaural correlation on the precision of centering a noise. J. Acoust. Soc. Am., 34:1122-1123, 1962.
[60] C. H. Keller and T. T. Takahashi. Binaural cross-correlation predicts the responses of neurons in the owl's auditory space map under conditions simulating summing localization. J. Neurosci., 16:4300-4309, 1996.
[61] G. Kidd, Jr., C. R. Mason, P. S. Deliwala, W. S. Woods, and H. S. Colburn. Reducing informational masking by sound segregation. J. Acoust. Soc. Am., 95:3475-3480, 1994.
[62] G. F. Kuhn. Model for the interaural time differences in the azimuthal plane. J. Acoust. Soc. Am., 62:157-167, 1977.
[63] M. Kuster. Spatial correlation and coherence in reverberant acoustic fields: Extension to microphones with arbitrary first-order directivity. J. Acoust. Soc. Am., 123:154-162, 2008.
[64] M. R. Leek, M. E. Brown, and M. F. Dorman. Informational masking and auditory attention. Percept. Psychophys., 50:205-214, 1991.
[65] I. M. Lindevald and A. H. Benade. Two-ear correlation in the statistical sound fields of rooms. J. Acoust. Soc. Am., 80:661-664, 1986.
[66] R. Y. Litovsky, H. S. Colburn, W. A. Yost, and S. J. Guzman. The precedence effect. J. Acoust. Soc. Am., 106:1633-1654, 1999.
[67] R. Y. Litovsky, B. Rakerd, T. C. T. Yin, and W. M. Hartmann. Psychophysical and physiological evidence for a precedence effect in the median sagittal plane. J. Neurophysiol., 77:2223-2226, 1997.
[68] R. W. Marsh. Table of irreducible polynomials over GF(2) through degree 19. Office of Technical Services, U.S. Dept. of Commerce, Washington, DC, 1957.
[69] D. McFadden and E. G. Pasanen. Binaural detection at high frequencies with time-delayed waveforms. J. Acoust. Soc. Am., 63:1120-1131, 1978.
[70] J. C. Middlebrooks and D. M. Green. Sound localization by human listeners. Ann. Rev. Psych., 42:135-159, 1991.
[71] B. C. J. Moore and B. R. Glasberg. Suggested formulae for calculating auditory-filter bandwidths and excitation patterns. J. Acoust. Soc. Am., 74:750-753, 1983.
[72] N. Moray. Attention in dichotic listening: Affective cues and the influence of instructions. Quart. J. Exp. Psych., 11:56-60, 1959.
[73] M. Morimoto, H. Fujimori, and Z. Maekawa. Discrimination between auditory source width and envelopment. J. Acoust. Soc. Japan, 46:449-457, 1990.
[74] T. Okano, L. L. Beranek, and T. Hidaka. Relations among interaural cross-correlation coefficient (IACCE), lateral fraction (LFE), and apparent source width (ASW) in concert halls. J. Acoust. Soc. Am., 104:255-265, 1998.
[75] R. Plomp. Binaural and monaural speech intelligibility of connected discourse in reverberation as a function of azimuth of a single competing sound source (speech or noise). Acustica, 34:200-211, 1976.
[76] B. Rakerd, N. L. Aaronson, and W. M. Hartmann. Release from speech-on-speech masking by repeated presentation of the masker. J. Acoust. Soc. Am., 119:1597-1605, 2006.
[77] S. K. Roffler and R. A. Butler. Factors that influence the localization of sound in the vertical plane. J. Acoust. Soc. Am., 48:1255-1259, 1968.
[78] S. N. Rzhevkin. The Theory of Sound. Pergamon Press, Oxford, 1963.
[79] K. Saberi, Y. Takahashi, R. Egnor, H. Farahbod, J. Mazer, and M. Konishi. Detection of large interaural delays and its implication for models of binaural interaction. J. Assoc. Res. Otolaryng., Online:80-88, 2001.
[80] D. V. Sarwate and M. B. Pursley. Crosscorrelation properties of pseudorandom and related sequences. Proc. IEEE, 68:593-619, 1980.
[81] M. R. Schroeder. New method of measuring reverberation time. J. Acoust. Soc. Am., 37:409-412, 1965.
[82] M. R. Schroeder. The "Schroeder frequency" revisited. J. Acoust. Soc. Am., 99:3240-3241, 1996.
[83] M. R. Schroeder. Integrated-impulse method measuring sound decay without using impulses. J. Acoust. Soc. Am., 66:497-500, 1997.
[84] T. M. Shackleton, R. H. Arnott, and A. R. Palmer. Sensitivity to interaural correlation of single neurons in the inferior colliculus of guinea pigs. J. Assoc. Res. Otolaryngology, 6:244-259, 2005.
[85] M. D. Srinath, P. K. Rajasekaran, and R. Viswanathan. Introduction to statistical signal processing with application. Prentice-Hall, Inc., New Jersey, 1996.
[86] C. Trahiotis, L. R. Bernstein, R. M. Stern, and T. N. Buell. Interaural correlation as the basis of a working model of binaural processing: An introduction. In A. N. Popper and R. R. Fay, editors, Sound Source Localization, pages 238-271. Springer, New York, 2005.
[87] S. van de Par and A. Kohlrausch. Analytical expressions for the envelope correlation of narrow-band stimuli used in CMR and BMLD research. J. Acoust. Soc. Am., 103:3605-3620, 1998.
[88] S. van de Par, C. Trahiotis, and L. R. Bernstein. A consideration of the normalization that is typically included in correlation-based models of binaural detection. J. Acoust. Soc. Am., 109:830-833, 2001.
[89] H. L. van Trees. Detection, estimation, and modulation theory. John Wiley & Sons, Inc., New York, 2001. Part I.
[90] M. Vorländer and H. Bietz. Comparison of methods for measuring reverberation time. Acustica, 80:205-215, 1994.
[91] H. Wallach, E. B. Newman, and M. R. Rosenzweig. The precedence effect in sound localization. Am. J. Psychol., 57:315-336, 1949.
[92] C. S. Watson, W. J. Kelly, and H. W. Wroton. Factors in the discrimination of tonal patterns: II. Selective attention and learning under various levels of stimulus uncertainty. J. Acoust. Soc. Am., 60:1176-1185, 1976.
[93] F. L. Wightman and D. J. Kistler. Resolution of front-back ambiguity in spatial hearing by listener and source movement. J. Acoust. Soc. Am., 105:2841-2853, 1999.
[94] R. S. Woodworth and H. Schlosberg. Experimental Psychology. Holt, Rinehart, and Winston, New York, 3rd edition, 1962.
[95] N. Xiang, J. N. Daigle, and M. Kleiner. Simultaneous acoustical channel measurement via maximal-length-related sequences. J. Acoust. Soc. Am., 117:1889-1894, 2005.
[96] N. Xiang and M. R. Schroeder. Reciprocal maximum-length sequence pairs for acoustical dual source measurements. J. Acoust. Soc. Am., 113:2754-2761, 2003.
[97] W. A. Yost. The cocktail party problem: Forty years later. In R. H. Gilkey and T. R. Anderson, editors, Binaural and Spatial Hearing in Real and Virtual Environments, pages 329-347. Erlbaum, Hillsdale, NJ, 1997.
[98] W. A. Yost and E. R. Hafter. Lateralization. In W. A. Yost and G. Gourevitch, editors, Directional Hearing, page 62. Springer, New York, 1987.
[99] P. M. Zurek. Binaural advantages and directional effects in speech intelligibility. In G. A. Studebaker and I. Hochberg, editors, Acoustical Factors Affecting Hearing Aid Performance. Allyn and Bacon, Needham Heights, MA, 2nd edition, 1993.
[100] J. Zwislocki and R. S. Feldman. Just noticeable differences in dichotic phase. J. Acoust. Soc. Am., 28:860-864, 1956.