This is to certify that the dissertation entitled

DETECTION AND LOCALIZATION OF SOUNDS: VIRTUAL TONES AND VIRTUAL REALITY

presented by

PETER XINYA ZHANG

has been accepted towards fulfillment of the requirements for the Ph.D. degree in Physics and Astronomy.

Major Professor's Signature

Date

MSU is an Affirmative Action/Equal Opportunity Institution

DETECTION AND LOCALIZATION OF SOUNDS: VIRTUAL TONES AND VIRTUAL REALITY

By

Peter Xinya Zhang

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Physics and Astronomy

2006

ABSTRACT

DETECTION AND LOCALIZATION OF SOUNDS: VIRTUAL TONES AND VIRTUAL REALITY

By Peter Xinya Zhang

Modern physiologically based binaural models employ internal delay lines in the pathways from the left and right peripheries to central processing nuclei. Various models apply the delay lines differently and give different predictions for the detection of dichotic pitches, wherein listeners hear a virtual tone in a noise background. Two dichotic pitch stimuli (Huggins pitch and binaural coherence edge pitch) with low boundary frequencies were used to test the predictions of two different models. The results from five experiments show that the relative dichotic pitch strengths support the equalization-cancellation model and disfavor the central activity pattern (CAP) model.

The CAP model makes predictions for the lateralization of Huggins pitch based on interaural time differences (ITD). By measuring human lateralization for Huggins pitches with two different types of phase boundaries (linear-phase and stepped-phase), and by comparing with the lateralization of sine tones, it was shown that the lateralization of Huggins pitch stimuli is similar to that of the corresponding sine tones, and that the lateralizations of Huggins pitch stimuli with the two different boundaries were even more similar to one another. The results agreed roughly with the CAP model predictions. Agreement was significantly improved by incorporating individualized scale factors and offsets into the model, and was further improved with a model including compression at large ITDs.
Furthermore, ambiguous stimuli, with an interaural phase difference of 180 degrees, were consistently lateralized on the left or right based on individual asymmetries, which introduces the concept of "earedness."

Interaural phase difference (IPD) and interaural time difference (ITD) are two different forms of temporal cues. As frequency varies, an auditory system based on IPD gives quantitative predictions for lateralization different from those of a system based on ITD. A lateralization experiment with sine tones tested whether the human auditory system is an IPD-meter or an ITD-meter. Listeners estimated the lateral positions of 50 sine tones with IPDs ranging from -150° to +150° and with different frequencies, all in the range where signal fine structure supports lateralization. The estimates indicated that listeners lateralize sine tones on the basis of ITD and not IPD.

In order to distinguish between sound sources in front and in back, listeners use spectral cues caused by diffraction by the pinna, head, neck, and torso. To study this effect, the VRX technique was developed based on transaural technology. The technique was successful in presenting desired spectra into listeners' ears with high accuracy up to 16 kHz. When presented with a real source and the simulated virtual signal, listeners in an anechoic room could not distinguish between them.

Eleven experiments on discrimination between front and back sources were carried out in an anechoic room. The results show several findings. First, the results support a multiple-band comparison model and disfavor a necessary-band(s) model. Second, it was found that preserving the spectral dips was more important than preserving the spectral peaks for successful front/back discrimination. Moreover, it was confirmed that neither monaural cues nor interaural spectral level difference cues were adequate for front/back discrimination. Furthermore, listeners' performance did not deteriorate when presented with sharpened spectra. Finally, when presented with an interaural delay of less than 200 μs, listeners could still discriminate front from back, although the image was pulled to the side, which suggests that localization in the azimuthal plane and localization in the sagittal plane are independent within certain limits.

Copyright by Peter Xinya Zhang 2006

This work is dedicated to my loving family and teachers.

ACKNOWLEDGMENTS

With my strong interests in music and physics, working in psychoacoustics at Michigan State University has been a great joy. I would like to thank Dr. William Hartmann, who introduced me to this field and guided me through my graduate study. Besides his valuable academic advice, I was strongly impressed by the fact that he always gives his students opportunities to present at various conferences and helps them get recognized. He has been very patient with me as an international student, and has helped me with my English and presentation skills, which I found very essential for my future career. He and his wife, Christine, have also supported me in various other ways, including taking me to Boston for a year.

I thank Dr. Brad Rakerd for letting me use his lab and anechoic chamber. I have learned much from our discussions. I thank Dr. Steve Colburn, Dr. Barbara Shinn-Cunningham, and Dr. Nat Durlach for their guidance and discussions, especially during the year that I stayed at the Hearing Research Center, Boston University. I have benefited greatly from the discussions with Ms.
Yongfang Zhu on many statistical problems, and she has been very supportive throughout my research. I would like to thank Dr. Frederick Phelps and his wife Marion for their help with both my academic development and my life. I owe great appreciation to Letong Xu, who was my physics teacher in both middle school and high school. His humor and his understanding of science and life have shaped me since childhood. Last but not least, I thank my thesis committee members, besides those previously mentioned, for very helpful and constructive advice: Dr. John Middlebrooks, Dr. Ewan Macpherson, Dr. C.-P. Yuan, Dr. Michael Harrison, and Dr. William Pratt.

PREFACE

Binaural hearing, i.e. auditory perception with two ears, is crucial to humans. With two ears, people can better detect acoustical signals and localize sound images in space. Binaural hearing has been a very active field in auditory research. New developments in this field have improved the performance of devices aiding patients with hearing disabilities (e.g. hearing aids and cochlear implants) as well as devices for people with normal hearing (e.g. stereophonic recording and virtual audio reality, as used in home-theater systems and computer games).

Many models have been suggested to describe binaural hearing. To test these models, both broadband stimuli and sine tones have been used in experiments. The study in this thesis measured human responses to a special group of broadband stimuli, namely dichotic pitches, as well as to sine tones. These experiments examined the fundamental question of how the human binaural system works, which will lead to a better understanding of the auditory system and may promote new algorithms and audio products.

Most of the sources for background in this thesis are in the Journal of the Acoustical Society of America (JASA). New experimental results are often presented at various conferences, including the biannual meetings of the Acoustical Society of America (ASA) and the Binaural Bash, an annual conference on binaural hearing hosted by Boston University. The work presented in this thesis benefited from helpful discussions and feedback at those conferences.

In the experiments introduced in this thesis, human subjects responded to various binaural stimuli, and their responses were recorded. Besides individual differences, as always found in psychophysical experiments, subjects demonstrated similar results in these various tasks. The general tendencies, as well as the individual differences, improve understanding of detection and localization by the human binaural system.

The thesis explored various, yet related, topics in binaural hearing. Chapter 1 measured the pitch strength of two different dichotic pitch stimuli, namely the Huggins pitch and the binaural coherence edge pitch, in order to discriminate between two well-known binaural models. This work has been published in JASA. Chapter 2 experimented on the lateralization of Huggins pitch with two different phase boundaries, and the results were also compared with the lateralization of sine tones. The motivation was to test the quantitative model predictions on lateralization. This work has been submitted to JASA and is currently under review. Chapter 3 is also on the lateralization of sine tones, but with a different focus: to test whether the human auditory system lateralizes sound images to the left and right based on cues of interaural phase difference or interaural time difference. This work has been published in JASA.
Unlike Chapters 1 through 3, which describe headphone experiments, Chapter 4 is on loudspeaker experiments. With the technique newly developed in this chapter, simulation through two loudspeakers was so accurate that listeners could not discriminate between the real signal and the simulation. With this technique, various features in the spectra were examined to test cues for localization in front and back. This work has been presented at various conferences, including the meetings of the ASA and the Binaural Bash.

In general, the experiments presented in this thesis have achieved results on various tasks by human subjects, and they provide psychoacoustical evidence for a better understanding of the human binaural mechanisms of both detection and localization.

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

1 Binaural models and the strength of dichotic pitches
  1.1 Introduction
    1.1.1 Dichotic pitches
    1.1.2 Binaural models
    1.1.3 Motivation for the experiments on binaural pitch strength
  1.2 Huggins pitch
    1.2.1 Huggins pitch stimulus
    1.2.2 Predictions of various models
  1.3 Experiment 1: Huggins pitch detection
    1.3.1 Method
    1.3.2 Results
  1.4 Experiment 2: Huggins pitch discrimination
    1.4.1 Method
    1.4.2 Results
  1.5 Experiment 3: Huggins pitch discrimination with three alternatives
    1.5.1 Method
    1.5.2 Results
  1.6 Discussion of Huggins pitch
  1.7 BICEP
    1.7.1 BICEP stimulus
    1.7.2 Predictions of various models
  1.8 Experiment 4: BICEP detection
    1.8.1 Method
    1.8.2 Results
  1.9 Experiment 5: BICEP discrimination
    1.9.1 Method
    1.9.2 Results
  1.10 Discussion of Huggins pitch and BICEP
  1.11 Discussion of pitch strength
    1.11.1 Summary of the experiments
    1.11.2 The spatial masking interpretation
    1.11.3 Cross-correlation model
    1.11.4 The exponential CAP model
  1.12 Conclusions

2 Binaural models and the lateralization of dichotic pitches
  2.1 Introduction
  2.2 Huggins pitch stimulus
  2.3 Binaural models for the Huggins pitch
  2.4 Experiment 1: Lateralization of Huggins pitch with linear-phase boundary
    2.4.1 Method
    2.4.2 Results
  2.5 Experiment 2: Lateralization of Huggins pitch with stepped-phase boundary
    2.5.1 Method
    2.5.2 Results
    2.5.3 Discussion
  2.6 Experiment 3: Lateralization of sine-tone
    2.6.1 Method
    2.6.2 Results
  2.7 Discussion
    2.7.1 Summary of the experiments
    2.7.2 Comparison of overlapped areas
    2.7.3 Models
    2.7.4 Earedness
  2.8 Conclusions
  2.9 Appendix
    2.9.1 Auxiliary experiments with narrow-band stimuli
    2.9.2 Calculation of overlapped area

3 Lateralization of sine-tones
  3.1 Introduction
  3.2 Method
    3.2.1 Stimulus
    3.2.2 Listeners
    3.2.3 Procedure
  3.3 Results
  3.4 Discussion
    3.4.1 Individual differences
    3.4.2 Comparison of standard deviations
    3.4.3 Fits to models
    3.4.4 Three-parameter model
    3.4.5 Monotonicity
  3.5 Conclusion

4 Virtual Reality
  4.1 Introduction
  4.2 Method
    4.2.1 Spatial setup
    4.2.2 Alignment of loudspeakers and chair
    4.2.3 Signal generating and recording
    4.2.4 Stimuli and listeners
    4.2.5 Transaural technique
    4.2.6 Preliminary experiments
    4.2.7 Calibration sequence
    4.2.8 Optimizing the simulation
    4.2.9 Confirmation test
    4.2.10 Hearing level
    4.2.11 Accuracy of synthesis at the ear-drums
  4.3 Experiments
    4.3.1 Flattening experiments
    4.3.2 Peaks and dips
    4.3.3 Monaural and binaural cues
    4.3.4 Varying stimuli
    4.3.5 Competing cues
  4.4 Auxiliary testing in ordinary room
    4.4.1 Source speakers at 5 feet
    4.4.2 Source speakers at 1.2 feet
    4.4.3 Signal with slow onset and offset
    4.4.4 Front/back discrimination
  4.5 Conclusion

A References
  A.1 References for Chapter 1
  A.2 References for Chapter 2
  A.3 References for Chapter 3
  A.4 References for Chapter 4

LIST OF TABLES

2.1 Judgement of ambiguous points for Huggins pitch (linear-phase)
2.2 Correlation coefficients comparing linear-phase and stepped-phase boundaries for HP- with 0-FIITD
2.3 Judgement of ambiguous points for Huggins pitch (stepped-phase)
2.4 Correlation coefficients comparing sin-π with HP- with 0-FIITD. The last row was copied from Table 2.2
2.5 Judgement of ambiguous points for sine-tone
2.6 Correlation coefficients of lateral responses.
The bottom block shows the results for both HP+ and HP- (or both sin-0 and sin-π)
2.7 Average of overlapped area of two normal distributions fitting the experimental data
2.8 Best-fit with three-parameter model on ITD
2.9 Best-fit with three-parameter model on IPD
2.10 Percentage of responses on each side for HP- and sin-π with 0-FIITD
3.1 Stimuli for the experiment
3.2 Eliminated stimuli
3.3 Best-fit with linear model
3.4 Best-fit with three-parameter model
3.5 T-tests for STD(ITD) < STD(IPD)
3.6 T-tests on RMSE
4.1 Information on listeners
4.2 Average score of percent correct over all listeners in preliminary front/back experiment
4.3 Bands for each listener
4.4 Boundary frequency in Experiments 5 and 6
4.5 Value of the convolution function Sj
4.6 Reverberation time of Room 10B (ordinary room)
4.7 Correct score and externalization (0 to 3) of confirmation runs in Room 10B (ordinary room). Externalization was not measured for the runs marked with a star symbol for listeners W and X. However, W and X did not report less externalization for those runs compared with previous runs; therefore the externalization scores for those two listeners in the top block can be taken as approximate externalization for the bottom two blocks.

LIST OF FIGURES

1.1 Interaural phase of Huggins pitch stimuli
1.2 Central activity pattern for Huggins pitch stimuli
1.3 Percentage of correct responses of Experiment 1
1.4 Percentage of correct responses of Experiment 2
1.5 Percentage of correct responses of Experiment 3
1.6 Interaural phase of BICEP stimuli
1.7 Percentage of correct responses of Experiment 4
1.8 Percentage of correct responses of Experiment 5
2.1 Interaural phase of Huggins pitch stimuli with linear-phase boundary
2.2 Central activity pattern for Huggins pitch stimuli with linear boundary
2.3 Lateralization of HP+ (linear-phase) by listener C
2.4 Lateralization of HP+ (linear-phase) by listener L
2.5 Lateralization of HP+ (linear-phase) by listener W
2.6 Lateralization of HP+ (linear-phase) by listener X
2.7 Lateralization of HP+ (linear-phase) by listener Z
2.8 Lateralization of HP- (linear-phase) by listener C
2.9 Lateralization of HP- (linear-phase) by listener L
2.10 Lateralization of HP- (linear-phase) by listener W
2.11 Lateralization of HP- (linear-phase) by listener X
2.12 Lateralization of HP- (linear-phase) by listener Z
2.13 Lateralization of HP- (linear-phase) with zero FIITD. The solid lines are the best-fit straight lines. The dashed lines show a slope of -1 expected from the CAP model. The error-bars are ±1 standard deviation.
2.14 Interaural phase of Huggins pitch stimuli with stepped-phase boundary
2.15 Lateralization of HP+ (stepped-phase) by listener C
2.16 Lateralization of HP+ (stepped-phase) by listener W
2.17 Lateralization of HP+ (stepped-phase) by listener X
2.18 Lateralization of HP+ (stepped-phase) by listener Z
2.19 Lateralization of HP- (stepped-phase) by listener C
2.20 Lateralization of HP- (stepped-phase) by listener W
2.21 Lateralization of HP- (stepped-phase) by listener X
2.22 Lateralization of HP- (stepped-phase) by listener Z
2.23 Lateralization of HP- with zero FIITD. Open symbols: stepped-phase boundary. Solid symbols: linear-phase boundary. The solid lines are the best-fit straight lines. The dashed lines show a slope of -1 expected from the CAP model. The error-bars are ±1 standard deviation.
2.24 Lateralization of sin-0 (analogous to HP+) by listener C
2.25 Lateralization of sin-0 (analogous to HP+) by listener W
2.26 Lateralization of sin-0 (analogous to HP+) by listener X
2.27 Lateralization of sin-0 (analogous to HP+) by listener Z
2.28 Lateralization of sin-π (analogous to HP-) by listener C
2.29 Lateralization of sin-π (analogous to HP-) by listener W
2.30 Lateralization of sin-π (analogous to HP-) by listener X
2.31 Lateralization of sin-π (analogous to HP-) by listener Z
2.32 Lateralization of sin-π (analogous to HP-) and HP- with zero FIITD. Open symbol: sin-π. Solid symbol: HP- with linear-phase boundary. The solid lines are the best-fit straight lines. The dashed lines show a slope of -1 expected from the CAP model. The error-bars are ±1 standard deviation.
2.33 Lateralization of HP+ and sin-0 for four listeners
2.34 Lateralization of HP- and sin-π for four listeners
2.35 Overlapped area (shaded, with area A) of two normal distributions fitting the data at 700 Hz with 0-FIITD between the linear- and stepped-phase boundary for listener C
2.36 Lateralization of linear-phase Huggins pitch vs. ITD
2.37 Lateralization of stepped-phase Huggins pitch vs. ITD
2.38 Lateralization of sine-tones vs. ITD
2.39 Lateralization of linear-phase Huggins pitch vs. IPD
2.40 Lateralization of stepped-phase Huggins pitch vs. IPD
2.41 Lateralization of sine-tones vs. IPD
2.42 Interaural phase of linear-phase boundary. Power exists only at frequencies where a phase difference is shown.
2.43 Interaural phase of stepped-phase boundary. Power exists only at frequencies where a phase difference is shown.
2.44 Lateralization of phase boundary region
2.45 Laterality of the boundary region of HP+, percentage of responding AB as moving to the left and BA as moving to the right, as predicted by the CAP model
2.46 Laterality of the boundary region of HP-, percentage of responding AB as moving to the right and BA as moving to the left, as predicted by the CAP model
2.47 Overlapped area of two standardized normal distributions
3.1 Sine-tone lateralization by listener A
3.2 Sine-tone lateralization by listener C
3.3 Sine-tone lateralization by listener W
3.4 Sine-tone lateralization by listener Z
3.5 Sine-tone lateralization by listener X
3.6 Sine-tone lateralization compared with IPD
3.7 Sine-tone lateralization compared with ITD
4.1 Setup of loudspeakers in the anechoic room
4.2 Block diagram of signal generating and recording
4.3 Gain function AL1 of source signal
4.4 Gain function AL2 of source signal
4.5 Results of preliminary front/back experiment
4.6 Spectrum of the signal sent to the front source speaker
4.7 Ear-canal spectra for the front speaker playing the source signal X(f)
4.8 First recording of the α synthesis speaker playing X(f)
4.9 First recording of the β synthesis speaker playing X(f)
4.10 Spectra of Simulation 1 sent to the synthesis speakers
4.11 Ear-canal spectra for Simulation 1 for the front source
4.12 Left ear-canal spectra for the real and virtual signals
4.13 Right ear-canal spectra for the real and virtual signals
4.14 Simplified model with symmetrical head and synthesis speakers set up symmetrically
4.15 Second recording of the α synthesis speaker playing Simulation 1, Aα(f)
4.16 Second recording of the β synthesis speaker playing Simulation 1, Aβ(f)
4.17 Spectra of Simulation 2 sent to the synthesis speakers
4.18 Ear-canal spectra for Simulation 2 for the front source
4.19 Spectra of the penultimate simulation sent to the synthesis speakers
4.20 Ear-canal spectra for the penultimate simulation for the front source
4.21 Real vs. virtual recording in right ear with one eliminated component
4.22 Left ear-canal amplitude spectra for the real and virtual signals
4.23 Right ear-canal amplitude spectra for the real and virtual signals
4.24 Ear-canal phase differences for the front real and virtual signals
4.25 Flow diagram of the calibration sequence for the front source speaker
4.26 Source signal sent to the front loudspeaker
4.27 Block diagram of Bekesy tracking
4.28 Source signal with listener A's hearing thresholds
4.29 Source signal with listener C's hearing thresholds
4.30 Source signal with listener D's hearing thresholds
4.31 Source signal with listener E's hearing thresholds
4.32 Source signal with listener F's hearing thresholds
4.33 Source signal with listener L's hearing thresholds
4.34 Source signal with listener M's hearing thresholds
4.35 Source signal with listener P's hearing thresholds
4.36 Source signal with listener R's hearing thresholds
4.37 Source signal with listener S's hearing thresholds
4.38 Source signal with listener V's hearing thresholds
4.39 Source signal with listener X's hearing thresholds
4.40 Variation of ear-canal spectra with different probe-tip positions. The numbers in the legend are approximate distances from the probe-tips to the KEMAR microphones.
4.41 Amplitude spectra for the front source in right ear in Experiment 1, flattened below 8 kHz
4.42 Amplitude spectra for the back source in right ear in Experiment 1, flattened below 8 kHz
4.43 Amplitude spectra for the front source in left ear in Experiment 2
4.44 Amplitude spectra for the back source in left ear in Experiment 2
4.45 Result of Experiments 1 and 2 (part 1)
4.46 Result of Experiments 1 and 2 (part 2)
4.47 Amplitude spectra for the front source in left ear in Experiment 3
4.48 Amplitude spectra for the back source in left ear in Experiment 3
4.49 Amplitude spectra for the front source in left ear in Experiment 4
4.50 Amplitude spectra for the back source in left ear in Experiment 4
4.51 Result of Experiments 3, 4 and 4A
4.52 Amplitude spectra for the front source in right ear in Experiment 5
4.53 Amplitude spectra for the back source in right ear in Experiment 5
4.54 Amplitude spectra for the front source in right ear in Experiment 6
4.55 Amplitude spectra for the back source in right ear in Experiment 6
4.56 Result of Experiments 5 and 6
4.57 Amplitude spectra for the front source in right ear in Experiment 7
4.58 Amplitude spectra for the back source in right ear in Experiment 7
4.59 Results of Experiment 7
4.60 Amplitude spectra for the front source in right ear in Experiment 8
4.61 ISLD for the front source in Experiment 8
4.62 Result of Experiment 8
4.63 Amplitude spectra for the front source in left ear in Experiment 9
4.64 Amplitude spectra for the front source in right ear in Experiment 9
4.65 Result of Experiment 9
4.66 First derivative of level spectra for the front source in left ear in Experiment 9
4.67 Second derivative of level spectra for the front source in left ear in Experiment 9
4.68 Phase differences for the front source in Experiment 10
4.69 Phase differences for the back source in Experiment 10
4.70 Result of Experiment 10
4.71 Result of Experiments 1, 2, and 11 (part 1)
4.72 Result of Experiments 1, 2, and 11 (part 2)
4.73 Result of Experiment 10 in ordinary room (Room 10B). An open symbol shows the mean and standard deviation of four runs. Solid symbols are results for listeners W and X, who did not complete four runs for this experiment. Listener W did two runs for each ITD, and the error-bars are absolute errors. Listener X did only one run for the ITD of 200 μs, and therefore there is no error-bar for him.

Chapter 1

Binaural models and the strength of dichotic pitches

1.1 Introduction

1.1.1 Dichotic pitches

In research on human hearing, the dichotic pitch (Cramer and Huggins, 1958; Bilsen and Raatgever, 2000, 2002) has been widely studied. A dichotic pitch stimulus consists of white noise in each ear. Examined at one ear alone, nothing in the amplitude or phase spectrum is frequency specific, and the listener can hear only white noise with just one ear. However, there is a certain interaural phase relationship between the noises in the two ears. Hence, when listening with both ears, the listener can hear a clear pitch that can be matched consistently to within a few percent (Hartmann, 1993).

There are two types of dichotic pitches, namely the pure-tone-like pitches and the complex-tone-like pitches.
The pure-tone-like pitches sound like a sine tone in a noise background, because the interaural phase relationship specifies a single frequency leading to a binaural perception. The Huggins pitch (Cramer and Huggins, 1958; Guttman, 1962), the binaural edge pitch (Klein and Hartmann, 1980; Frijns et al., 1986), and the binaural coherence edge pitch (Hartmann and McMillon, 2001) belong to this category. The complex-tone-like pitches sound like a complex tone with many harmonics in the noise background, because the interaural relationship specifies a series of harmonically related frequencies. This category includes the Fourcin pitch (Fourcin, 1962, 1970), the dichotic repetition pitch (Bilsen, 1972; Bilsen and Goldstein, 1974), and the multiple phase shift pitch (Bilsen, 1976).

1.1.2 Binaural models

To explain the origin of dichotic pitches, different models have been considered. The experiments on pitch strength described in this chapter focused on two well-known models, namely the equalization-cancellation (EC) model (Kock, 1950; Durlach, 1960, 1972) and the central activity pattern (CAP) model (Raatgever and Bilsen, 1986). The goal of the experiments in this chapter was to discover which model best describes the relative pitch strength with different phase configurations.

The EC model was first suggested by Kock (1950) and greatly elaborated by Durlach (1960, 1972). It was developed from experiments on the masking level difference (MLD). The EC model states that, to form a central spectrum, the binaural system subtracts the signals in the left and right ears. To get the best signal-to-noise ratio, there is an equalization process before the subtraction, in which the amplitudes in the two ear-pathways are set identical and a certain interaural phase shift is applied to one ear-pathway. The practical way to get a phase shift is to add interaural delay lines. Since then, newer versions of the EC model have been suggested, such as the modified EC model (e.g. Culling et al., 1998a,b), in which the phase shifting is obtained by applying delay lines tuned in frequency. Akeroyd and Summerfield (2000) suggested the reconstruction-comparison (RC) model. It has five steps, with a clever way of deriving the dichotic pitches in the central spectrum in detail. It is still an EC-type model, in the sense that it applies the EC model in detecting the dichotic pitches. Therefore it is still a subtraction-type model, and it gives the same prediction for pitch strength as the EC model.

On the other hand, Raatgever and Bilsen (1986) suggested the CAP model, stating that the binaural system adds the signals in the left and right ears in channels tuned both in frequency and in interaural time delay (ITD), and the result is a central activity pattern in the frequency-ITD plane. More popular than the addition model is the model suggested by Jeffress (1948), which calculates the binaural cross-correlation in the tonotopic-ITD plane. However, Hartmann and Zhang (2003) showed that these two models are equivalent. Therefore, the CAP model is also a Jeffress-type model.

To make the EC operation concrete, the following minimal numerical sketch (added for illustration and not from the thesis; the 500-Hz tone, its level, and the one-second duration are arbitrary choices) applies equalization and cancellation to the classic N0Sπ masking configuration: subtracting the two ear signals cancels the diotic noise and leaves the antiphasic tone.
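    import numpy as np

    rng = np.random.default_rng(0)
    fs = 20000                            # sample rate (Hz)
    t = np.arange(fs) / fs                # one second of signal

    # N0Spi configuration: identical (diotic) noise in both ears, and a
    # 500-Hz tone inverted between the ears, well below the noise level.
    noise = rng.normal(size=fs)
    tone = 0.1 * np.sin(2 * np.pi * 500.0 * t)
    left, right = noise + tone, noise - tone

    # EC model: equalize (amplitudes are already equal, and no internal
    # delay is needed here), then cancel by subtraction.
    residual = left - right               # noise cancels; 2 x tone survives
    print("rms of one ear alone:", np.std(left))      # ~1.0, noise dominates
    print("rms of EC residual:  ", np.std(residual))  # ~0.14, the bare tone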
1.1.3 Motivation for the experiments on binaural pitch strength

The goal of the experiments in this chapter was to compare the EC model and the CAP model. Both of these models apply interaural delay lines. According to contemporary models of the binaural system, as the delay increases greatly, the number of neurons decreases, and hence the fidelity decreases (Stern and Trahiotis, 1995). This effect is usually modeled by the central weighting function p(τ), determined from MLD experiments. In the MLD experiments, it was found that the MLD in the N0Sπ configuration (i.e. noise in phase and signal out of phase at the two ears) is larger than the MLD in the NπS0 configuration (i.e. noise out of phase and signal in phase at the two ears), especially at low frequencies (Colburn, 1977). The p(τ) function was invented to allow the EC model to explain this result. But p(τ) is also used in the CAP model. In the CAP model, the central weighting favors the sound image closer to the medial plane (the so-called "centrality"), which is necessary in the localization task, given the fact that multiple images occur at integer numbers of cycles away along the ITD axis (Raatgever, 1980; Raatgever and Bilsen, 1986; Bilsen and Raatgever, 2000).

The experiments in this chapter were on the pitch strength of the dichotic pitches at low frequencies, where long delay lines are required. By examining how differently the pitch strength degraded with different phase configurations, it could be tested whether the EC model or the CAP model gives the better prediction.

1.2 Huggins pitch

1.2.1 Huggins pitch stimulus

Huggins pitch stimuli contain broadband white noise in each ear-channel. Outside a small frequency band (the so-called "phase boundary region"), the interaural phase difference (IPD) is fixed at a certain value, φ0. Within the phase boundary region, the IPD increases linearly from φ0 to φ0 + 360° as frequency increases. For boundary frequencies less than about 1300 Hz, the listener can hear a pitch on top of the background noise. The pitch can be matched with a sine tone at the boundary frequency, defined as the center frequency of the phase boundary region. At the boundary frequency, the IPD is φ0 + 180°. To make the language simpler, it is useful to introduce the concept of the background phase, which is just defined as φ0.

Three different phase configurations (Figure 1.1) were considered in the following experiments:

1. HP-: with the background phase φ0 = 0°. (At the boundary frequency, the signals at the two ears are inverted, which gives "HP-" its name.)

2. HPQ: with the background phase φ0 = 90°. (The "Q" in its name means quadrature.)

3. HP+: with the background phase φ0 = -180°, equivalent to a simple inversion of HP- at one ear. (At the boundary frequency, the signals at the two ears are in phase, which gives the name "HP+".)

Figure 1.1: Interaural phase of Huggins pitch stimuli (panels for HP-, HPQ, and HP+, each showing the interaural phase versus frequency, with the linear 360° rise centered on the boundary frequency fb).

The construction is compact enough to sketch in a few lines of Python (the sketch below is not from the thesis; the function name and interface are hypothetical, only the component phases are computed, and the equal-amplitude components and the inverse-FFT synthesis are omitted; the component spacing and count mirror the values given in Section 1.3.1):
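    import numpy as np

    def huggins_phases(fb, phi0_deg, n=16384, df=0.61, bw=0.06):
        """Component phases for a Huggins pitch stimulus.
        fb: boundary frequency (Hz); phi0_deg: background IPD in degrees
        (0 for HP-, 90 for HPQ, 180 for HP+); bw: width of the phase
        boundary region as a fraction of fb."""
        rng = np.random.default_rng()
        f = df * np.arange(1, n + 1)                 # component frequencies
        phase_left = rng.uniform(0.0, 2 * np.pi, n)  # random-phase noise
        ipd = np.full(n, np.deg2rad(phi0_deg))       # background IPD
        lo, hi = fb * (1 - bw / 2), fb * (1 + bw / 2)
        inside = (f >= lo) & (f <= hi)               # phase boundary region
        # The IPD rises linearly by 360 degrees across the region, passing
        # through phi0 + 180 degrees exactly at the boundary frequency fb.
        ipd[inside] += 2 * np.pi * (f[inside] - lo) / (hi - lo)
        return f, phase_left, phase_left + ipd       # (freqs, left, right)

    f, ph_left, ph_right = huggins_phases(fb=200.0, phi0_deg=0.0)  # an HP-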
1.2.2 Predictions of various models

According to the EC model, to detect the HP- signal, the binaural system simply subtracts the left- and right-ear signals, and the residual in the boundary region gives the pitch sensation. For the HP+ signal, however, a certain delay line has to be applied to one ear before the subtraction. Due to the central weighting, the binaural system discounts long-delay neurons. Therefore the EC model predicts that the listener would detect HP- more easily than HP+, i.e. the pitch strength of HP- is greater than the pitch strength of HP+. This difference should occur at all boundary frequencies. However, at high boundary frequencies (e.g. near 500 Hz), the pitch strength is so great that the listener can easily detect the pitch with both the HP+ and HP- configurations. Only at low frequencies, where delay lines approach their limits, is the difference in pitch strength expected to be noticeable. Therefore the experiments in this chapter were performed in the low-frequency range.

On the other hand, according to the CAP model, the binaural system adds the left- and right-ear signals, and the peak at the boundary frequency that is closest to the medial line on the frequency-ITD plane gives the pitch sensation. Figure 1.2 shows the frequency-ITD plane for all three phase configurations of Huggins pitch. In the figure, strong excitation by the neurons appears as bright bands, and weak excitation appears as dark bands. For HP-, the peaks (marked with ovals in Figure 1.2) appear at large ITD; for HP+, the peak appears at 0-ITD, i.e. a simple addition would "do the job." Therefore the CAP model predicts that the pitch strength of HP- is less than the pitch strength of HP+.

Figure 1.2: Central activity pattern for Huggins pitch stimuli (panels for HP-, HPQ, and HP+; abscissa: internal interaural delay τ in ms; ordinate: frequency).

Because the CAP is equivalent to the interaural cross-correlation (Hartmann and Zhang, 2003), the pattern of Figure 1.2 can be reproduced directly: for a noise component of frequency f with interaural phase difference Δφ(f), the cross-correlation as a function of internal delay τ is proportional to cos(2πfτ - Δφ(f)). The following brief sketch (added for illustration, not from the thesis; the 300-Hz boundary and the grid spacing are arbitrary choices) evaluates this pattern on the frequency-ITD plane:
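    import numpy as np

    def huggins_ipd(f, fb=300.0, phi0=0.0, bw=0.06):
        """Interaural phase (rad) of a Huggins stimulus vs. frequency;
        phi0 = 0 gives HP-, phi0 = pi gives HP+."""
        lo, hi = fb * (1 - bw / 2), fb * (1 + bw / 2)
        ipd = np.full_like(f, phi0, dtype=float)
        inside = (f >= lo) & (f <= hi)
        ipd[inside] += 2 * np.pi * (f[inside] - lo) / (hi - lo)
        return ipd

    freqs = np.linspace(50.0, 600.0, 551)       # tonotopic axis (Hz)
    taus = np.linspace(-3e-3, 3e-3, 601)        # internal delay axis (s)
    F, T = np.meshgrid(freqs, taus, indexing="ij")
    cap = np.cos(2 * np.pi * F * T - huggins_ipd(F))  # bright bands = peaks

    # For HP- (phi0 = 0) the background peaks at tau = 0, but the band at
    # the boundary frequency peaks only near tau = +-1/(2 fb), i.e. at
    # large delays; for HP+ (phi0 = pi) the boundary band peaks at tau = 0,
    # reproducing the contrast sketched in Figure 1.2.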
There is another model that needs to be mentioned here. Green (1966) suggested a variation on the EC model, stating that the cancellation can be done by either addition or subtraction. It is a very convenient model for the binaural edge pitch (Klein and Hartmann, 1980), and it may find more support from recent physiology focusing on the excitation and inhibition in the medial superior olive (Grothe, 2000; Brand et al., 2002). According to Green's variation, without any internal delay lines, HP- can be detected by subtraction and HP+ can be detected by addition. The worst case, however, is HPQ: no matter whether subtraction or addition is used, a 90° internal phase shift has to be applied to one ear-channel before the cancellation. Therefore Green's variation predicts that the pitch strength of the HPQ signal is worse than the pitch strength of the HP+ or HP- signals. This is actually the reason that HPQ was included in our experiments.

1.3 Experiment 1: Huggins pitch detection

1.3.1 Method

Stimulus

The listener was presented with a two-channel stimulus. The signal in the left-ear channel was white noise with 16384 components of equal amplitude and random phase. For the Huggins pitch stimulus, the right-ear channel was identical to the left-ear channel except that the phases of the components were artificially varied (as in Figure 1.1). Outside the phase boundary region, the IPD was set to φ0 = 0° for HP-, 90° for HPQ, and 180° for HP+. Within the phase boundary region, the IPD increased linearly with frequency by 360°, i.e. from φ0 to φ0 + 360°. Because φ0 + 360° is equivalent to φ0, the IPD is continuous over all frequencies. The bandwidth of the phase boundary region was 6% of its center frequency, i.e. the boundary frequency. For the noise stimulus without the Huggins pitch, the right-ear channel was simply a phase-shifted version of the left-ear channel, with a fixed IPD of φ0 at all frequencies. The listener had to distinguish the two types of intervals using information within the boundary region only, because the spectra outside the boundary region were identical for both intervals.

The stimuli were calculated by an array processor (Tucker-Davis AP2) on the computer and converted to audio by 16-bit DACs (Tucker-Davis DD1). The sample rate per channel was 20 ksps (kilosamples per second). With a continuously cycled memory buffer of 32768 words, the period was 1.6384 seconds, and the frequency spacing between adjacent components was therefore 1/(1.6384 s), about 0.61 Hz. The maximum frequency of the broadband noise as converted was 10 kHz, and the output signal was low-pass filtered at 8 kHz with Brickwall filters at a rate of -115 dB/octave. The phase configuration of the two channels was tested by adding or subtracting them on an oscilloscope immediately before sending them to the headphone power amplifiers. The cancellation was good to at least 40 dB.

Listeners

Three listeners took part in this experiment: M (female, age 42), W (male, age 61), and X (male, age 26). All listeners had normal hearing except W, who had a bilateral hearing loss above 8 kHz, typical of males of this age. Listeners W and X had considerable experience in dichotic listening. Listener M had no previous experience in similar experiments.

Procedure

The listener heard two intervals with durations of 500 ms, separated by a silent gap of 500 ms. One of the intervals was a Huggins pitch stimulus, and the other was diotic noise. The listener decided which interval was the Huggins pitch stimulus and pressed the corresponding button on a response box. The percentage of correct responses was considered an estimate of the pitch strength of the Huggins pitch stimuli.

The phases of the components were randomized on each interval. Each run contained 80 trials, 10 trials with each of the eight nominal boundary frequencies: 100, 125, 160, 200, 250, 315, 400, and 500 Hz. The trials were presented in a random order. On each trial, the boundary frequency was randomly varied by an amount within a ±6% rectangular distribution. Every run presented Huggins pitch stimuli with a fixed phase configuration, HP-, HPQ, or HP+. The listener did runs with HP-, HPQ, and HP+ in a quasi-random order, partly sequential. In total, each listener did ten runs with each phase configuration. The level of the stimuli was 65 dB for each ear-channel, and the spectrum level was 26 dB re 10^-12 W/(m^2 Hz). During the experiments, the listener sat in a double-walled sound-treated room and listened to the stimuli via Sennheiser HD 480-II headphones.

1.3.2 Results

Figure 1.3 illustrates the results of Experiment 1, showing the percentage of correct responses vs. the boundary frequency, for the stimuli with all three phase configurations, for each listener as well as the average.

Figure 1.3: Percentage of correct responses of Experiment 1 (panels: listeners M, W, and X, and all listeners; abscissa: boundary frequency, 100 to 500 Hz; legend: HP-, HPQ, and HP+, with 0°, 90°, and 180° IPD outside the phase boundary region).

Observations

In Figure 1.3, two trends appear for all listeners:

1. For each phase configuration (HP-, HPQ, or HP+), the percentage of correct responses (PC) decreased as the boundary frequency decreased.

2. At each fixed boundary frequency, PC decreased as the background phase (0°, 90°, and 180°, corresponding to HP-, HPQ, and HP+, respectively) increased. (The most useful information was obtained at the frequencies 125, 160, and 200 Hz. For the lower frequencies, PC is very close to or below 50%, the limit of guessing. For the higher frequencies, the deviation of PC is too small to show this trend.)

This experiment indicates that PC, as a measure of pitch strength, shows the ordering HP- > HPQ > HP+, consistent with the prediction of the EC model.
Statistics

To perform statistical tests on the ordering, ceiling and floor effects have to be considered. At high frequencies, the percentage of correct responses approaches 100%, and, due to the statistical distribution, the ordering is not obvious. This is the ceiling effect. On the other hand, at low frequencies, the percentage of correct responses is close to, or even below, 50%, which is the limit for random guessing. Hence the ordering might be random and does not represent an ordering of pitch strength. This is the floor effect. To perform a meaningful test between HP- and HP+, comparisons were made only between pairs for which at least one member led to a PC between 55% (5% above the random-guessing limit) and 95% (5% below perfection). There were 15 such pairs over all three listeners and eight frequencies. All of them showed that the performance on HP- was better than the performance on HP+. A one-tailed t-test at the 0.05 level found that 14 of the 15 pairs showed a significant advantage in favor of HP-. The selection-plus-test logic is summarized by the sketch below (added for illustration, not from the thesis; the per-run scores are made-up placeholders, and since the thesis does not specify the exact t-test variant, an independent two-sample test across the ten runs per condition is assumed):
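    import numpy as np
    from scipy import stats

    # Hypothetical percent-correct scores from ten runs per condition for
    # one listener at one boundary frequency (real data are in Figure 1.3).
    pc_hp_minus = np.array([90.0, 85, 80, 90, 85, 95, 80, 85, 90, 85])
    pc_hp_plus = np.array([70.0, 65, 75, 60, 70, 65, 70, 75, 60, 65])

    def qualifies(a, b, lo=55.0, hi=95.0):
        """Keep the pair only if at least one mean PC lies between 55%
        (5% above the guessing limit) and 95% (5% below perfection)."""
        return any(lo <= x.mean() <= hi for x in (a, b))

    if qualifies(pc_hp_minus, pc_hp_plus):
        # One-tailed test of the hypothesis that PC(HP-) exceeds PC(HP+).
        t, p = stats.ttest_ind(pc_hp_minus, pc_hp_plus,
                               alternative="greater")
        print(f"t = {t:.2f}, one-tailed p = {p:.4f}")  # significant if p < 0.05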
1.4 Experiment 2: Huggins pitch discrimination

Experiment 1 directly studied detection of Huggins pitch and considered detection performance an indicator of pitch strength. However, the listener might use cues other than pitch to detect Huggins pitch stimuli. Furthermore, pitch usually refers to the musical sense of highness or lowness. Hence Experiment 2 studied discrimination of the boundary frequencies of Huggins pitch stimuli. Cramer and Huggins (1958) actually used a discrimination task in their first report on dichotic pitch effects. Other methods, such as a sine-tone matching task, have also been used by other researchers (e.g. Guttman, 1962).

1.4.1 Method

The stimuli, listeners, and protocol were the same as in Experiment 1. The only difference is that, on each trial in Experiment 2, both intervals were Huggins pitch stimuli. The boundary frequency in the second interval was either 6% (a semitone) higher or 6% lower than that in the first interval. After comparing the two intervals in each trial, the listener responded which pitch was higher by pressing the corresponding button on the response box.

1.4.2 Results

Figure 1.4 shows the results of Experiment 2. Except for one special case (listener X at the 160-Hz boundary frequency), the percentage of correct responses for all three listeners shows the same ordering as in Experiment 1, HP- > HPQ > HP+, especially at low frequencies. Not surprisingly, the average plot in the bottom panel demonstrates the same ordering, in favor of the EC model.

Figure 1.4: Percentage of correct responses of Experiment 2 (panels: listeners M, W, and X, and all listeners; abscissa: boundary frequency, 100 to 500 Hz; legend as in Figure 1.3).

The same statistical test as in Experiment 1 was performed between HP- and HP+ for the 11 pairs selected such that the percentage of correct responses of at least one member of each pair was between 55% and 95%. All 11 pairs showed an advantage in favor of HP-, and 10 of them were found significant by a one-tailed t-test at the 0.05 level.

1.5 Experiment 3: Huggins pitch discrimination with three alternatives

Experiment 3 was very similar to Experiment 2, except that it was a three-alternative forced-choice experiment. The random-guessing limit in Experiment 2 was 50%, and therefore the useful range of the data was between 50% and 100%. The advantage of Experiment 3 is that it reduced the random-guessing limit to 33.3%, giving a larger useful range between random guessing and perfect performance.

1.5.1 Method

The experiment was the same as Experiment 2, except that in Experiment 3 the boundary frequency in the second interval was 9% (1.5 semitones) higher, 9% lower, or the same as in the first interval. (Because the task is more difficult than Experiment 2, a larger frequency difference, 9%, was used.) The listener had to decide what the pitch relationship was between the two intervals and respond by pushing one of the three buttons on the response box.

1.5.2 Results

Figure 1.5 shows the results of Experiment 3. As in Experiment 2, the percentage of correct responses for all three listeners shows the clear ordering HP- > HPQ > HP+, except for one special case (listener X at 200 Hz). The average plot in the bottom panel of Figure 1.5 also exhibits the same ordering.

Figure 1.5: Percentage of correct responses of Experiment 3 (panels: individual listeners and all listeners; abscissa: boundary frequency, 100 to 500 Hz; legend as in Figure 1.3).

A similar statistical test as in Experiments 1 and 2 was performed. Among the 18 selected pairs, in each of which at least one member led to a PC between 38% (5% above the limit of random guessing) and 95% (5% below perfection), 12 showed a significant advantage for HP- over HP+ at the 0.05 level, and all of them showed the order HP- > HP+. The ordering is the same as in Experiments 1 and 2, supporting the EC model.

1.6 Discussion of Huggins pitch

In summary, all of the above Huggins pitch experiments showed a clear and statistically significant ordering at low frequencies: the performance decreased with decreasing boundary frequency, and the performance was best on HP-, worst on HP+, and intermediate on HPQ. These results support the EC model and disfavor the CAP model.

1.7 BICEP

1.7.1 BICEP stimulus

The binaural coherence edge pitch (BICEP) is another type of dichotic pitch. As with the Huggins pitch stimulus, the left-ear channel of the BICEP stimulus is white noise. However, the right-ear channel is generated from the left-ear channel in a different way (Figure 1.6): below a boundary frequency, the signals in the left- and right-ear channels are incoherent, i.e. their cross-correlation is 0; above the boundary frequency, the left- and right-ear signals are coherent. The phase difference between the two ears in the coherent region is fixed at a certain value, called the interaural phase difference (IPD). For all IPDs, listeners can hear a pitch on top of the background noise. The pitch can be matched with a sine tone at a frequency about 4% below the boundary frequency (Hartmann and McMillon, 2001). Parallel to HP-, HPQ, and HP+, three IPDs, 0°, 90°, and 180°, were used for the BICEP stimuli, and they were called BICEP-0, BICEP-90, and BICEP-180, respectively. Parallel to the Huggins sketch in Section 1.2.1, the construction can be sketched as follows (again not from the thesis; the function name is hypothetical, and only the component phases are computed):
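    import numpy as np

    def bicep_phases(fe, ipd_deg, n=16384, df=0.61):
        """Component phases for a BICEP (coherent-above) stimulus.
        Below the boundary frequency fe the two ears receive independent
        random phases (incoherent noise); above fe the right ear equals
        the left ear shifted by a fixed IPD (0, 90, or 180 degrees for
        BICEP-0, BICEP-90, or BICEP-180)."""
        rng = np.random.default_rng()
        f = df * np.arange(1, n + 1)                    # component frequencies
        phase_left = rng.uniform(0.0, 2 * np.pi, n)
        phase_right = phase_left + np.deg2rad(ipd_deg)  # coherent region
        below = f < fe
        phase_right[below] = rng.uniform(0.0, 2 * np.pi, below.sum())
        return f, phase_left, phase_right

    f, ph_left, ph_right = bicep_phases(fe=200.0, ipd_deg=180.0)  # BICEP-180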
The BICEP stimuli were generated with the same instruments and protocol as the Huggins pitch stimuli in Experiments 1 through 3.

Besides the BICEP introduced above (BICEP-coherent-above), there is another type of BICEP, which has incoherent components above a boundary frequency and coherent components below the boundary frequency (BICEP-coherent-below). The boundary frequencies in the experiments in this chapter were very low (below 500 Hz). In this frequency range, the BICEP-coherent-above is stronger than the BICEP-coherent-below, probably because the BICEP-coherent-above stimulus has more coherent components to cancel. Therefore, only the BICEP-coherent-above stimulus was used in the following experiments.

Figure 1.6: Interaural phase of BICEP stimuli (panels for BICEP-0, BICEP-90, and BICEP-180, each showing the interaural phase versus frequency; above the boundary frequency fe the phase is fixed, and below fe the ears are incoherent).

1.7.2 Predictions of various models

According to the EC model, to perceive the BICEP, the coherent components of BICEP-0 can be cancelled by a simple subtraction. For BICEP-180, however, the coherent components need to be phase-shifted before the subtraction. The phase shift is probably accomplished by internal delay lines. Therefore the EC model predicts that, at low boundary frequencies, BICEP-0 is the easiest to perceive and BICEP-180 the most difficult, with BICEP-90 of intermediate difficulty, the same order as the predictions for the corresponding Huggins pitch stimuli. On the other hand, the CAP model predicts the pitch strengths of the BICEP stimuli in the opposite order. Green's modified version of the EC model predicts that BICEP-90 is the most difficult to perceive.

1.8 Experiment 4: BICEP detection

1.8.1 Method

Experiment 4 was on BICEP detection, parallel to Experiment 1. The protocols and listeners were the same as in Experiment 1, but BICEP stimuli replaced the Huggins pitch stimuli. The listener's task was to respond which of the two intervals was a BICEP stimulus. The boundary frequencies were varied randomly over a range of ±5% of the nominal boundary frequency. In preliminary experiments, the listeners received almost perfect scores very easily with boundary frequencies above 250 Hz. Therefore only the five boundary frequencies no higher than 250 Hz were used in the following experiments, making a shorter run with only 50 trials.

1.8.2 Results

Figure 1.7 shows the results of Experiment 4. Similar to the results of Experiment 1, the performance decreases with decreasing boundary frequency. Except for one special case (listener X at 125 Hz), the percentage of correct responses was in the order BICEP-0 > BICEP-90 > BICEP-180, in favor of the EC model. As in Experiment 1, a similar statistical test was performed between BICEP-0 and BICEP-180. In total, 7 pairs were selected such that at least one member of each pair led to a PC between 55% and 95%. With a one-tailed t-test at the 0.05 level, all 7 pairs showed a significant advantage of BICEP-0 over BICEP-180, supporting the EC model.

Figure 1.7: Percentage of correct responses of Experiment 4 (panels: listeners M, W, and X, and all listeners; abscissa: boundary frequency; legend: BICEP with 0°, 90°, and 180° IPD coherence).

1.9 Experiment 5: BICEP discrimination

1.9.1 Method

Experiment 5 was on discrimination of BICEP, parallel to Experiment 2 on Huggins pitch discrimination. In each trial, after hearing two intervals with different boundary frequencies, the listener reported which of the two intervals was higher in pitch.
Because the BICEP was more difficult to hear than the Huggins pitch stimuli (as observed by Akeroyd et al., 2001), the boundary frequency of the second interval was varied by ±9% (1.5 semitones) from the boundary frequency of the first interval, a larger step than the ±6% variation in Experiment 2. Furthermore, a three-alternative task (as in Experiment 3) would have been even more difficult, and therefore only the two-alternative task was performed.

1.9.2 Results

Figure 1.8 shows the results of Experiment 5. Due to the ceiling and floor effects, little information was revealed by Figure 1.8 at 400, 100, or 125 Hz. At the remaining frequencies, between 160 and 315 Hz, listeners M and W showed that the pitch strength in general decreased in the order BICEP-0 > BICEP-90 > BICEP-180, with two exceptions (listener M at 315 Hz and listener W at 250 Hz); listener X's data, however, were similar for all three phase configurations and did not show such a strong trend. The average plot in the bottom panel shows the same ordering as in Experiment 4, although not as strongly: BICEP-0 > BICEP-90 > BICEP-180.

Figure 1.8: Percentage of correct responses of Experiment 5 (panels: listeners M, W, and X, and all listeners; abscissa: boundary frequency; legend as in Figure 1.7).

Parallel to Experiment 4, 17 paired comparisons were performed between BICEP-0 and BICEP-180, the 17 pairs being selected so that at least one member of each pair led to a PC between 55% and 95%. Thirteen of the 17 pairs showed an advantage of BICEP-0 over BICEP-180, and 8 of those 13 showed a significant advantage with a one-tailed t-test at the 0.05 level. The general ordering of performance from this experiment supports the EC model and disfavors the CAP model, as did the previous experiments.

1.10 Discussion of Huggins pitch and BICEP

Huggins pitch and BICEP are two different types of dichotic pitch stimuli. Yet, in the five experiments of this chapter, both stimuli showed an ordering of performance at low frequencies in favor of the EC model, even with two different tasks (detection and discrimination). This result suggests that the EC model reveals a general binaural mechanism for perceiving various dichotic pitch stimuli.

1.11 Discussion of pitch strength

1.11.1 Summary of the experiments

To test various models of binaural pitch formation, five experiments were performed on detection and discrimination of Huggins pitch and BICEP stimuli. These experiments concentrated especially on low boundary frequencies, where the performance became poor.
1.11 Discussion of pitch strength

1.11.1 Summary of the experiments

To test various models of binaural pitch formation, five experiments were performed on detection and discrimination of Huggins pitch and BICEP stimuli. These experiments focused especially on low boundary frequencies, where performance became poor. The results show that performance decreased as the frequency decreased; and, at the same frequency, performance decreased as the background phase of the Huggins pitch or the IPD of the BICEP increased, i.e. the percentage of correct responses was in the order HP- > HPQ > HP+ and BICEP-0 > BICEP-90 > BICEP-180. These results support the equalization-cancellation model and disfavor the central activity pattern model; nor do the results agree with the predictions of Green's modified EC model, which predicts that the worst performance occurs for HPQ and BICEP-90.

There is other experimental evidence consistent with the above two trends (performance falling with decreasing frequency and with increasing delay). For the frequency dependence, Wilbanks and Whitmore (1967) interpreted their MLD experiments as indicating internal noise at low frequencies, which reduces interaural coherence and possibly disturbs the perception of dichotic pitch at low boundary frequencies. For the delay dependence, the different phase configurations played a large role at boundary frequencies between 200 and 300 Hz, where the maximum delays (180° at those frequencies) were between 1.7 and 2.5 ms, in agreement with van der Heijden and Trahiotis (1999), who interpreted their MLD results by assuming that comparably long delay lines begin to fail. To understand both trends thoroughly, it is necessary to employ a model that fails in the limits of both low frequency and long delay. Fortunately, several binaural models do weight both frequency and delay (Raatgever, 1980; Stern et al., 1988; Shackleton et al., 1992).

1.11.2 The spatial masking interpretation

There are other interpretations for the experimental results in this chapter. Bilsen (2000) noted that, compared with HP-, the pitch in HP+ is masked more by the background noise. Listeners in our experiments agreed with this observation. The phenomenon can be interpreted by spatial masking. For HP-, the background phase is 0°, hence the noise is N0, generally heard as compact in the center of the head, whereas at the boundary frequency the interaural phase difference is 180°, hence the pitch (the signal) is Sπ, which is lateralized to one side of the head (either left or right). Because noise and signal are both compact and far from each other, there is little interference between them, and the listener hears the pitch easily. For HP+, however, the background phase is 180°, hence the noise is Nπ, distributed over a large space on both sides (Figure 1.2). The pitch at the boundary frequency has an interaural phase difference of 0°, compact in the center of the head. Because the noise is diffuse, there is little space between noise and signal, leading to more interference between them. Therefore it is harder for the listener to perceive HP+ than HP-.

The lateralization interpretation introduced above was confirmed by feedback from the listeners, and by the experiments in the next chapter. However, some comments against the spatial masking interpretation need to be made as well.

1. All models of dichotic pitch formation focus on contrasts in the tonotopic (frequency) domain, not in the spatial domain, which disfavors the spatial masking interpretation.

2. The EC model can also explain the better perception of HP-. In order to cancel the noise background of HP+ with a phase shift of 180°, internal delay lines with delay values tuned differently in different frequency channels must be used, introducing error into the cancelling process. The error is especially large with long internal delays. On the other hand, cancelling the noise background of HP- requires no internal delay line, and therefore gives less error and better perception. (A numerical sketch of this comparison follows the list.)

3. As the boundary frequency increases, the lateralization of the pitch in HP- moves closer toward the center (Figure 1.2). Considering the spatial masking effect only, one would therefore predict that the interference between noise and pitch becomes stronger as the boundary frequency increases, contradicting the experimental findings in this chapter.

4. Further evidence against the spatial masking interpretation comes from the experiments on detection in noise by Good et al. (1994). They compared their results with the results of experiments on localization in noise, and found no evidence of a distinct relation between detection and localization.
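Comment 2 can be made concrete with a toy calculation. The following fragment, with purely illustrative parameters, shows that subtracting the ears cancels an N0 band exactly, while cancelling an Nπ band by delay-then-subtract is exact at only one frequency, leaving a residual across the rest of the band.

```python
import numpy as np

# Sketch: EC-style cancellation of an interaurally phase-shifted noise band.
fs = 20000
t = np.arange(int(fs * 0.5)) / fs
rng = np.random.default_rng(0)

freqs = np.arange(150.0, 350.0, 2.0)             # narrow noise band, Hz
phases = rng.uniform(0, 2 * np.pi, freqs.size)
left = sum(np.cos(2 * np.pi * f * t + p) for f, p in zip(freqs, phases))

# N0: right ear identical.  Npi: right ear inverted (180-degree IPD).
right_n0 = left
right_npi = -left

f0 = 250.0                                       # frequency the delay is tuned to
tau = 1.0 / (2 * f0)                             # half period = equalizing delay
shift = int(round(tau * fs))

resid_n0 = left - right_n0                       # exact cancellation: zero
resid_npi = left[shift:] - right_npi[:-shift]    # delay-then-subtract: imperfect
# Each component leaves amplitude 2*|cos(pi*f*tau)|: zero at f0, growing
# toward the band edges.
print(np.sqrt(np.mean(resid_n0**2)), np.sqrt(np.mean(resid_npi**2)))
```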
1.11.3 Cross-correlation model

Another model worth mentioning is the cross-correlation model proposed by Colburn (1977), which has been used to interpret the results of various experiments, such as MLD experiments. Dominitz and Colburn (1976) showed that a normalized cross-correlation model gives the same prediction as an EC model when the signal-to-noise ratio is small. Although Colburn's model also makes use of a cross-correlation function, like the CAP model, Colburn's model looks for dips instead of peaks to detect a binaural signal, because the binaural system is most sensitive to deviations from a cross-correlation of 1 (Gabriel and Colburn, 1981). Thus for HP-, no internal delay is needed to detect the dip at the boundary frequency, whereas for HP+ an internal delay is required (Figure 1.2). Therefore, assuming that the binaural system discounts long-delay neurons, Colburn's model also predicts that the pitch strength of HP- is greater than that of HP+, agreeing with the prediction of the EC model and with the experimental results in this chapter.

1.11.4 The exponential CAP model

Hartmann and Zhang (2003) introduced a modification to the CAP model in their appendix, exponentially rectifying the inputs from the left and right ears to the central processor. This modification nonlinearly transforms the prediction of the CAP model into a modified Bessel function I_0, as in Equation 1.1. It is possible that the sharpness of the peaks in this modified CAP model explains the experimental results.

    γ(f, τ) = I_0( √( g_L²(τ) + g_R²(τ) + 2 g_L(τ) g_R(τ) cos[φ_R(f) - φ_L(f) - 2πfτ] ) )    (1.1)

where γ(f, τ) is the modified prediction, f is frequency, τ is the interaural delay, φ_L and φ_R are the phases in the left and right ears, respectively, and g_L(τ) and g_R(τ) are synchrony coefficients for the left and right ears, respectively.

For HP+, the CAP model needs no internal delay line for detection. Therefore, with τ = 0, g_L and g_R should be approximately the same. By contrast, for HP-, τ ≠ 0, and thus g_L ≠ g_R. Given that Colburn (1973) and Stern and Colburn (1978) considered the synchrony coefficient g(τ) to be bounded, calculations of γ(f, τ) were performed with various pairs of g_L and g_R less than 5. The calculations showed that the peaks for g_L ≠ g_R are never sharper than the peaks for g_L = g_R. Assuming further that the sharper the peak, the stronger the pitch, this modified CAP model still predicts that HP+ should be stronger than HP-, in conflict with the experimental results in this chapter.
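A direct transcription of Equation 1.1 makes the peak-sharpness comparison easy to reproduce. The sketch below assumes the reconstruction of Equation 1.1 given above; the function name and the example parameter values (synchrony coefficients of 3.0 and 1.5) are illustrative, not values from the dissertation's calculations.

```python
import numpy as np
from scipy.special import i0   # modified Bessel function I_0

def cap_exponential(f, tau, phi_l, phi_r, g_l, g_r):
    """Sketch of Equation 1.1 (exponential CAP model), assumed notation.

    f: frequency (Hz); tau: internal interaural delay (s);
    phi_l, phi_r: ear phases (rad); g_l, g_r: synchrony coefficients.
    """
    arg = (g_l**2 + g_r**2
           + 2 * g_l * g_r * np.cos(phi_r - phi_l - 2 * np.pi * f * tau))
    return i0(np.sqrt(arg))

# Peak-sharpness comparison: scan tau for equal vs unequal synchrony.
taus = np.linspace(-2e-3, 2e-3, 401)
equal = cap_exponential(500.0, taus, 0.0, 0.0, 3.0, 3.0)
unequal = cap_exponential(500.0, taus, 0.0, 0.0, 3.0, 1.5)
```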
1.12 Conclusions

To test models of dichotic pitch perception, five experiments were performed on detection and discrimination of Huggins pitch and binaural coherence edge pitch (BICEP) at low boundary frequencies. The experiments showed that, at the same boundary frequency, the pitch strength of Huggins pitch was in the order HP- > HPQ > HP+, and the pitch strength of BICEP was in the order BICEP-0 > BICEP-90 > BICEP-180. These results supported the equalization-cancellation (EC) model and similar models, including the cross-correlation model (Colburn, 1977), the modified EC model (Culling et al., 1998a), and the reconstruction-comparison model (Akeroyd and Summerfield, 2000).

The results do not agree with the predictions of the central activity pattern (CAP) model, although the CAP model gives predictions on lateralization that were qualitatively verified by our listeners. These qualitative results on lateralization in fact motivated the experiments in the next chapter. Nor did the experimental results in this chapter favor Green's modification of the EC model, which predicts that HPQ and BICEP-90 are the hardest to detect. In summary, all five experiments in this chapter on the pitch strength of dichotic pitch stimuli lead to the same conclusion, in favor of the EC model and other similar models.

Chapter 2

Binaural models and the lateralization of dichotic pitches

2.1 Introduction

In the previous chapter, experiments were performed on the pitch strength of two dichotic pitch stimuli, Huggins pitch and binaural coherence edge pitch (BICEP), to compare various binaural models of pitch formation. The results support the equalization-cancellation (EC) model and disfavor the central activity pattern (CAP) model.

Besides pitch strength, the CAP model also makes predictions about the lateralization of dichotic pitch. In informal testing, our listeners lateralized Huggins pitch as the CAP model predicts, at least qualitatively. This result implied that the CAP model might describe the correct mechanism for lateralization. To explore this quantitatively, experiments were performed on the lateralization of Huggins pitch, one of the two dichotic pitches used in the previous chapter.

2.2 Huggins pitch stimulus

Huggins pitch, as introduced in Chapter 1, contains broad-band white noise in each ear channel. The signals in the two ears have identical amplitude spectra but different phase spectra. Outside a small "phase boundary region" (as defined in Chapter 1) in the frequency domain, the interaural phase difference (IPD) is fixed at a "background phase" of φ_0 (as defined in Chapter 1). Inside the phase boundary region, the IPD increases linearly from φ_0 to φ_0 + 360° with increasing frequency. Later in this chapter another version of Huggins pitch will be introduced; to distinguish the two, the Huggins pitch introduced in this paragraph is called "Huggins pitch with linear-phase boundary".

In Chapter 1, three phase configurations, namely HP-, HPQ and HP+, were used. In the experiments in this chapter, only two of them were used: HP- and HP+, corresponding to background phases of 0° and 180°, respectively (Figure 2.1).
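A compact way to see the construction in Section 2.2 is to synthesize it directly. The Python sketch below implements the hardware-independent parts of the recipe (equal-amplitude, random-phase components and a 6%-wide linear-phase boundary); the sample rate and buffer length are taken from the Method of Experiment 1 below, and the function name is invented for the example.

```python
import numpy as np

def make_huggins(fs=20000, n=32768, fb=500.0, width=0.06,
                 background_deg=0.0, seed=None):
    """Sketch of a linear-phase-boundary Huggins pitch (assumed parameters).

    The IPD equals the background phase everywhere except in a boundary
    region of fractional width `width` centered on fb, where it rises
    linearly by 360 degrees, so the IPD is continuous in frequency.
    """
    rng = np.random.default_rng(seed)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    amp = np.where((freqs > 0) & (freqs <= 10000.0), 1.0, 0.0)
    phase_l = rng.uniform(0, 2 * np.pi, freqs.size)

    lo, hi = fb * (1 - width / 2), fb * (1 + width / 2)
    ramp = np.clip((freqs - lo) / (hi - lo), 0.0, 1.0)    # 0 -> 1 across region
    ipd = np.deg2rad(background_deg) + 2 * np.pi * ramp

    left = np.fft.irfft(amp * np.exp(1j * phase_l), n)
    right = np.fft.irfft(amp * np.exp(1j * (phase_l + ipd)), n)
    return left, right

# HP- (background phase 0 degrees) with a 500-Hz boundary:
left, right = make_huggins(fb=500.0, background_deg=0.0, seed=2)
```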
2.3 Binaural models for the Huggins pitch

The CAP model explicitly predicts the lateralization of binaural stimuli, including Huggins pitch. According to the CAP model, the binaural system adds the signals at the two ears in frequency channels with various interaural delays, and the result is a 3-D map of central activity in terms of frequency and interaural delay. As Hartmann and Zhang (2003) have shown, the CAP prediction is equivalent to the preeminent model of Jeffress (1948), which calculates the binaural cross-correlation in the plane of frequency and interaural time difference (ITD).

Figure 2.2 shows the calculated results for the case of Huggins pitch. The vertical axis is frequency and the horizontal axis is ITD. The bright bands in Figure 2.2 represent high correlation, corresponding to strong neural excitation in the CAP model; the dark bands represent low correlation, corresponding to weak excitation. Assuming that the peak at the boundary frequency (marked with ovals) gives the perception of Huggins pitch, and that the corresponding ITD gives the laterality of the pitch, the CAP model predicts that HP+ should always be lateralized in the center, whereas HP- should be lateralized to one side (either left or right). Moreover, as the boundary frequency varies, the lateral position of HP- should follow the dark curves on either side, given by the hyperbolic function in Equation 2.1:

    τ = 1/(2 f_b)    (2.1)

where τ is the corresponding ITD and f_b is the boundary frequency.

[Figure 2.1: Interaural phase of Huggins pitch stimuli with linear-phase boundary. Panel (a): HP-, with background phase 0° and the IPD rising through 360° across the boundary at f_b. Panel (b): HP+, with background phase 180°.]

[Figure 2.2: Central activity pattern for Huggins pitch stimuli with linear boundary, plotted as frequency (Hz) versus internal interaural delay τ (ms).]

Other models, such as the equalization-cancellation (EC) model, do not give explicit predictions on lateralization. Therefore the main purpose of this chapter is to test whether listeners lateralize Huggins pitch according to the hyperbolic curve predicted by the CAP model; the other models will be discussed at the end of this chapter. The lateralization of Huggins pitch has been examined before by other groups (Raatgever and Bilsen, 1986; Grange and Trahiotis, 1996; Akeroyd and Summerfield, 2000). The experiments introduced in this chapter used numerical estimates instead of an acoustical pointer, and were more detailed and covered a wider frequency range, so that the results could be compared with the quantitative predictions of the CAP model.
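The frequency-by-delay map of Figure 2.2 can be approximated with a running cross-correlation. The sketch below is a hedged approximation, not the model implementation of Hartmann and Zhang (2003): it uses brick-wall FFT filters in place of auditory filters, and the function name, bandwidth and normalization are choices made for the example. With a Huggins stimulus as input (e.g. from the make_huggins sketch above), the dark bands for HP- should appear near τ = 1/(2 f_b), as in Equation 2.1.

```python
import numpy as np

def cap_map(left, right, fs, freqs, taus, bw=0.05):
    """Sketch of a frequency x internal-delay activity map (assumptions noted).

    For each center frequency, both ear signals are narrow-band filtered
    (simple FFT brick-wall, fractional bandwidth bw), then the normalized
    cross-correlation is evaluated at each internal delay tau (s)."""
    n = left.size
    f_axis = np.fft.rfftfreq(n, 1.0 / fs)
    L, R = np.fft.rfft(left), np.fft.rfft(right)
    out = np.zeros((freqs.size, taus.size))
    for i, fc in enumerate(freqs):
        band = (f_axis > fc * (1 - bw)) & (f_axis < fc * (1 + bw))
        l = np.fft.irfft(np.where(band, L, 0), n)
        r = np.fft.irfft(np.where(band, R, 0), n)
        norm = np.sqrt(np.sum(l * l) * np.sum(r * r))
        for j, tau in enumerate(taus):
            shift = int(round(tau * fs))
            if shift >= 0:
                out[i, j] = np.sum(l[shift:] * r[:n - shift]) / norm
            else:
                out[i, j] = np.sum(l[:n + shift] * r[-shift:]) / norm
    return out

# Tiny self-contained demo with antiphasic (Npi) noise:
rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)
m = cap_map(noise, -noise, 8000.0,
            np.array([400.0, 500.0]), np.linspace(-2e-3, 2e-3, 81))
```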
2.4 Experiment 1: Lateralization of Huggins pitch with linear-phase boundary

2.4.1 Method

Stimulus

The stimuli in this experiment were generated in the same way as in Chapter 1. Two-channel Huggins pitch stimuli with linear-phase boundary were presented to the listeners through headphones. The signal in the left ear was broad-band white noise with 16384 components of equal amplitude and random phase. The signal in the right ear had an amplitude spectrum identical to that in the left ear, but its phase spectrum was calculated from the left-ear phase spectrum as follows: outside the phase boundary region, the IPD was fixed at the background phase φ_0; inside the phase boundary region, the IPD incremented from φ_0 to φ_0 + 360° as frequency increased. Because φ_0 is equivalent to φ_0 + 360°, the IPD varied continuously over the entire frequency range. The phase boundary width was 6% of the boundary frequency, i.e. of the center frequency of the phase boundary region. In addition, a frequency-independent interaural difference was sometimes added, as discussed in the procedure section below.

An array processor on the computer (Tucker Davis AP2) calculated the spectra and converted them into audio signals through 16-bit DACs (Tucker Davis DD1). The sample rate per channel was 20 ksps (kilosamples per second). With a continuously cycled memory buffer of 32768 words, the period was 1.6384 seconds, and the frequency spacing between adjacent components was 0.61 Hz. Only one period was presented to the listener, giving a duration of 1.6384 seconds. The maximum frequency of the broad-band noise as converted was 10 kHz, and the output signal was low-pass filtered at 5 kHz with Stanford SR640 filters at a rate of -115 dB/octave. The accuracy of the phase spectra in the two ear channels was tested by adding or subtracting the stimuli and displaying the result on a spectrum analyzer; the cancellation was good to at least 40 dB.

Listeners

Five listeners took part in this experiment: C (female, age 61), L (female, age 29), W (male, age 62), X (male, age 27), and Z (male, age 28). All had normal hearing except W, who had a mild bilateral hearing loss above 8 kHz, typical of males of his age. Listeners W, X and Z had considerable experience in dichotic listening; listeners C and L had little or no previous experience. All listeners were right-handed.

Procedure

On each trial, the listener heard a single interval of the Huggins pitch stimulus with linear-phase boundary, and could listen as many times as desired. The listener's task was to assign a number (from -40 to +40, corresponding to extreme left and extreme right, respectively) to the lateral position of the Huggins pitch (not of the background noise). It was an absolute estimation task.

Because there is an intrinsic association between the boundary frequency and the lateralization, a listener might, with increasing experience, learn to judge lateralization from frequency cues. To prevent the listener from using such cues, five delays were added randomly to the right channel from trial to trial. To distinguish them from the ITD, these delays are called frequency-independent interaural time differences (FIITDs), and the term ITD is used for the total interaural time difference, i.e. the FIITD plus the IPD-equivalent delay introduced by the Huggins pitch stimulus. The five FIITDs were -1000, -500, 0, +500 and +1000 µs. An FIITD was applied by adding a linearly increasing phase shift to all frequency components; a negative delay was applied with a linearly decreasing phase shift.

Five boundary frequencies, 200, 315, 500, 700 and 1000 Hz, were used in this experiment; they covered the frequency range where the Huggins pitch is reliable. Each experimental run contained 25 trials, one for each combination of the five boundary frequencies and the five FIITDs. On each trial, the phases of the components were randomized, and the boundary frequency was randomly varied by an amount within ±5% with rectangular distribution. The order of the trials in each run was scrambled. Every run used a fixed type of stimulus, HP+ or HP-, and each listener did runs with HP+ and HP- alternately, ten runs for each type of stimulus in total. There was no feedback in this experiment. The level of the stimuli was 65 dB, making the spectrum level 28 dB re 10^-12 W/(m²·Hz). The listener heard the stimuli through Sennheiser HD 520-II headphones while seated in a double-walled sound-treated room.
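The FIITD manipulation in the Procedure is a pure delay applied in the frequency domain, exactly as described: a linearly increasing (or, for negative delays, decreasing) phase shift across all components. The sketch below is a minimal Python version; the function name is invented.

```python
import numpy as np

def apply_fiitd(channel, fs, fiitd_us):
    """Sketch: delay one channel by a frequency-independent interaural time
    difference, implemented as a linear phase shift across all components.
    A negative fiitd_us produces a linearly decreasing phase shift, i.e. an
    advance of this channel (equivalently, a delay of the other ear)."""
    n = channel.size
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    spectrum = np.fft.rfft(channel)
    phase = -2 * np.pi * freqs * (fiitd_us * 1e-6)   # delay = negative slope
    return np.fft.irfft(spectrum * np.exp(1j * phase), n)

# Example: delay the right channel by +500 us at a 20-ksps sample rate.
# right_delayed = apply_fiitd(right, 20000, +500.0)
```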
2.4.2 Results

A good way to present the results is a scatter plot, as shown in Figures 2.3 through 2.12, which show every data point for each configuration. On each figure there are five panels corresponding to the five FIITDs. The independent variable, the boundary frequency, is presented on the vertical axis; the dependent variable, the lateralization judgement from -40 to +40, is presented on the horizontal axis. This unusual arrangement was employed because the lateralization judgement represents position to the left and right, so it is more intuitive to present it on the horizontal axis, and easier to compare with the CAP prediction in Figure 2.2. The solid curves or lines are predictions of the CAP model. To plot the predictions on a scatter plot with the data, three assumptions were made:

1. The listener's judgement was linear with ITD.

2. A judgement of +40 corresponds to +2500 µs. An ITD of +2500 µs was the largest ITD in the experiments, occurring for HP- at a boundary frequency of 200 Hz. This assumption was based on the postulate that the listener gained experience of giving large numbers to the largest ITD over the experimental runs.

3. According to the principle of centrality (Jeffress, 1972; Hafter and DeMaio, 1975; Stern et al., 1988), listeners prefer the image closer to the midline over aliases with greater laterality.

The triangles on the figures are alias points, separated from the solid curves by multiples of 360°. They are the predictions ignoring the centrality assumption (the third assumption above).

Lateralization for HP+

Figures 2.3 through 2.7 show the results for HP+ stimuli with linear-phase boundary. The CAP predictions (solid curves) were straight lines for 0 and +500 µs, and for boundary frequencies below 500 Hz for +1000 µs, because for HP+ the IPD at the boundary frequency is 0°, hence the ITD was just the FIITD, which was constant within each panel of the figures.

[Figures 2.3 through 2.7: Lateralization of HP+ (linear-phase) by listeners C, L, W, X and Z, respectively. Each figure plots boundary frequency (200 to 1000 Hz) against the lateralization judgement (fixed scale, -40 to +40) in five panels, one per FIITD.]
For boundary frequencies above 500 Hz with the +1000-µs FIITD, the points corresponding to the FIITD, i.e. the points on the solid vertical line, were marked only as alias points (solid triangles), because the predicted positions on the solid curves were closer to the center of the head and therefore preferred according to the principle of centrality.

In general, the results agreed with expectation. For zero FIITD, listeners' judgements were near the center. For finite FIITDs (+500 and +1000 µs), listeners' judgements were offset to the side as predicted, and arranged in vertical lines indicating little dependence on boundary frequency, especially for boundary frequencies below 500 Hz. However, for the finite FIITDs (+500 and +1000 µs), there were some discrepancies between the listeners' judgements and the predictions of the CAP model.

1. For boundary frequencies above 500 Hz, each listener tended to favor the alias points on one side according to personal preference. This phenomenon will be discussed later under the topic of "earedness".

2. The data points that formed vertical lines, especially at boundary frequencies below 500 Hz, tended to lie outside the predicted lines. This can be explained by the following conjecture. In each run with HP+, the maximum ITD was only +1000 µs, smaller than the maximum ITD of +2500 µs in the HP- runs, which was used to predict the judgement of +40. Listeners tended to forget the scale established in the HP- runs and to give large numbers to the maximum ITDs within each run.

Lateralization for HP-

The results for HP- stimuli with linear-phase boundary are shown in Figures 2.8 through 2.12. The dashed curves on the figures correspond to an IPD of +180°. If the principle of centrality holds, the data points should all fall between these dashed curves.

[Figures 2.8 through 2.12: Lateralization of HP- (linear-phase) by listeners C, L, W, X and Z, respectively, in the same format as Figures 2.3 through 2.7.]
The solid curves are predictions of the CAP model. For HP- with zero FIITD, the IPD is +180° at the boundary frequency, i.e. the center frequency of the phase boundary region. Therefore, for any FIITD, the predicted lateralization judgement (the solid curves on the figures) is given by Equation 2.2:

    Judgement = [40 / (2.5 × 10⁻³)] · (FIITD ± 1/(2 f_b))    (2.2)

where f_b is the boundary frequency; 2.5 × 10⁻³ s is the largest ITD in the experiments (2500 µs); and the sign is chosen to minimize the absolute value of the judgement, as required by the principle of centrality.

For finite FIITDs (+500 and +1000 µs), the data of listeners W, X and Z followed the prediction of the CAP model fairly well. For listeners C and L, the data fell outside most of the time; with an expanded scale factor, i.e. an ITD less than +2500 µs corresponding to the lateralization judgement of +40, the results of listeners C and L can roughly fit the prediction as well.

For zero FIITD, however, the listeners' judgements (except for listener W at boundary frequencies of 700 and 1000 Hz) did not show a strong dependence on boundary frequency, and formed approximately vertical lines. This result was quite different from the curves predicted by the CAP model. The discrepancy is significant because the original motivation for all the experiments in this chapter was to test this condition. To examine this special condition in more detail, the data points with zero FIITD were plotted on a logarithmic scale in Figure 2.13. As mentioned before, each listener had his or her own personal preference for hearing HP- with zero FIITD on the left or right side. To compare among listeners and avoid personal preferences, the vertical axis is the mean of the absolute value of the lateralization judgements; the horizontal axis is boundary frequency. According to the prediction of the CAP model (Equation 2.2), for zero FIITD the lateralization judgements should be inversely proportional to boundary frequency. Hence on the log-log plots (Figure 2.13) the slope should be -1, as indicated by the dashed lines. The solid lines are the best linear fits to the data. Surprisingly, except for listener W, all the listeners' best-fit lines are much less steep than the predicted dashed lines.

[Figure 2.13: Lateralization of HP- (linear-phase) with zero FIITD, plotted as the mean absolute lateralization judgement versus boundary frequency (200 to 1000 Hz) on log-log axes. The solid lines are the best-fit straight lines; the dashed lines show the slope of -1 expected from the CAP model. The error bars are ±1 standard deviation.]

Visual inspection of the graphs for finite FIITDs suggests that the HP- data roughly showed the frequency dependence predicted by the CAP model; for zero FIITD, the data showed much less dependence on boundary frequency than the CAP model predicts. Statistical evidence will be given in Section 2.7.2.
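Equation 2.2 and the slope test of Figure 2.13 can be reproduced numerically. In the sketch below the function name is invented; the function resolves the ± sign by centrality (choosing the alias that minimizes the absolute judgement) and recovers the predicted log-log slope of -1 for zero FIITD.

```python
import numpy as np

def predicted_judgement(fb, fiitd_us, full_scale_us=2500.0):
    """Sketch of Equation 2.2: CAP-predicted judgement for HP- (assumed form).

    fb: boundary frequency in Hz; fiitd_us: FIITD in microseconds.
    The +/- sign is resolved by the principle of centrality.
    """
    half_period_us = 1e6 / (2 * fb)
    candidates = np.array([fiitd_us + half_period_us,
                           fiitd_us - half_period_us])
    itd = candidates[np.argmin(np.abs(candidates))]
    return 40.0 * itd / full_scale_us

# Log-log slope of |judgement| versus boundary frequency for zero FIITD:
fb = np.array([200, 315, 500, 700, 1000], float)
pred = [abs(predicted_judgement(f, 0.0)) for f in fb]
slope = np.polyfit(np.log10(fb), np.log10(pred), 1)[0]
print(slope)   # approximately -1.0, the dashed-line slope in Figure 2.13
```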
Earedness

For HP- with zero FIITD in Figures 2.8 through 2.12, the solid curves predicted by the CAP model lie on both sides. Listeners did not average the two sides and respond at the center; on the contrary, they responded on one side or the other. Interestingly, listeners did not respond equally often on the left and right sides, but chose a preferred side fairly consistently. For instance, listener X always responded on the right (50 out of 50). Listeners L and Z responded mostly on the right (92% of trials for listener L and 88% for listener Z). Listener C responded on both sides but preferred the right (70% of trials). As discussed above, listener W's results for boundary frequencies of 700 and 1000 Hz were anomalous; considering only the lowest three boundary frequencies (200, 315 and 500 Hz), listener W always responded on the left (30 out of 30). In general, listeners C, L, X and Z tended to be right-eared listeners, and listener W tended to be a left-eared listener.

Besides HP- with zero FIITD, there are other conditions for which the CAP model's prediction is ambiguous. For HP+, these ambiguous points include the 500-Hz boundary with the +1000-µs FIITD and the 1000-Hz boundary with the +500-µs FIITD. For HP-, they include the 1000-Hz boundary with the +1000-µs FIITD, and zero FIITD as discussed in the last paragraph. With ten repeated measures for each condition, there are in total 110 ambiguous points on each scatter plot. Table 2.1 lists the number of ambiguous points that each listener lateralized to the left or the right. The numbers for left and right should usually add up to 110, the total number of ambiguous points; sometimes they add up to less, because the listener responded zero (the center) for some of the ambiguous points, and those responses were counted as neither left nor right.

    listener    C     L     W     X     Z
    left       22     6    61     0    23
    right      88   103    49   109    80

    Table 2.1: Judgement of ambiguous points for Huggins pitch (linear-phase)

In Table 2.1, listeners C, L, X and Z showed a preference toward the right side; among them, listeners L and X showed a strong preference. Furthermore, listeners C, L and X were so right-eared that, for HP+ with an FIITD of -500 µs, they chose alias points (solid triangles) on the right side instead of following the principle of centrality and responding at the center. Listener W, on the other hand, had some preference toward the left side and, as discussed in the last paragraph, a strong preference toward the left at low boundary frequencies. Because of these individual preferences for hearing Huggins pitches on one side, a concept of "earedness" can be introduced, analogous to "handedness". However, no correlation has been found between earedness and handedness.

Discussion

The experiments on lateralization of Huggins pitch stimuli with linear-phase boundary show that, except for one special condition, HP- with zero FIITD, the CAP model roughly predicts the trend of the listeners' judgements as boundary frequency and FIITD vary. For example, for HP- with finite FIITDs, as frequency increased, the averaged laterality varied along the predicted curves, except for some alias points on the other side.
For HP+, listeners' responses mostly formed straight lines as predicted; for HP+ above 500 Hz with an FIITD of +1000 µs, listeners' responses also followed the predicted curve due to centrality, except that, when the prediction fell on an unfavorable side, listeners tended to choose the alias point on the preferred side. By contrast, for HP- with zero FIITD, listeners' judgements had little dependence on boundary frequency, and the slope on the log-log plot was much less steep than -1, the prediction of the CAP model.

Besides the linear-phase boundary, other methods of generating the Huggins pitch have been used before. When Cramer and Huggins first introduced Huggins pitch (1958), they used an all-pass filter to generate the phase-boundary region; in their stimuli, the phases within the phase boundary region increased from 0° to 360° monotonically, but not linearly. Quite differently, Akeroyd and Summerfield (2000) generated an interaurally decorrelated band as the phase boundary. Yost (1991), in contrast, generated the Huggins pitch by applying a fixed interaural phase shift within the phase boundary region. Similar to Yost's method, we also generated Huggins pitch using a phase boundary with a fixed interaural phase difference of 180°, called the "stepped-phase boundary" in the following text.

The auxiliary experiments in Section 2.9.1 study the lateralization of the narrow-band stimulus of the phase boundary region by itself (without the background noise) with different delays. Those experiments showed that the results for the stepped-phase boundary agreed with the prediction of the CAP model, at least qualitatively, whereas the results for the linear-phase boundary were mostly opposite to the prediction. Furthermore, the results for the linear-phase boundary were less consistent (larger standard deviation) than for the stepped-phase boundary (zero standard deviation). It is not hard to see why the linear-phase boundary could be less consistent to lateralize: within the linear-phase boundary, the interaural phases vary over 360°, a range covering all possible phase values. The frequency components within the linear-phase boundary therefore carry various ITD cues, without giving the listener a clear, single cue for lateralization. Thus the fact that listeners responded differently from the CAP prediction when presented with the linear-phase boundary can be explained as follows: when hearing the diffuse image of the linear-phase boundary, the listeners' earedness dominated the performance. It is possible that the multiple cues for lateralizing the linear-phase boundary cause the failure of the CAP model in Experiment 1. This motivated Experiment 2 with the stepped-phase boundary.

2.5 Experiment 2: Lateralization of Huggins pitch with stepped-phase boundary

Experiment 2 employed a different type of Huggins pitch stimulus, namely Huggins pitch with stepped-phase boundary. In contrast to the linear-phase boundary in Experiment 1, the stepped-phase boundary can easily be lateralized by itself, i.e. without the noise background, as determined by its IPD. The goal of this experiment was to test whether listeners' lateralization judgements with this new stimulus would follow the predictions of the CAP model.

2.5.1 Method

Experiment 2 was identical to Experiment 1 except that the linear-phase boundary in the Huggins pitch stimulus was replaced with a stepped-phase boundary.
Within the stepped-phase boundary region, the IPD was fixed at 180° with respect to the background phase.

[Figure 2.25: Lateralization of sin-0 (analogous to HP+) by listener W]
[Figure 2.26: Lateralization of sin-0 (analogous to HP+) by listener X]
[Figure 2.27: Lateralization of sin-0 (analogous to HP+) by listener Z]
[Figure 2.28: Lateralization of sin-π (analogous to HP-) by listener C]
[Figure 2.29: Lateralization of sin-π (analogous to HP-) by listener W]
[Figure 2.30: Lateralization of sin-π (analogous to HP-) by listener X]
[Figure 2.31: Lateralization of sin-π (analogous to HP-) by listener Z]
2. For the stimulus sin-0, listeners C, W and Z showed slightly less variation in the data points on the scatter plots, and there were fewer cases in which listeners chose alias points (solid triangles) on the preferred side instead of the predicted position due to centrality (solid curves), compared with Experiments 1 and 2.

Lateralization for sin-π with 0-FIITD

The open symbols in Figure 2.32 are results for the special case of sin-π (analogous to HP-) with zero FIITD, on a logarithmic scale. The solid lines are the best linear fits to those data. The solid symbols without error bars and the dotted best-fit lines are the results for the linear-phase boundary from Figure 2.13, shown for comparison. As discussed in the section on the stepped-phase boundary, the results for the stepped-phase and linear-phase boundaries are very similar; therefore, to keep Figure 2.32 readable, only the results for the linear-phase boundary are included. Comparing the open symbols and solid lines with the solid symbols and dotted lines shows that, although the results for listeners W and X were offset downwards, the slope for each listener was almost identical to the slope for the linear-phase boundary (and also for the stepped-phase boundary). It was a little surprising to find that, with all his responses near the center, the pattern and slope for listener X still resemble his results with the Huggins pitch stimuli.

Parallel to the calculations for Experiment 2, correlation coefficients between the Huggins pitch stimuli with linear-phase boundaries and the sine-tones were calculated for the data points in Figure 2.32 as in Equation 2.4; Table 2.4 shows the results. The correlation coefficient is

    cc = Σ_{i=1}^{5} (L_i^hp − ⟨L^hp⟩)(L_i^sn − ⟨L^sn⟩) / √( [Σ_{i=1}^{5} (L_i^hp − ⟨L^hp⟩)²] [Σ_{i=1}^{5} (L_i^sn − ⟨L^sn⟩)²] )    (2.4)

where i is the index of the five data points on the log-log plot for each phase boundary, L_i^hp is the averaged laterality for Huggins pitch stimuli with either linear-phase or stepped-phase boundary, L_i^sn is the averaged laterality for sine-tones, and ⟨·⟩ denotes the mean of the corresponding L_i.

[Figure 2.32: Lateralization of sin-π (analogous to HP-) and of HP- with zero FIITD. Open symbols: sin-π. Solid symbols: HP- with linear-phase boundary. The solid lines are the best-fit straight lines; the dashed lines show the slope of -1 expected from the CAP model. The error bars are ±1 standard deviation.]

    listener              C      W      X      Z
    sine vs. linear      0.95   0.92   0.98   0.91
    sine vs. stepped     0.69   0.76   0.86   0.78
    linear vs. stepped   0.67   0.88   0.90   0.81

    Table 2.4: Correlation coefficients comparing sin-π with HP- with 0-FIITD. The last row is copied from Table 2.2.

In Table 2.4, the correlation coefficients between the sine-tones and the Huggins pitch stimuli with linear-phase boundary are very high (all above 0.9, averaging 0.94 over the four listeners), and the coefficients between the sine-tones and the Huggins pitch stimuli with stepped-phase boundary are fairly high (all above 0.7, averaging 0.77). These high correlation coefficients confirm the similarity seen in Figure 2.32 between the lateral responses for sin-π and for HP- with 0-FIITD. This result suggests that there might be some common mechanism for lateralizing a sine-tone and the boundary frequency of a Huggins pitch stimulus.
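Equation 2.4 is the ordinary Pearson correlation over the five frequencies, and a direct transcription follows. The function name is invented, and the numbers in the example are hypothetical lateralities, not the measured data.

```python
import numpy as np

def corr_coeff(l_hp, l_sn):
    """Sketch of Equation 2.4: Pearson correlation between averaged
    lateralities for Huggins pitch (l_hp) and sine-tones (l_sn) across
    the five boundary frequencies."""
    l_hp, l_sn = np.asarray(l_hp, float), np.asarray(l_sn, float)
    num = np.sum((l_hp - l_hp.mean()) * (l_sn - l_sn.mean()))
    den = np.sqrt(np.sum((l_hp - l_hp.mean())**2)
                  * np.sum((l_sn - l_sn.mean())**2))
    return num / den

# Hypothetical lateralities at 200, 315, 500, 700 and 1000 Hz:
print(corr_coeff([30, 25, 18, 14, 10], [22, 20, 15, 12, 9]))
```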
Paradox on correlation coefficients

There are two problems posed by Table 2.4. First, although the correlation coefficients are in general large, those comparing sine-tones with Huggins pitch with stepped-phase boundary are always smaller than those comparing sine-tones with Huggins pitch with linear-phase boundary. Because the stepped-phase boundary can be lateralized by itself whereas the linear-phase boundary cannot, one might predict the reverse. Second, one would expect more similarity between the two types of Huggins pitch than between Huggins pitch and sine-tones; Table 2.4 shows that this is not always true. All listeners had larger correlation coefficients, demonstrating more similarity, for linear-phase boundary vs. sine-tones than for linear-phase boundary vs. stepped-phase boundary.

The explanation for these paradoxes seems to differ among listeners. Listener Z did his runs with the stepped-phase boundary four years after the other two stimuli (linear-phase boundary and sine-tones), so his data for the stepped-phase boundary were less similar to his data for the other two stimuli, which lowered the correlation coefficients in the comparisons involving the stepped-phase boundary (the middle and bottom rows of Table 2.4) relative to the coefficient between the linear-phase boundary and the sine-tones (the top row). For listener W, his response at 700 Hz with 0-FIITD for HP- with stepped-phase boundary appears to be an outlier, making his stepped-phase data different from his data for the other two stimuli and thus lowering the corresponding correlation coefficients. For listeners C and X, the results showed very little frequency dependence (slopes very close to zero in Figures 2.13, 2.23 and 2.32). Because the calculation depends on the assumption of linear dependence on frequency (Equation 2.5), the weak frequency dependence (a and α close to zero) made the resulting correlation coefficients (Equation 2.6) very sensitive to the error terms (ε_i^ln − ⟨ε^ln⟩ and ε_i^st − ⟨ε^st⟩). A similar calculation considering all data points (not just those with 0-FIITD) is done in Section 2.7.1. Because the frequency dependence is better after including the data points with finite FIITDs, we expected these paradoxes not to occur there, and this was in fact confirmed by the correlation coefficients in Table 2.6.

    L_i^ln = a x_i + b + ε_i^ln,    L_i^st = α x_i + β + ε_i^st    (2.5)

    cc(ln, st) = Σ_i (L_i^ln − ⟨L^ln⟩)(L_i^st − ⟨L^st⟩) / √( [Σ_i (L_i^ln − ⟨L^ln⟩)²] [Σ_i (L_i^st − ⟨L^st⟩)²] )
               = Σ_i [a(x_i − ⟨x⟩) + (ε_i^ln − ⟨ε^ln⟩)][α(x_i − ⟨x⟩) + (ε_i^st − ⟨ε^st⟩)] / √( Σ_i [a(x_i − ⟨x⟩) + (ε_i^ln − ⟨ε^ln⟩)]² · Σ_i [α(x_i − ⟨x⟩) + (ε_i^st − ⟨ε^st⟩)]² )    (2.6)

where cc is the correlation coefficient, i is the index of the five data points on the log-log plot for each phase boundary, L_i^ln and L_i^st are the averaged lateralities for Huggins pitch stimuli with linear-phase and stepped-phase boundaries, respectively, and ⟨·⟩ denotes a mean. The equation shows the correlation coefficient comparing the linear-phase and stepped-phase boundaries; those comparing sine-tones with the Huggins pitch stimuli (either boundary type) are similar.
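The sensitivity argument for listeners C and X can be illustrated with two synthetic lines of the form of Equation 2.5. When the common slope is near zero, the correlation between the two noisy lines is dominated by the error terms; all numbers below are invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.log10([200, 315, 500, 700, 1000])   # log boundary frequency

def cc(u, v):
    # Pearson correlation, as in Equation 2.6
    return np.corrcoef(u, v)[0, 1]

# With slope 1.0 the common linear trend dominates and cc is high;
# with slope 0.05 the correlation is dominated by the noise draws.
for slope in (1.0, 0.05):
    u = slope * x + 0.1 * rng.standard_normal(5)
    v = slope * x + 0.1 * rng.standard_normal(5)
    print(slope, cc(u, v))
```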
ILD cues on sine-tones and Huggins pitch stimuli

One difference between lateralizing a sine-tone and a Huggins pitch, however, is that when an interaural level difference (ILD) cue is applied, the laterality of the Huggins pitch hardly changes (Raatgever, 1980; Raatgever and Bilsen, 1986).² As is well known, the laterality of sine-tones varies as the ILD changes. Based on the experimental finding that varying ILD cues influences the laterality of sine-tones but not of Huggins pitch stimuli, the following text suggests a possible explanation for the different results for sine-tones and Huggins pitch stimuli in the lateralization experiments. If one considers a Huggins pitch as a pure time image, and a sine-tone as a combination of a time image and a level image, then for a sine-tone the auditory system combines the time image with a level image which, with the setup in these experiments (zero ILD), points to the center, and gives a single synthetic judgement of laterality (Whitworth and Jeffress, 1961). Therefore, owing to the level image, the judgements for sine-tones were biased toward the center compared with the judgements for the corresponding Huggins pitch stimuli. It is easy to imagine that different listeners weight the time and level images differently. For example, it appears that listener X had a large bias toward the level image, while listener C paid almost no attention to the level image and gave results almost identical to those for the Huggins pitch stimuli (Figure 2.32). But however much the level image influences the judgement, it is always in the center for all frequencies and FIITD values. Thus the variation of the laterality of the sine-tones depended on the time image only, and the laterality of a sine-tone preserves characteristic features (the pattern and the slope of the average data) of the laterality of the corresponding Huggins pitch.

² Unlike Raatgever and Bilsen's findings, Grange and Trahiotis (1996) reported that the intracranial position of Huggins pitch can be substantially varied by ILD cues. Our informal testing, however, tended to agree with Raatgever and Bilsen, and showed that listeners did not sense a change in lateral position for Huggins pitch with various ILD cues.

Earedness

As on the scatter plots for Huggins pitch stimuli, each scatter plot for sine-tones in Experiment 3 contains 110 ambiguous points for which the interaural phase was π. A listener could therefore lateralize the ambiguous points either to the left or to the right, according to personal preference, i.e. earedness. Table 2.5 shows the number of judgements on each side for the four listeners.

    listener    C     W     X     Z
    left        9    83    36    62
    right     101    20    61    38

    Table 2.5: Judgement of ambiguous points for sine-tones

Compared with the results for Huggins pitch stimuli (Tables 2.1 and 2.3), it appears that listener C maintained her bias toward the right, and listener W maintained his bias toward the left. Listener X, on the other hand, demonstrated less bias as a right-eared listener; as can be seen on the scatter plots, his lateral judgements for sine-tones were compressed toward the center compared with his judgements for Huggins pitch stimuli, which would lead to less bias toward the right. The previous section suggested a conjecture attributing listener X's more-centered judgements to the influence of a level image of the sine-tones pointing to the center. Listener Z was unusual in that he switched, surprisingly, to a left-eared listener to some degree; however, his bias to the left was weak compared with a left-eared person such as listener W. In general, except for listener Z, the results on earedness for the other three listeners agreed with their results in Experiments 1 and 2.
2.7 Discussion

2.7.1 Summary of the experiments

For each individual listener, the scatter plots from Experiments 1, 2 and 3 in this chapter showed impressive similarity, although there were differences among listeners. To summarize the results, Figures 2.33 and 2.34 show the average of each set of ten points on the scatter plots for each condition. In total, for each listener, there were 25 conditions (5 boundary frequencies × 5 FIITDs). The open circles, filled circles and "+" signs are results for Huggins pitch stimuli with linear-phase boundary, Huggins pitch stimuli with stepped-phase boundary, and sine-tones, respectively.

In Figures 2.33 and 2.34 there are ambiguous points, whose IPD was π and for which the CAP model predicts laterality on both sides. In the data analysis in this chapter, absolute values were usually taken before calculating means, in order to eliminate cancellation between responses on the left side and those on the right side, which would render a mean near the center meaningless. However, in order to show listeners' individual preferences of earedness, the data points in Figures 2.33 and 2.34 are all means taken without absolute values.

[Figure 2.33: Lateralization of HP+ and sin-0 for four listeners, for FIITDs of -1000, -500, 0, +500 and +1000 µs and boundary frequencies of 200 to 1000 Hz. Linear-phase boundary: open circles; stepped-phase boundary: filled circles; sin-0: "+".]

[Figure 2.34: Lateralization of HP- and sin-π for four listeners, in the same format.]

A glance at the plots makes clear that the results for Huggins pitch stimuli with linear-phase and stepped-phase boundaries were almost identical (the open and filled circles overlap), and that both were very similar to the results for sine-tones as well. Visual inspection is probably the easiest and most direct way to compare two plots and decide whether they have similar patterns; to evaluate the similarity quantitatively, however, one must replace mere observation with statistics. One way is to calculate, for each listener, the correlation coefficients between two scatter plots, i.e. between two different symbols in Figures 2.33 and 2.34. On each scatter plot there are 25 conditions (5 boundary frequencies × 5 FIITDs), corresponding to the 25 data points for each symbol for each listener in Figures 2.33 and 2.34. For each of the 25 conditions there were 10 repeated measurements, as described in the method sections of Experiments 1 through 3, and the correlation coefficients were calculated from the averages of those 10 repeated measurements. Normally the plain mean was taken, except that for the ambiguous points the absolute value was taken before the mean, to eliminate cancellation between responses on the left and right.

The correlation coefficients between the linear-phase and stepped-phase boundaries were calculated as in Equation 2.7; the correlation coefficients between the Huggins pitch stimuli (either boundary type) and the sine-tones were calculated similarly. The results are shown in Table 2.6.

    cc = Σ_{i=1}^{25} (L_i^ln − ⟨L^ln⟩)(L_i^st − ⟨L^st⟩) / √( [Σ_{i=1}^{25} (L_i^ln − ⟨L^ln⟩)²] [Σ_{i=1}^{25} (L_i^st − ⟨L^st⟩)²] )    (2.7)
[Figure 2.40: Lateralization of stepped-phase Huggins pitch vs. IPD.]

[Figure 2.41: Lateralization of sine-tones vs. IPD.]

A comparison for the three types of stimuli (linear-phase boundary, stepped-phase boundary and sine-tones) shows that the RMSEs in Table 2.9 are significantly larger than those in Table 2.8 at the 0.05 level. This result can be confirmed by visually comparing Figures 2.39 through 2.41 with Figures 2.36 through 2.38. These comparisons show that the fit with ITD as the variable is better than the fit with IPD, indicating that listeners appear to use ITD, not IPD, as the temporal cue. A more thorough investigation of this issue constitutes the content of Chapter 3.
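The RMSE comparison between ITD-based and IPD-based fits can be sketched as follows. The "listener" here is a toy that tracks ITD exactly, and every number is hypothetical; the point is only that the same set of judgements yields a smaller RMSE when regressed on ITD than on IPD.

```python
import numpy as np

def rmse_fit(x, y):
    """Least-squares line fit of y on x; returns the RMSE of the residuals."""
    a, b = np.polyfit(x, y, 1)
    return np.sqrt(np.mean((y - (a * x + b)) ** 2))

# Hypothetical conditions (ITD in us, frequency in Hz), not measured data:
itd = np.array([-1000, -500, 0, 500, 1000, -750, 250], float)
freq = np.array([200, 315, 500, 700, 1000, 315, 700], float)
ipd = (360e-6 * itd * freq) % 360.0           # equivalent IPD in degrees
ipd = np.where(ipd > 180, ipd - 360, ipd)     # wrap into -180..+180

judgement = itd / 62.5                        # toy listener tracking ITD
print(rmse_fit(itd, judgement), rmse_fit(ipd, judgement))
```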
The RC model

Since the experiments on dichotic pitch strength in Chapter 1 favored the EC model over the CAP model, it is reasonable to expect the EC model to give correct predictions for the lateralization of dichotic pitch stimuli. However, unlike the CAP model, the EC model makes no explicit prediction about lateralization. Fortunately, Akeroyd and Summerfield (2000) suggested a reconstruction-comparison (RC) model, which models the perception of dichotic pitch in five steps. It determines the pitch height by an equalization-cancellation process and then segregates the pitch from the reconstructed background noise. After segregation, residual peaks remain on the plane of frequency and interaural delay. If one further assumes that the interaural delay of the residual peaks determines the laterality of the boundary frequency, the RC model predicts that the boundary frequency of a Huggins pitch is lateralized in the same way as a sine-tone of the same frequency presented by itself, without the broadband noise background. This prediction is modest compared with the prediction of the CAP model, because it does not say exactly how the boundary frequency is lateralized, as the CAP model does; rather, it leaves that question to the problem of lateralizing sine-tones. With this weakness, the modest prediction is supported by the fact that the scatter plots for Huggins pitch stimuli (with either linear-phase or stepped-phase boundary) are similar to the scatter plots for sine-tones. This similarity was exhibited by the large values of overlapped area in the top block of Table 2.7, comparing the Huggins pitch stimuli (both boundary types) with the sine-tones.

2.7.4 Earedness

The phase spectra of HP- with stepped-phase boundary and of sin-π are binaurally symmetric, and the spectra of HP- with linear-phase boundary are almost symmetric. Intuitively, there would seem to be no preference for one side over the other. However, the experimental results in this chapter show that listeners consistently lateralized HP- and sin-π with 0-FIITD to one preferred side. Table 2.10 shows the percentage of responses on each side for these points. Some of the scores for left and right do not add up to 100% because some responses were in the middle and thus were counted neither as left nor as right.

    listener          C          L          W          X          Z
    side             L    R     L    R     L    R     L    R     L    R
    linear-phase    24%  76%    8%  92%   74%  26%    0% 100%   12%  84%
    stepped-phase   30%  70%    -    -    76%  24%    0% 100%   18%  70%
    sine-tones       2%  98%    -    -    82%  18%   40%  50%   68%  28%

    Table 2.10: Percentage of responses on each side for HP- and sin-π with 0-FIITD

In general, except for one case (listener Z with sine-tones), the listeners who participated in the experiments with more than one type of boundary, i.e. listeners C, W, X and Z, maintained their preferred sides, although the level of preference varied among boundaries.

This ear preference, or earedness, can easily be demonstrated with a simple experiment. When presented with an HP- stimulus through headphones, the listener is asked to point out on which side he hears the pitch. After the headphones are reversed, the listener is usually surprised to find that the pitch does not change side. When performing this experiment, all of our listeners demonstrated no change in sidedness. When Culling (2002) performed this experiment with 36 listeners, he found that 12 listeners heard the pitch on the right side independent of the headphone orientation, 14 heard it on the left side independent of the orientation, 10 were unsure about the lateralization for one or both orientations, and only 2 responded that the pitch image moved from one side to the other when the headphones were reversed. In summary, Culling found that 75% of the listeners were either left-eared or right-eared. No correlation was found between earedness and handedness.

Furthermore, in our experiments the earedness found for individual listeners in the Huggins pitch experiments was largely maintained in the sine-tone experiments, suggesting that some common mechanism in each individual listener's auditory system might account for the earedness for various binaural stimuli.

Unlike the experimental results in this chapter and in Culling's experiments, Hafter et al. (1969) found that listeners lateralized the signal in an N0Sπ stimulus about equally often to the left and right. The difference between the experiments of Hafter et al. and ours might be due to the different methods. In each experimental run introduced in this chapter, stimuli with various interaural phases were presented and the interaural level difference was kept at zero, whereas Hafter et al. presented only interaural phases of 180° throughout an entire run and varied the signal-to-noise ratio as a parameter. Given different experimental conditions and different signals (a Huggins pitch stimulus versus a masking-level-difference stimulus), listeners' behavior on ear preference might differ.

In general, the tendency toward ear sidedness found in the Huggins pitch experiments seems strong enough to justify the concept of left-eared and right-eared listeners. By analogy with ambidextrous persons, it would not be surprising to find listeners who prefer both sides about equally often.
The CAP model predicts that listeners hear the pitch in HP+ at the center and hear the pitch in HP− on one side, which was confirmed by the experimental results in this chapter. Moreover, the CAP model predicts that the laterality of the pitch image in HP− should be a hyperbolic function of boundary frequency. To test this quantitative prediction, randomized interaural delays were added, which led to a large amount of data presented in the form of scatter plots. Surprisingly, the experiments in this chapter did not confirm the hyperbolic function predicted by the CAP model. Instead, the experiments found that 4 out of 5 listeners showed very little frequency dependence in the laterality of the pitch image in HP−.

According to the experimental results in Chapter 3, listeners' responses are compressed at large ITDs. All five listeners in the experiments in this chapter showed large compression at large ITDs, leading to much less frequency dependence, especially for HP−. This compression may explain the failure of the quantitative prediction by the CAP model.

An alternative model is the reconstruction-comparison (RC) model by Akeroyd and Summerfield (2000). For Huggins pitch stimuli, the RC model applies the equalization-cancellation model and a reconstruction process to segregate the boundary frequency from the background noise. Akeroyd and Summerfield suggested that "Only subsequent to this partitioning is the lateralization of each object calculated." Hence, unlike the CAP model, the RC model does not give a specific prediction on lateralization; rather, it suggests that lateralizing Huggins pitch would be the same as lateralizing a sine-tone at the boundary frequency. This motivated the lateralization experiments with sine-tones in this chapter, which confirmed that, as statistical comparison has shown, for each individual listener, the pattern of the scatter plot for sine-tones is very similar to the patterns for Huggins pitch stimuli.

It should be noted that the CAP model also predicts that lateralizing Huggins pitch is the same as lateralizing a sine-tone. What was not confirmed by our experiments was the further detailed prediction by the CAP model, i.e., the hyperbolic function of laterality with respect to boundary frequency, without considering the compression at large ITDs. It seems unfair to say that the experimental results in this chapter favor the RC model over the CAP model, because the RC model leaves the exact question unanswered and simply transposes the problem of lateralizing Huggins pitch to the problem of lateralizing sine-tones. However, the RC model does have the advantage of not making predictions against the experimental data in this chapter, and of using the EC model in the detection of Huggins pitch, which was supported by the experiments on the pitch strength of Huggins pitch stimuli in Chapter 1.

For ambiguous stimuli, such as HP− and sin-π with 0-FIITD, the signal has an interaural phase difference of 180° (Sπ). When presented with these binaurally symmetric stimuli, listeners usually lateralized them consistently on one side due to personal preference, which did not change over time or among various stimulus configurations. This observation, made informally by others (e.g., Culling, 2002; Akeroyd, 2003; and Bilsen, 2003), led to the concept of earedness, i.e., left-eared and right-eared listeners. There appears to be no correlation between earedness and handedness.
In summary, the lateralization experiments in this chapter with Huggins pitch stimuli and sine-tones roughly confirmed the prediction by the CAP model, but failed to demonstrate the hyperbolic function that the CAP model predicts quantitatively, which may be due to not considering the large compression at large ITDs. Meanwhile, the results show similar patterns for Huggins pitch stimuli and sine-tones, consistent with the predictions of both the RC model and the CAP model.

2.9 Appendix

2.9.1 Auxiliary experiments with narrow-band stimuli

The auxiliary experiments introduced in this section were designed to test whether a listener could consistently lateralize the phase boundary region by itself.

Method

During an experimental run, a listener sat in a double-walled sound room, wearing headphones (Sennheiser HD 414). A narrow-band stimulus was generated from a 65-dB Huggins pitch signal (with a boundary frequency of 500 Hz) by eliminating all the frequency components except the ones within the phase boundary region, which was 6% wide (−3% to +3%) about the boundary frequency. This narrow-band signal is identical to the phase boundary region of the broadband Huggins pitch signal used in Experiments 1 and 2 in this chapter. Then the left-ear signal was delayed by adding a linearly increasing phase shift to the frequency components. Two delays were applied, −400 and +400 μs, both smaller in magnitude than 1000 μs, the half period of the 500-Hz boundary frequency. A negative delay means that the right-ear signal was delayed. In the following text, the signal with the −400-μs delay is called "Interval A", and the signal with the +400-μs delay is called "Interval B". Each interval was 1.6 seconds long, with slow onset and offset smoothed by a cosine window of 100-ms duration. A sketch of this stimulus construction is given below, after the figure captions.

In each run, a fixed type of phase boundary, either linear-phase (Figure 2.42) or stepped-phase (Figure 2.43), was presented to the listener. Each run contained 30 trials. On each trial, Interval A and Interval B were presented in a randomly picked sequence, either "AB" or "BA". Among the 30 trials, 15 were in the "AB" sequence and the other 15 were in the "BA" sequence. After hearing the two intervals, the listener responded whether the image moved to the left or to the right, and if the movement was not clear, the listener was asked to make a guess.

Figure 2.42: Interaural phase of the linear-phase boundary, for (a) the boundary of HP− and (b) the boundary of HP+, with fb = 500 Hz. Power exists only at frequencies where a phase difference is shown.

Figure 2.43: Interaural phase of the stepped-phase boundary, for (a) the boundary of HP− and (b) the boundary of HP+, with fb = 500 Hz. Power exists only at frequencies where a phase difference is shown.
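To make the stimulus construction above concrete, the following is a minimal Python sketch; it is not the original lab code, and the sample rate, component spacing, and variable names are illustrative assumptions. The interaural delay is applied, as described above, by shifting the phase of the left-ear components linearly with frequency.

```python
import numpy as np

FS = 20000            # sample rate (Hz); an assumption for this sketch
DUR = 1.6             # interval duration (s), as in the text
FB = 500.0            # boundary frequency (Hz)

def boundary_interval(delay_s, boundary="stepped", df=1.0, seed=0):
    """Narrow-band phase-boundary stimulus: noise components from -3% to
    +3% about FB; the left-ear signal is delayed by delay_s seconds (a
    negative value effectively delays the right ear instead)."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(FS * DUR)) / FS
    freqs = np.arange(FB * 0.97, FB * 1.03 + df / 2, df)
    phases = rng.uniform(0.0, 2.0 * np.pi, freqs.size)   # random noise phases
    left = np.zeros_like(t)
    right = np.zeros_like(t)
    for f, ph in zip(freqs, phases):
        if boundary == "stepped":
            dphi = np.pi                                  # fixed 180-deg IPD
        else:                                             # linear-phase: 0..360 deg
            dphi = 2.0 * np.pi * (f - FB * 0.97) / (FB * 0.06)
        left += np.cos(2.0 * np.pi * f * (t - delay_s) + ph)
        right += np.cos(2.0 * np.pi * f * t + ph + dphi)
    ramp = int(0.100 * FS)                                # 100-ms cosine ramps
    env = np.ones_like(t)
    env[:ramp] = 0.5 * (1.0 - np.cos(np.pi * np.arange(ramp) / ramp))
    env[-ramp:] = env[:ramp][::-1]
    return left * env, right * env

interval_A = boundary_interval(-400e-6)   # "Interval A", -400-us delay
interval_B = boundary_interval(+400e-6)   # "Interval B", +400-us delay
```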
Figure 2.44: Lateralization of the phase boundary region, for (a) the boundary of HP+ and (b) the boundary of HP−. The horizontal axis is the lateralization judgment (fixed scale) and the vertical axis is frequency; points A and B mark the predicted positions of Intervals A and B, and A* and B* are alias points.

Model predictions

Applying the CAP model, one can predict the laterality of the phase boundary by itself. For the phase boundary of the HP+ stimulus, the interaural phase difference (IPD) at the boundary frequency is zero, leading to an image at dead center. Thus the −400-μs delay (Interval A) moves the image to the left, and the +400-μs delay (Interval B) moves the image to the right ("A" and "B" in Figure 2.44a).³ As for HP−, the IPD at the boundary frequency is 180°, leading to an image on the left or right side, depending on whether a listener is left-eared or right-eared, as discussed in the section on earedness. At a 500-Hz boundary frequency, an IPD of 180° corresponds to a ±1000-μs delay. Due to the principle of centrality, both left-eared and right-eared listeners would perceive images within the central region between −180° and +180°, i.e., between −1000 μs and +1000 μs at 500 Hz. Thus a −400-μs delay would lead to a position at a 600(= 1000 − 400)-μs delay ("A" in Figure 2.44b), and a +400-μs delay would lead to a position at a −600(= −1000 + 400)-μs delay ("B" in Figure 2.44b). The positions outside the central region ("A*" and "B*" in Figure 2.44) are alias points.

³Figure 2.44 uses the same scales as the previous scatter plots in this chapter.

In summary, according to the CAP model, for both left-eared and right-eared listeners, when presented with the phase boundary of the HP+ stimulus, they would hear the "AB" sequence as moving to the right and the "BA" sequence as moving to the left (the solid symbols in Figure 2.44a); when presented with the phase boundary of the HP− stimulus, the results would be opposite, i.e., they would hear the "AB" sequence as moving to the left and the "BA" sequence as moving to the right (the solid symbols in Figure 2.44b).

Listeners and results

Three listeners, C, W and X, were in this experiment, and all of them had participated in the lateralization experiments on Huggins pitch in this chapter.

To eliminate the effect of possible asymmetry of the headphones, for each condition, after the regular run, the listener was asked to do another run with the headphones reversed. With the phones reversed, the generated +400-μs delay (Interval B) became a −400-μs delay (Interval A), and vice versa. If Interval A is still defined as the interval with the currently −400-μs delay, then the prediction by the CAP model is identical to that with normal phones.

The open squares in Figures 2.45 and 2.46 show the percentage scores for how well the listener followed the prediction by the CAP model, with the stepped-phase boundary for HP+ and HP−, respectively. Each data point is an average of six runs, three with normal phones and three with reversed phones. All three listeners found the task very easy. The sound images were found to be compact, with strong lateralization cues. Every listener had a perfect score (100%) with no error bars, as the CAP model predicts.
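The CAP-model bookkeeping above (the ±400-μs delays, the principle of centrality, and the alias points) can be captured in a few lines. This is a hedged illustration of the prediction logic, not code from the original study; the function name and the sign convention (positive = right) are assumptions.

```python
def cap_predicted_position(boundary_type, delay_us):
    """CAP-model prediction for the lateral position (in microseconds of
    equivalent interaural delay, positive = right) of the 500-Hz phase
    boundary presented by itself with an added interaural delay."""
    half_period_us = 1000.0                    # 180-deg IPD at 500 Hz
    base = 0.0 if boundary_type == "HP+" else half_period_us
    pos = base + delay_us
    # Principle of centrality: fold aliases back into (-1000, +1000] us.
    if pos > half_period_us:
        pos -= 2.0 * half_period_us
    elif pos <= -half_period_us:
        pos += 2.0 * half_period_us
    return pos

for b in ("HP+", "HP-"):
    a = cap_predicted_position(b, -400.0)      # Interval A
    bb = cap_predicted_position(b, +400.0)     # Interval B
    print(b, "A at", a, "us; B at", bb, "us; AB moves",
          "right" if bb > a else "left")
# HP+: A at -400, B at +400 -> "AB" moves right.
# HP-: A at +600, B at -600 -> "AB" moves left, as derived above.
```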
Figure 2.45: Laterality of the boundary region of HP+: the percentage of responses with AB moving to the right and BA moving to the left, as predicted by the CAP model, for listeners C, W and X. Open squares: stepped phase; solid squares: linear phase with 360° boundary; solid circles: linear phase with 1080° boundary. The dashed line marks the 50% chance level.

Figure 2.46: Laterality of the boundary region of HP−: the percentage of responses with AB moving to the left and BA moving to the right, as predicted by the CAP model. Symbols as in Figure 2.45.

The solid squares in Figures 2.45 and 2.46 show the results for the linear-phase boundary, with each data point averaged from six runs (three with normal phones and three with reversed phones). Listeners' scores were far from perfect. In fact, all the scores were below chance (50%), demonstrating a bias opposite to the prediction by the CAP model. For listener W, and for the HP+ condition for listener X, the scores were above 25%. Taking 25% and 75% as thresholds, results between 25% and 75% can be interpreted as meaning that the listener could not consistently lateralize the sound image. From the results, one can conclude that listener W could not lateralize the linear-phase boundary consistently, and listener X could not lateralize the linear-phase boundary of the HP+ stimulus consistently. However, all the results for listener C, and the results for listener X with the HP− boundary, were still beyond the threshold, indicating some consistency in their lateral judgements.

It is possible that the linearly varied phase gave them some cue. For a narrow-band signal, the delay of the envelope, i.e., the group delay, can be calculated by

\Delta\tau = \frac{\Delta\phi}{360^{\circ} \cdot \Delta f}    (2.9)

where Δτ is the group delay, Δφ is the change of phase in degrees within the narrow band, and Δf is its bandwidth. For the stepped-phase boundary, the interaural phase was fixed at 180°, so Δφ is zero and the delay of the envelope is zero. For the linear-phase boundary, the interaural phase varied by 360°, and the bandwidth was 6% of the boundary frequency, i.e., of 500 Hz. Therefore the delay of the envelope is

\Delta\tau = \frac{360^{\circ}}{360^{\circ} \cdot (500\ \mathrm{Hz} \cdot 6\%)} = 33\ \mathrm{ms},

where the positive delay means that the envelope in the right ear leads. When a delay Δt = ±400 μs = ±0.4 ms is added to this narrow-band boundary, based on Equation 2.9 the group delay becomes

\Delta\tau = \frac{\Delta\phi}{360^{\circ} \cdot \Delta f} \pm \Delta t = 33\ \mathrm{ms} \pm 0.4\ \mathrm{ms}.

Compared with 33 ms, the change of ±0.4 ms is small, and Equation 2.9 still approximately holds. From the recorded waveforms it was confirmed that, for the stepped-phase boundary, the envelopes in the two ear channels were in phase, whereas for the linear-phase boundary the envelope in the right channel always led by 33 ms, agreeing with the theoretical calculation above. However, because the delay of the envelope was approximately the same for both Stimuli A and B, it could not be used as a consistent cue to discriminate them.

One possible strategy that listener C might have used to lateralize the linear-phase boundary is that she might have segregated the boundary into two bands about the center frequency, lateralized the low-frequency components and the high-frequency components separately, and always chosen the laterality of the high-frequency components as a consistent cue.
If the phase variation were larger, there would be more than one cycle within the linear-phase boundary, and therefore, to succeed in segregating the components with positive and negative interaural phases, the listener would have to segregate the whole band into many very narrow strips and choose the even- or odd-numbered strips. Hence we expected that the listener's performance would be less consistent with a larger variation in interaural phase. To test this idea, listeners did the same experiments with a boundary in which the interaural phase varied from 0° to 1080° (three complete cycles instead of one).⁴ According to Equation 2.9, the delay of the envelope was

\Delta\tau = \frac{1080^{\circ}}{360^{\circ} \cdot (500\ \mathrm{Hz} \cdot 6\%)} = 100\ \mathrm{ms},

where the positive delay means that the envelope in the right ear leads.

⁴An odd number of cycles was chosen so that the interaural phase at the center frequency was the same as for the original linear-phase boundary with its one-cycle variation.

This delay was confirmed by comparing the envelopes of the recorded signals. As discussed in the previous paragraph, this envelope delay could not be used as a consistent cue for discriminating Stimuli A and B.

Listener C's results for the new linear-phase boundary with a phase variation of 1080° are shown in Figures 2.45 and 2.46 as solid circles, each of which was averaged over four runs (two with normal phones and two with reversed phones). For comparison, listeners W and X also did experiments with this condition, but with only two runs (one with normal phones and one with reversed phones) for each data point (solid circle) shown in Figures 2.45 and 2.46. The solid circles show that the results of all three listeners were close to chance (50%), as expected, with a small bias in the sense that all scores were below 50%.

It is worth noting that all the listeners' judgements with the linear-phase boundary were opposite to the prediction by the CAP model. The listeners all reported that the linear-phase boundaries sounded more diffuse and were much more difficult to lateralize. Thus one way to understand the discrepancy between the results and the prediction might be that, when presented with such a diffuse stimulus, the listener's earedness dominated the judgement. Thus a left-eared listener would hear the intervals on the left, at "A*" and "B" in Figure 2.44, whereas a right-eared listener would hear the intervals on the right, at "A" and "B*" in Figure 2.44. Therefore both left-eared and right-eared listeners would perceive the "AB" trial as moving to the right and the "BA" trial as moving to the left, opposite to the prediction by the CAP model. For the linear phase with the 360° boundary, for which listeners had a strong bias against the prediction by the CAP model and the principle of centrality, informal listening confirmed that the left-eared listener W mostly heard the stimuli (for both HP+ and HP−, and for both Interval A and Interval B) on the left, and the right-eared listeners C and X mostly heard the stimuli on the right. By contrast, when listening to the stepped-phase boundary, all listeners heard the stimuli as predicted: for HP+, listeners always heard Interval A (−400-μs ITD) on the left and Interval B (+400-μs ITD) on the right, and the opposite for HP−.
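The group-delay bookkeeping of Equation 2.9 is easy to check numerically. The following short sketch (illustrative only, with assumed names) reproduces the 0 ms, 33 ms, and 100 ms envelope delays quoted above.

```python
def group_delay_ms(dphi_deg, bandwidth_hz):
    """Equation 2.9: group delay (ms) of a narrow band whose interaural
    phase changes by dphi_deg degrees across bandwidth_hz hertz."""
    return 1000.0 * dphi_deg / (360.0 * bandwidth_hz)

bw = 500.0 * 0.06                        # 6% of the 500-Hz boundary = 30 Hz
print(group_delay_ms(0.0, bw))           # stepped phase: 0.0 ms
print(group_delay_ms(360.0, bw))         # one-cycle linear phase: ~33.3 ms
print(group_delay_ms(1080.0, bw))        # three-cycle linear phase: 100.0 ms
# Adding the +-400-us interaural delay perturbs these values by only
# +-0.4 ms, so the envelope delay is nearly the same for Intervals A and B.
```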
In general, the auxiliary experiments on the phase boundary of the Huggins pitch stimulus show that listeners lateralized the stepped-phase boundary by itself very consistently, and as predicted by the CAP model, at least qualitatively. For the linear-phase boundary, listeners lateralized less consistently (and close to chance when the interaural phase varied over the larger range of three cycles), and the results were opposite to what the CAP model predicts. This suggests that, when lateralizing a diffuse image, listeners' earedness might dominate the task, and the principle of centrality might not apply.

2.9.2 Calculation of overlapped area

This section introduces the detailed method used to calculate the overlapped area between two standardized normal distributions, as discussed in Section 2.7.2.

A curve characterizing a standardized normal distribution is defined by its mean and standard deviation as in Equation 2.10. Figure 2.47 shows two possible circumstances. In the top plot, the two curves have the same standard deviation (σ₀); in the bottom plot, the two curves have different standard deviations (σ₁ and σ₂).

\mathrm{PDF}(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}    (2.10)

where PDF(x) is the normalized probability density function of x, which satisfies ∫_{−∞}^{+∞} PDF(x) dx = 1; μ is the mean; and σ is the standard deviation.

Figure 2.47: Overlapped area of two standardized normal distributions (top: equal standard deviations; bottom: unequal standard deviations; the horizontal axis is the response, from −40 to +40).

When the two curves have the same standard deviation σ₀ (the top plot in Figure 2.47), there is only one intersection of the two curves, whose abscissa is the average of the means of the two curves. Without loss of generality, it is assumed that μ₂ > μ₁, as shown in the figure. Due to the symmetry, the areas of regions I and II are equal. Therefore the overlapped area (i.e., the total shaded area) is as shown in Equation 2.11:

A = 2\,\Phi\!\left(\frac{x_0-\mu_2}{\sigma_0}\right) = 2\int_{-\infty}^{x_0}\frac{1}{\sigma_0\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu_2}{\sigma_0}\right)^{2}} dx    (2.11)

where A is the overlapped area; Φ is the cumulative density function (CDF), defined as Φ(x) = ∫_{−∞}^{x} PDF(t) dt; x₀ is the horizontal coordinate of the intersection, which can be derived as x₀ = (μ₁ + μ₂)/2; σ₀ is the common standard deviation of the two curves; and μ₂ > μ₁. This circumstance was especially important for comparing the experimental results with the prediction by the CAP model in Section 2.7.2, because the same standard deviation was assumed.
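For the equal-σ case, Equation 2.11 reduces to a one-liner with SciPy's normal CDF. This is an illustrative sketch, not the original analysis code:

```python
from scipy.stats import norm

def overlap_equal_sigma(mu1, mu2, sigma0):
    """Equation 2.11: overlapped area of two normal PDFs with a common
    standard deviation sigma0 and means mu1, mu2."""
    mu1, mu2 = min(mu1, mu2), max(mu1, mu2)
    x0 = 0.5 * (mu1 + mu2)                     # the single intersection
    return 2.0 * norm.cdf((x0 - mu2) / sigma0)

print(overlap_equal_sigma(0.0, 0.0, 5.0))      # identical curves -> 1.0
print(overlap_equal_sigma(-10.0, 10.0, 5.0))   # well separated -> ~0.046
```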
When the two curves have different standard deviations σ₁ and σ₂ (the bottom plot in Figure 2.47), there are two intersections of the two curves. Without sacrificing any generality, it is assumed that σ₁ > σ₂. The horizontal coordinates of the two intersections are solutions of

\frac{1}{\sigma_1\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu_1}{\sigma_1}\right)^{2}} = \frac{1}{\sigma_2\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu_2}{\sigma_2}\right)^{2}}.

Taking the logarithm of both sides of the equation and simplifying, one obtains

(\sigma_1^2 - \sigma_2^2)\,x^2 + 2(\mu_1\sigma_2^2 - \mu_2\sigma_1^2)\,x + \left[\mu_2^2\sigma_1^2 - \mu_1^2\sigma_2^2 + 2\sigma_1^2\sigma_2^2\ln\!\left(\frac{\sigma_2}{\sigma_1}\right)\right] = 0.    (2.12)

Letting a ≜ σ₁² − σ₂², b ≜ 2(μ₁σ₂² − μ₂σ₁²), and c ≜ μ₂²σ₁² − μ₁²σ₂² + 2σ₁²σ₂² ln(σ₂/σ₁), the solutions of Equation 2.12 can be written as

x_{\pm} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}    (2.13)

where x₊ is the greater solution and x₋ is the lesser solution. Then the overlapped area (i.e., the total shaded area) is

A = A_I + A_{II} + A_{III}

where A is the total overlapped area, and A_I, A_II and A_III are the areas of the three shaded regions in the bottom plot of Figure 2.47, derived as

A_I = \Phi\!\left(\frac{x_- - \mu_2}{\sigma_2}\right),\quad A_{II} = \Phi\!\left(\frac{x_+ - \mu_1}{\sigma_1}\right) - \Phi\!\left(\frac{x_- - \mu_1}{\sigma_1}\right),\quad A_{III} = 1 - \Phi\!\left(\frac{x_+ - \mu_2}{\sigma_2}\right).

When comparing two scatter plots, it would be very unlikely for the standard deviations of the two plots to be the same. Thus this circumstance is the most important one for comparing two arbitrary plots.

Chapter 3

Lateralization of sine-tones

3.1 Introduction

It has been known, ever since the time of Lord Rayleigh (Strutt, 1907, 1909), that human listeners localize or lateralize tones with interaural level cues and interaural temporal cues. The interaural phase difference (IPD) and the interaural time difference (ITD) are considered two types of interaural temporal cues. IPD and ITD are related by Equation 3.1:

IPD = 360^{\circ} \cdot f \cdot ITD    (3.1)

where IPD is in degrees, ITD is in seconds, and f is the frequency in Hz. For a given frequency, IPD and ITD are proportional to each other, and therefore a lateralization model based on either of them would give virtually the same prediction. However, when frequency is allowed to vary, the relationship between IPD and ITD becomes more complicated, and models based on IPD and on ITD give different predictions for the human auditory perception of laterality or sidedness.

Physically, ITD is a good function of azimuth in that, for a sound source at a given azimuthal position, the ITD varies little as frequency changes, whereas the IPD varies a lot (Kuhn, 1977, 1979; Constan and Hartmann, 2003). If human listeners develop their perception of laterality through experience with the physical positions of nearby audible sound sources, the ITD would be the more natural cue to adopt.

Furthermore, many current auditory models follow the Jeffress model (Jeffress, 1948) and emphasize the ITD because, when perceiving sound signals with external delays, fixed internal delay lines compensating these external delays form a topographic encoding based on ITD. However, since this encoding is confined to frequency channels, the ITD is equivalent to an IPD within any tuned channel, and thus it is difficult to argue that neural architecture offers hard evidence favoring ITD over IPD.

Moreover, when studying human lateralization of broadband stimuli, the cross-frequency models are based on a common ITD through multiple frequency channels (Stern et al., 1988; Dye, 1988). In particular, the criterion of "straightness" applies to a frequency-independent ITD instead of a frequency-independent IPD. Since the concept of straightness is important for perception, taking ITD as the lateral variable has an advantage.

On the other hand, there are also good reasons to consider IPD as a valid variable for lateralization. Mathematically, IPD has a built-in limit. When the IPD reaches 180°, the perceived sidedness becomes ambiguous. Thus, for any frequency, a 180° IPD is a limit for temporal cues, even though the ITD may still be within the physiological range of about 600 μs. Psychophysically, measurements of just-noticeable differences (JND) from the midline showed that the JND in IPD is roughly independent of the frequency of the sine-tones, whereas the JND in ITD is not (Yost and Hafter, 1987). Physiologically, the neural population appears to be distributed according to IPD, supported by the observed distribution of IPD-sensitive neurons across frequency in the inferior colliculus of the guinea pig (McAlpine et al., 2001).
This distribution suggests that IPD may be more fundamental for lateral perception.

The most direct support for IPD over ITD comes from the experiments by Yost (1981). He performed a series of experiments on lateralizing sine-tones with various interaural level differences (ILD) and various IPDs. For the IPD experiments, he measured the perceived lateral position of sine-tones at frequencies of 200, 500, 750, 1000 and 1500 Hz. The IPD varied from −150° to +150°, close to the 180° limit. Listeners were asked to indicate the perceived location of the tone by moving a pointer to the corresponding position on a drawing of a head, and the response was then converted to a number from −10 (extreme left) to +10 (extreme right).

Yost found that listeners responded close to ±10 for an IPD of ±150°, whatever the tone frequency. The shape of the curves of laterality vs. IPD appeared to be independent of frequency. However, if one plotted the laterality against ITD, because of the wide frequency range in the experiments, the curves looked dramatically different. The experiments therefore suggested that the human binaural system was acting as an IPD meter and not as an ITD meter. Similar findings were also reported by Sayers (1964).

Nevertheless, there is a potential problem with Yost's experiments. Although the tone frequency was varied among different blocks of the experiments, the frequency was fixed within each block. If listeners had the tendency to use the full range of available responses for each experimental block, then the results would appear to be the same, independent of the scale factor in Equation 3.1, i.e., the tone frequency. Aware of this possibility, Yost performed spot checks with tone pairs at different frequencies. The checks supported the dominant position of the IPD. However, it would be better if a full experiment could be performed to test listeners' lateral judgements across frequency.

Chapter 2 of this thesis introduced the experiments on the lateralization of Huggins pitch. As a comparison, the laterality of sine-tones was also measured. It was found that, for each listener, the lateral judgements of sine-tones were very similar to those of Huggins pitch stimuli, and they all followed somewhat (although not perfectly) the model prediction based on ITD instead of IPD. To pursue those observations, an experiment on sine-tone laterality was performed using both IPD (chosen to be the same as in Yost's experiments) and ITD as independent variables. The major difference from Yost's experiments is that, in this experiment, the tone frequency was not fixed within each run.

3.2 Method

3.2.1 Stimulus

The signals presented to listeners were pure sine-tones with various interaural phase differences (IPD). The waveforms of the sine-tones were calculated by an array processor (Tucker-Davis AP2) on a computer and were converted to audio signals by 16-bit DACs (Tucker-Davis DD1). The sample rate per channel was 20 ksps (kilosamples per second). With a continuously cycled buffer of 32768 words, the period was 1.6384 seconds. The output signal was low-pass filtered at 2.5 kHz with Brickwall filters at a rate of −115 dB/octave. Recordings were made at the output of the filters, and the phases of the left and right channels were compared; they agreed with the calculated IPDs. Although the filters were known to add a phase shift, the phase shifts were the same for both channels, and hence the IPD was preserved.
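As an illustration of the stimulus computation described above (the original waveforms were computed on the Tucker-Davis hardware; this Python sketch with assumed variable names merely mirrors the arithmetic), a sine-tone pair with a given IPD can be generated as:

```python
import numpy as np

FS = 20000                        # 20 ksps per channel, as in the text
N = 32768                         # buffer length -> a 1.6384-s period

def sine_pair(freq_hz, ipd_deg):
    """Left/right sine-tones whose phases differ by ipd_deg degrees.
    A positive IPD here makes the right channel lead; the onset/offset
    ramps described below in the text are omitted for brevity."""
    t = np.arange(N) / FS
    dphi = np.deg2rad(ipd_deg)
    left = np.sin(2.0 * np.pi * freq_hz * t)
    right = np.sin(2.0 * np.pi * freq_hz * t + dphi)
    return left, right

left, right = sine_pair(208.0, 30.0)   # e.g., stimulus 2 of Table 3.1
```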
The onset and offset of the stimulus were both smoothed by a cosine window of 100-ms duration. The left and right channels had simultaneous onsets and offsets, and the IPD was not applied to the onset or offset, which was confirmed by comparing the recorded envelopes for the left and right channels.

3.2.2 Listeners

Five listeners, A (male, age 19), C (female, age 65), W (male, age 66), X (male, age 31) and Z (male, age 32), were in this experiment. They all had normal hearing (i.e., with hearing thresholds within 15 dB of nominal throughout the frequency range of this experiment), except that W had a bilateral hearing loss above 8 kHz, far above the frequencies used in the experiment. Listeners C, W, X and Z were experienced in lateralization estimation, but listener A only had experience in making left/right decisions. All listeners were right-handed.

3.2.3 Procedure

On each trial, a listener heard a single interval of a sine-tone with a certain IPD. The listener could listen as many times as desired, and was then asked to respond with a number (from −40 to +40, corresponding to extreme left and extreme right, respectively) according to the lateral position of the sine-tone. It was an absolute estimation task.

There were 25 trials in each run. Table 3.1 lists the values of ITD, IPD and frequency in one set of the stimuli ("Stimulus I") used in the experiment. Each trial played one of the 25 stimuli. Of the 25 stimuli, 22 had finite ITDs and IPDs. For the finite ITDs and IPDs, there were five possible ITD values (200, 400, 600, 800 and 1000 μs) and five possible IPD values (30°, 60°, 90°, 120° and 150°). The ITD and IPD define the frequency, which can be calculated from Equation 3.1. In total, therefore, there should be 25 such stimuli (5 ITDs × 5 IPDs). However, three of the 25, shown in Table 3.2, had frequencies either above 1500 Hz (where the ITD and IPD cues are not reliable) or below 100 Hz (which was hard to hear). These stimuli were eliminated and are not included in Table 3.1. On the other hand, the last three stimuli in Table 3.1 had 0 ITD and 0 IPD, added as check-points, which would reveal information on left/right bias during data analysis. The frequencies of the check-points were well distributed, covering the most important range in this experiment.

  stimulus number   ITD (μs)   IPD (°)   frequency (Hz)
   1                 −200       −30        417
   2                 +400       +30        208
   3                 −600       −30        139
   4                 +800       +30        104
   5                 −200       −60        833
   6                 +400       +60        417
   7                 −600       −60        278
   8                 +800       +60        208
   9                −1000       −60        167
  10                 +200       +90       1250
  11                 −400       −90        625
  12                 +600       +90        417
  13                 −800       −90        313
  14                +1000       +90        250
  15                 −400      −120        833
  16                 +600      +120        556
  17                 −800      −120        417
  18                +1000      +120        333
  19                 −400      −150       1042
  20                 +600      +150        694
  21                 −800      −150        521
  22                +1000      +150        417
  23                    0         0        167
  24                    0         0        333
  25                    0         0        694

Table 3.1: Stimulus I for the experiment

  ITD (μs)   IPD (°)   frequency (Hz)
  1000        30         83
   200       120       1667
   200       150       2083

Table 3.2: Eliminated stimuli

To be symmetrical, both positive and negative ITDs and IPDs were used in the experiment. With positive ITDs and IPDs, the right channel led the left channel, and vice versa for the negative ITDs and IPDs.
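The construction of Tables 3.1 and 3.2 follows mechanically from Equation 3.1; the following sketch (illustrative only) regenerates the finite-delay stimulus list and the eliminated combinations:

```python
# Regenerate the finite-ITD/IPD stimuli of Tables 3.1 and 3.2 from Eq. 3.1.
itds_us = (200, 400, 600, 800, 1000)
ipds_deg = (30, 60, 90, 120, 150)

kept, eliminated = [], []
for ipd in ipds_deg:
    for itd in itds_us:
        freq = ipd / (360.0 * itd * 1e-6)       # f = IPD / (360 * ITD)
        row = (itd, ipd, round(freq))
        # Keep only frequencies where the temporal cues are usable:
        (kept if 100.0 <= freq <= 1500.0 else eliminated).append(row)

print(len(kept), "kept;", len(eliminated), "eliminated")
# -> 22 kept; 3 eliminated (83, 1667 and 2083 Hz), matching the tables.
```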
The runs with Stimuli I and II were performed alternately. An experimental run lasted about 4 minutes, and listeners usually did several runs before a rest break. Every listener did totally ten runs for Stimulus I and ten runs for Stimulus II. The experiment was performed in a double-walled sound-treated room. The lis- tener heard the stimuli through Sennheiser HD 414 headphones while sitting on a chair. The level of the stimuli was 60 dB for each ear-channel. Listener’s responses were input to the computer through a keyboard. There was an LCD monitor in the sound room to facilitate the data—input. There was no feedback. 3.3 Results Results of the experiments are shown in Figures 3.1 through 3.5. The vertical axis was listener’s averaged responses. Please note that because listener X always responded with small values, corresponding to lateral positions close to the center, to get a better-viewed plot, the vertical range of the figure for listener X was from —10 to +10, smaller than the range from —40 to +40 for other listeners. To compare the results with both IPD and ITD, the data were plotted twice with respect to each of 123 Response Response 4Or U fl T I U I' T r V I 1 C /’ l 30 - / ‘ E A 0 / 9 l ' CD 20 I- / E 8 9/ 3 10C- 5 -1 b // d r / 1 0 b )7" ‘ > / + I ,’ 1 -10: / d i g ,’/ 5 l ._ ’ ’ O .‘ 20E O [I/ O O 8 : I9 I '30? o 8 1 t x, 8 j : l -40 . J ~150-120-90 -60 -30 0 30 60 90 120 150 Interaural phase difference (degrees) 40 r I r r I I 1 I r f if: 30% 43 20:- i 10:- -I 0 i l -40 1 n l 1 n n m n l l d -1000 -600 -200 0 200 400 500 8001000 lnierourol time difference (us) Figure 3.1: Sine-tone lateralization by listener A 124 Response Response 40: I T I 30 O V'VV V 20 g—o O " AA‘AAAA AAA viiv'vvvv v v LO LO lAAAL‘AAAAnAA l w 0 I' V'v' ' I O 9 -40. 1 [/g 61 8 c1) 1 n I § 1 -150-120-90 -60 -30 0 30 60 90 120 15 Interaural phase difference (degrees) 40 30 'U V'VVV 20 v'vvv 10 0 VV'VV—VV VVV l N O I AAlAAAA‘AAA 'V‘V—Vv'vvvv v‘ -1000 -600 -200 0 200 400 600 8001000 Interaural time difference (us) l Figure 3.2: Sine—tone lateralization by listener C 125 Response Response 40 30- 20 4o 30 20 l O O J AAIAAAAIA q -150- 120 -9 0 -60 -30 0 30 60 90 120150 Interaural phase difference (degrees) 5 U I V'ijTVVV'V VYTVrV I I U AAAnAAAAlALAAl 'v 'VV ‘ -1000 -600 -200 0 200 400 600 8001000 Interaural time difference (us) Figure 3.3: Sine-tone lateralization by listener W 126 Response Response 40 ’ U V U I I 5 U I U j 1 I o l 30’ o ,z , : Z O O ’8/ 8 1 20’- ' : : i 0 0 x8 0 o : I 9’ o 1 10? §/’7 1 b 1 Oh 35” 1 n I J 4 s if 0 3 -10: ’ 0 9 ‘. -20: 9 8 ,’ i ‘ : O ’9’ 8 O O ’ I 30:? I 0 O ‘ . O 1 —40L n a 1 n it 4 l l l " -150-120-90 -60 -30 0 30 60 90 12 150 Interaural phase difference (degrees) q V W t I 1 V U V r 40 30 20 'v'vva'VV‘V—Vtififi 10 p 0 fi b AIAAAA AAAJAAA I on O '7' "’ V T'V' l A L l -600 "200 0 200 400 600 8001000 Interaural time difference (us) ~1000 Figure 3.4: Sine-tone lateralization by listener Z 127 Response Response 5 - . 4 0 i _5 ,. . 1 -10 7 A 9 A A A A A A A L -150-120-90 -60 —30 0 30 60 90 120 150 Interaural phase difference (degrees) 10 I I 'r I I I I I I T ‘lO' A 9 A A n A A A A A .l -1000 -600 -200 0 200 400 600 8001000 Interaural time difference (us) Figure 3.5: Sine-tone lateralization by listener X 128 them. On each figure, the horizontal axis of the top panel was IPD in degrees, and the horizontal axis of the bottom panel was ITD in as. To display overlapped points clearly, the horizontal coordinates of some points were offset by a small amount. 
The actual IPD or ITD of any data-point was one of the eleven nominal values that was closest to the data-point on each plot. Each circle on the figures is an average of ten responses from ten experimental runs. According to both IPD- and ITD-models, listeners should respond on the right side with positive numbers for positive IPDs and ITDs (i.e. in the first quadrant), and respond on the left side with negative numbers for negative IPDs and ITDs (i.e. in the third quadrant). On the figures, all of our listeners did respond mostly in the correct quadrants. However, there were a few points from a few runs, which were in the wrong quadrants. Especially for stimuli with long ITDs, sometimes listeners heard them on the left, but sometimes heard the same stimuli on the right. It would be misleading to take the average of those points and get a mean close to zero. Therefore, during data analysis, the means were calculated in both the correct and incorrect quadrants individually. (The points on the x-axis with a response of zero were counted as in the wrong quadrant.) Those data-points that were averages over less than 10 runs were marked with numbers, instead of circles, in Figures 3.1 through 3.5, showing the number of values that were averaged for each of those data-points. To show consistent responses only, the figures ignored the averages of two or fewer points. The data-points for O—ITD and 0-IPD were simply averaged with no consideration of the quadrant, because the responses were about zero (the center) as predicted, and there were no bi-modal responses in two different quadrants as for finite IPDs and ITDs. The dashed, straight line on each figure was the linear best-fit (Equation 3.2) for the data-points with the optimal slope (a) and the optimal intercept ([3). The slope represents the scale factor (between IPD or ITD and the lateral response) for each individual listener. The intercept represents the overall left / right bias for the listener. 129 Some of the listeners, especially A and C, showed the well-known compression effect at large ITDs or IPDs (Yost. 1981). To illustrate this effect, the data were also fitted with a three-parameter model (Equation 3.3) with optimal parameters a, b and q, shown as solid curves on the figures. L = (r - a: + [3 (3.2) L=a~x°+b (3.3) where L is the lateral response, and 1: is either IPD or ITD, depending on the plots. The parameter a is a scaling factor, similar to a in the linear model. The parameter b represents the left/ right bias, similar to 6 in the linear model. What is special is the parameter q, which illustrates the level of compression with a power function. If q < 1, the curve is compressed; if q = 1, it is just the linear model; and if q > 1, there would even be expansion. The optimal parameters, either (a, 6) or (a, b, q), were found by minimizing the sum of squared errors. (Only the data-points in the correct quadrants were considered in the fitting process.) Because not all the points on the figures were averages over ten values, the number of values contributing to each averaged point was a weighting factor in the calculation. Thus the weighted sum of squared errors as in Equation 3.4 was minimized in the fitting process. The resulting root-mean—square—error was achieved by Equation 3.5. Tables 3.3 and 3.4 show the optimal parameters and the minimum weighted RMSES for the linear model and the three-parameter model, respectively. 11 mi A 2 SSE = ZZTLU (LU '— Lij) (3.4) i=1 j=1 RMSE = ,, 5551. 
             L = α(IPD) + β              L = α(ITD) + β
  listener    α       β     RMSE          α       β     RMSE
  A          0.237   −1.6   10.0         0.035   −2.2    6.8
  C          0.300    1.2   12.4         0.045    1.4    5.8
  W          0.218    2.0   11.7         0.033    0.7    4.3
  Z          0.210    0.3   10.7         0.033   −0.3    4.2
  X          0.050   −0.2    1.7         0.007   −0.1    1.2

Table 3.3: Best fit with the linear model

             L = a(IPD)^q + b                   L = a(ITD)^q + b
  listener    a       b      q    RMSE           a      b      q    RMSE
  A         19.00   −1.8   0.06    3.9          4.42   −1.9   0.27    2.7
  C         13.88    1.2   0.18    7.7          0.84    1.3   0.56    3.7
  W         14.60    1.3   0.09    8.8          0.04    0.7   0.96    4.3
  Z         20.26    0.3   0.02    6.7          0.31   −0.2   0.66    3.5
  X          0.58   −0.2   0.48    1.4          0.04   −0.1   0.74    1.1

Table 3.4: Best fit with the three-parameter model

For each listener, i.e., in each of Figures 3.1 through 3.5, the data points in the top panel in general show larger scatter than those in the bottom panel, indicating that ITD is a more reliable variable for modeling listeners' lateral judgements. Similarly, the left blocks (IPD) of Tables 3.3 and 3.4 have larger RMSEs than the right blocks (ITD), also favoring ITD over IPD as a reliable variable for lateralization.

To compare the results more clearly, Figures 3.1 through 3.5 were summarized and re-plotted in Figures 3.6 and 3.7, showing lateral responses vs. IPD and ITD, respectively. In each figure, the weighted averages (Equation 3.6) and weighted standard deviations (Equation 3.7) of the responses in the correct quadrants are shown for each listener, and the plot for each listener is offset so that the results of all listeners can be displayed in one figure. The axis labels for positive IPDs/ITDs are shown on the right vertical axis, and those for negative IPDs/ITDs on the left vertical axis. The solid curves and dashed lines are identical to those in the original figures (Figures 3.1 through 3.5). Comparing the summarized figures with the original figures, one can easily confirm that Figures 3.6 and 3.7 replicate the distribution of data points in Figures 3.1 through 3.5.

AVG_i = \frac{\sum_{j=1}^{m_i} n_{ij} \cdot L_{ij}}{\sum_{j=1}^{m_i} n_{ij}}    (3.6)

SSE_i = \sum_{j=1}^{m_i} n_{ij} \left( L_{ij} - AVG_i \right)^{2}, \qquad STD_i = \sqrt{ \frac{SSE_i}{\sum_{j=1}^{m_i} n_{ij}} }    (3.7)

where AVG_i is the weighted average at each IPD/ITD value; SSE_i is the weighted sum of squared errors; STD_i is the weighted standard deviation; i is the index for the eleven IPD/ITD values on each plot; j is the index for each data point at a specific IPD/ITD value; L_ij is the lateral response for each data point (circles or numbers) in Figures 3.1 through 3.5, which is an average over n_ij responses; and m_i is the total number of points (i.e., circles and numbers) contributing to the averaged point at each IPD/ITD value in the figure.

Four of the five listeners, i.e., C, W, Z and X, also participated in the experiments in Chapter 2.
For each listener, the optimal q-parameter for the sine-tone experiments in Table 2.8 in Chapter 2 is much smaller than the corresponding q-parameter for ITD in Table 3.4 in this chapter. The reason is that the largest interaural delay of the stimuli in Chapter 2 was 2500 μs, whereas in the experiments in this chapter the largest delay was only 1000 μs. Therefore the results in Chapter 2 demonstrated more compression, due to the data points at large delays, which can be visually confirmed in Figures 2.36 through 2.38 in Chapter 2. As will be discussed in the next section, the IPD is not a consistent cue for listeners; therefore it is not meaningful to compare the q-parameters for IPD between the experiments in this chapter and those in Chapter 2.

Figure 3.6: Sine-tone lateralization compared with IPD

Figure 3.7: Sine-tone lateralization compared with ITD

3.4 Discussion

3.4.1 Individual differences

The five listeners in this experiment showed individual differences in Figures 3.6 and 3.7. For example, listener X had a much smaller range of lateral judgement than the other listeners. In Figure 3.7 (ITD), when fitted with the three-parameter model, the results for listener W were very similar to the linear model, with the power q close to one; listeners A and C, on the other hand, showed noticeable compression with q < 0.6; and listeners Z and X showed medium compression with q ≈ 0.7. In Figure 3.6 (IPD), all listeners showed compression, but there were individual differences as well: listeners A, W and Z had q-parameters close to zero (q < 0.1), whereas listeners C and X showed moderate compression.

3.4.2 Comparison of standard deviations

For each listener, the standard deviations (error bars) in Figure 3.7 (ITD) were usually much smaller than those in Figure 3.6 (IPD). There were 11 nominal values on the horizontal axis of both figures. Studying the standard deviations for those 11 data points can reveal whether IPD or ITD was the reliable variable on which the listeners based their lateral judgements. An advantage of this approach is that it is independent of any model and only examines whether listeners' judgements gathered reliably to form any possible function. Because the data points at 0 IPD and 0 ITD were identical in both figures, only the standard deviations at finite IPDs and ITDs were compared. One-tailed two-sample t-tests comparing those standard deviations showed that, for each of the five listeners, the standard deviations in Figure 3.7 (ITD) were significantly smaller (at the 0.05 level) than those in Figure 3.6 (IPD). The p-values for those t-tests are shown in Table 3.5. This result suggests that ITD is a much more reliable variable than IPD as a cue for lateralizing sine-tones.

  listener   p-value   significant at 0.05 level
  A          0.026     yes
  C          0.000     yes
  W          0.002     yes
  Z          0.000     yes
  X          0.034     yes

Table 3.5: T-tests for STD(ITD) < STD(IPD)

It was found that sine-tones with IPDs less than or equal to 90° were normally lateralized within the region of centrality, and lateralization corresponding to alias images occurred mostly for large IPDs. Thus Zhang and Hartmann (2006) tested the standard deviations excluding the data with IPDs greater than 90°, and found the same result as in this section, i.e., the standard deviations in the plot against ITD were significantly smaller (at the 0.02 level) than those in the plot against IPD.
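The comparison in Table 3.5 is a standard one-tailed two-sample t-test. A sketch with SciPy follows; the array names are assumptions, and the numbers in the usage example are made up, not data from the study.

```python
from scipy.stats import ttest_ind

def itd_less_variable(std_itd, std_ipd, alpha=0.05):
    """One-tailed two-sample t-test of STD(ITD) < STD(IPD).
    std_itd, std_ipd: per-point weighted standard deviations (Eq. 3.7)
    at the finite nominal ITD and IPD values for one listener."""
    t, p = ttest_ind(std_itd, std_ipd, alternative="less")
    return p, p < alpha

p, significant = itd_less_variable([2.1, 1.8, 2.5, 2.0, 1.6],
                                   [5.2, 6.1, 4.8, 5.5, 6.3])
print(p, significant)
```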
3.4.3 Fits to models

It is informative to compare the fitting results (presented in Tables 3.3 and 3.4) between the two models, i.e., the linear model and the three-parameter model, and between the figures with respect to IPD and ITD.

The intercept parameters β and b, depicting left/right bias, are small for all listeners in both models, which is reasonable in that normal listeners should have close-to-symmetrical ears, and thus lateralize stimuli nearly symmetrically on the left and right. Moreover, for each listener, the best-fit intercepts in both Figures 3.6 and 3.7, and for both the linear model and the three-parameter model, i.e., β in Table 3.3 and b in Table 3.4, were almost identical, varying by less than 0.2 out of the range from −40 to +40. This result reveals that the left/right bias was a consistent feature of a listener, invariant across different models and different horizontal variables (i.e., IPD or ITD).

It is intuitive that, compared with the linear model, by adding one more parameter q, the three-parameter model should always improve the fit, leading to smaller weighted RMSEs. This is confirmed by the fact that the RMSEs in Table 3.4 are always smaller than the corresponding RMSEs in Table 3.3. For the left blocks (IPD) of both tables, the difference was significant at the 0.05 level by a one-tailed paired t-test. For the right blocks (ITD) of both tables, however, the difference was insignificant, which occurred because in Figure 3.7 (ITD) the solid curves (the three-parameter model) are very close to the dashed lines (the linear model), especially for listeners W, Z and X, indicating a small compression effect. Thus adding the nonlinear parameter does not significantly improve the fit for the data plotted against ITD. The p-values and the significances for these t-tests are shown in the top two rows of Table 3.6.

  comparing RMSEs of          p-value   significant at 0.05 level
  IPD linear > IPD 3-par      0.011     yes
  ITD linear > ITD 3-par      0.073     no
  IPD linear > ITD linear     0.011     yes
  IPD 3-par > ITD 3-par       0.015     yes

Table 3.6: T-tests on RMSE

The most revealing comparison is between the RMSEs of the IPD fit and the RMSEs of the ITD fit. Comparing the left and right blocks of either Table 3.3 or Table 3.4, one discovers that the RMSEs in the right block (ITD) are always much smaller than the corresponding RMSEs in the left block, and this is true for both the linear model (Table 3.3) and the three-parameter model (Table 3.4). One-tailed paired t-tests showed that the difference is significant at the 0.05 level for both models. The bottom two rows of Table 3.6 show the p-values and the significances for these t-tests. The fact that the models fit better (with smaller RMSEs) for data displayed against ITD (Figure 3.7) than for data displayed against IPD (Figure 3.6) suggests that ITD is the more reliable variable to use in those models.

3.4.4 Three-parameter model

When fitting with the three-parameter model, for every listener, the q-parameters (i.e., the power in Equation 3.3) in Figure 3.6 (IPD) were much smaller than those in Figure 3.7 (ITD). The q-parameters in Figure 3.7 were mostly close to one (except for listener A, the q-parameters for all the other listeners were above 0.5). By contrast, for all listeners, the q-parameters in Figure 3.6 were always below 0.5; in particular, three of the listeners (A, W and Z) had q close to zero. This result for Figure 3.6 should not be explained as compression but rather as little dependence on IPD as a variable. In the extreme case, if one had totally random data independent of the horizontal variable, except that all the data points were in the first and third quadrants, and if one tried to fit the data with the three-parameter model, one would expect the averaged results to be close to horizontal lines at about the center of the range of responses in both quadrants (e.g., if the data scattered from −40 to +40, the range in the first quadrant is from 0 to +40, and the center of that range is thus 20; similarly, the center of the range in the third quadrant is −20). This would lead to a best-fit parameter q close to zero and a scale factor a close to the center of the range; a small simulation of this extreme case is sketched below. For the three listeners A, W and Z, whose q-parameters were close to zero, it is confirmed (Table 3.4) that their scale factors (i.e., a) were between 15 and 20, about the center of the range for those listeners.
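The random-data thought experiment above can be checked numerically. This self-contained sketch uses made-up quadrant-respecting data (an illustration only):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = np.array([-150., -120., -90., -60., -30., 30., 60., 90., 120., 150.])
L = np.sign(x) * rng.uniform(0.0, 40.0, x.size)   # random but quadrant-respecting

def sse(p):
    a, b, q = p
    return np.sum((a * np.sign(x) * np.abs(x) ** q + b - L) ** 2)

a, b, q = minimize(sse, x0=(1.0, 0.0, 1.0), method="Nelder-Mead").x
print(round(a, 1), round(q, 2))
# Typically a comes out near 20 (the center of the 0-to-40 quadrant range)
# and q near zero -- the signature seen for listeners A, W and Z with IPD.
```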
A, W and Z) had q close to zero. This result in Figure 3.6 should not be explained as compression, but rather, as little dependence on IPD as a variable. In the extreme case, if one had totally-random data independent of the horizonal variable, except that all the data-points were in the first and third quadrants, and if one tried to fit the data with the three—parameter model, one would expect the averaged results to be close to horizontal lines at about the center of the range of responses in both quadrants (e. g. if the data scattered from -40 to +40, the range in the first quadrant is from 0 to +40, and the center of the range is thus 20; similarly the center of the range in the third quadrant is -20), leading to a best—fit parameter q close to zero, and a scale factor a close to the center of the range. For the three listeners A, W and Z, whose q-parameters were close to zero, it is confirmed 138 (Table 3.4) that their scale factors (i.e. a) were between 15 and 20, about the center of the range for those listeners. 3.4.5 Monotonicity In Figure 3.7 (ITD), the data-points are clearly monotonic. There are just two ex- ceptions (i.e. the point at +600 [LS for listener A, and the point at —800 as for listener i), which deviated very little from being monotonic, and hence can be explained as statistical variation. However, in Figure 3.6 (IPD), three of the five listeners (i.e. A, W and Z) showed large deviation from being monotonic. In other words, for those three listeners, increasing the value of IPD did not always lead to a lateral position farther to each side. Yost (1981) showed that, in fixed-frequency experiments, a lis- tener’s lateral judgement increases monotonically as IPD increases. Therefore the non-monotonic result in Figure 3.6 must be due to the variation of frequency. This finding showed that IPD might not give a consistent lateral cue across frequency, and further supports ITD, instead of IPD, as a more reliable cue for lateral judgements of sine-tones. In summary, although the five listeners in this experiment demonstrated individual differences, they shared some important common tendencies, which suggest that ITD, instead of IPD, is a reliable cue for lateralizing sine-tones. 3.5 Conclusion An experiment on lateralization of sine—tones was performed with five listeners. In each experimental run. the interaural phase difference (IPD) and the interaural time difference (ITD) of the stimuli were varied independently, and the frequency varied from 100 to 1250 Hz accordingly. The listeners’ averaged responses were plotted against IPD and against ITD on separate plots. There were individual differences. 139 However, for each listener, by comparing the standard deviations, the RMSEs of the best-fit curves, the powers of the best-fit curves, and the monotonicities between the plots with respect to IPD and those with respect to ITD, it was found that, when including various frequencies in each experimental run, for three listeners (A, W and Z), IPD was not used as a consistent cue, whereas ITD was always a very reliable cue; even for the other two listeners (C and X), ITD is significantly more reliable as a cue than IPD. In conclusion, listeners tend to use IT D, instead of IPD, as a cue to lateralize sine- ‘ tones at frequencies below 1250 Hz, where the temporal cue is valid. This finding contradicts the experimental results by Sayers (1964) and by Yost (1981). 
The probable cause is that, in our experiment, the frequency was not fixed, and therefore the IPD and ITD cues were varied independently, whereas the experiments by Sayers and Yost were performed in blocks with a fixed frequency, although the frequency was varied among blocks. Since listeners might tend to use the whole range of the lateral score within a block, the results of Sayers and Yost could not directly discriminate judgements based on IPD from judgements based on ITD. This is indeed the advantage of the method introduced in this chapter. Although it differs from those previous experimental results, the finding in this chapter is appealing because it supports the famous model that Jeffress suggested, with internal delay lines, and it resolves the puzzle of the inconsistency between Jeffress' model and the experimental results by Sayers and by Yost.

Chapter 4

Virtual Reality

4.1 Introduction

The human auditory system localizes sound sources with different cues, such as interaural level cues, interaural temporal cues and spectral cues. For localization in the azimuthal plane, interaural level cues and interaural temporal cues are used. However, in the sagittal plane, as in the task of discriminating between front and back sources, differences in interaural cues are minimal. Given a simplified spherical-head model, which treats the human head as a sphere with two holes at the ear positions, there is a cone of confusion: for sources on this cone, the interaural level difference (ILD) and the interaural time difference (ITD) are the same for all locations. Therefore, if two sound sources happen to be on the same cone of confusion, the listener cannot discriminate between them by analyzing only the ILD and ITD cues. In this case, spectral cues, especially pinna cues (Musicant and Butler, 1984), caused by the asymmetric shape of the human pinnae, are very important.
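The cone of confusion can be illustrated numerically with the classic Woodworth spherical-head approximation, ITD ≈ (a/c)(θ + sin θ). This formula is standard in the localization literature but is an assumption here, not taken from this dissertation:

```python
import numpy as np

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Woodworth spherical-head ITD (seconds). Azimuth: 0 deg = front,
    90 deg = right, 180 deg = back. A rear source folds onto the front
    azimuth with the same lateral angle, which is exactly the
    cone-of-confusion ambiguity for a spherical head."""
    az = np.deg2rad(azimuth_deg)
    lateral = np.arcsin(np.sin(az))       # fold the back hemisphere forward
    return (head_radius_m / c) * (lateral + np.sin(lateral))

print(woodworth_itd(30.0) * 1e6)    # ~261 us for a source 30 deg front-right
print(woodworth_itd(150.0) * 1e6)   # ~261 us again: same ITD, front/back
                                    # ambiguous from interaural cues alone
```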
By inverse-filtering the head- phone transfer functions from the speaker transfer functions, they achieved the trans- fer functions from the loudspeakers (at various locations) to the headphones, which is similar to the Head Related Transfer Functions (HRTF, describing filtering by the torso, head and pinnae as a function of various locations). Then with eyes blindfolded, while sitting in the room, the listener localized the simulation with headphones, and responded with azimuth and elevation angles. The results were compared with re- sults of localizing the actual signals from the loudspeakers. In these experiments, the listeners localized the signal very well in the azimuthal plane; however, the source elevation was not so well—defined. Phrthermore, there were more front-back confu- sions than with free-field sources. In fact, to deal with these confusions, Wightman and Kistler actually flipped the responses in the wrong hemisphere about the vertical plane through the two ears, and recorded the mirror-images in the correct hemisphere of those responses when analyzing their data. This was a weak point. It might be due 142 to the limited accuracy of their experiments. For example, the positions of the probe- microphones might be slightly different, when they measured the transfer functions for the loudspeakers and for the headphones, and this might lead to some error in the calculated transfer functions from the loudspeakers to the headphones, and there- fore the simulation might not be accurate enough. Since they did not have a check comparing the real signal (through loudspeakers) and the simulation (through head- phones), it was unknown how accurate their simulation was. Moreover, Kulkarni and Colburn (2000) showed that different fittings of headphones on a KEMAR manikin led to different signals at the ear-drums. The discrepancies were so pronounced above 8 kHz that a simulation of HRTFs using headphones became inadequate. However, for the front vs. back discrimination studied in this chapter, accurate simulation was essential at these high frequencies. This led to experiments by Hartmann and Wittenberg (1996) on externalization of sound sources. Like Wightman and Kistler, they used headphones to present signals simulating real-space sources. Instead of using noise bursts, they used complex tones. As an improvement, instead of measuring the HRTFs, they directly calculated the simulation for headphones based on the recording of the tones from the loudspeaker. In their experiments, the listener wore headphones and kept probe—microphones in his ear-canals in the entire run. The complex tones were first played through loudspeak- ers, and a recording was made through the probe-microphones. The complex tones were then played through the headphones, and a recording was again made through the probe-microphones. Based on these recordings, the simulation was directly calcu- lated and presented to the listener. The listener then did a confirmation test trying to discriminate the real signal and the simulation. If the listener did not succeed, the simulation was considered good. This was a very good improvement, as Wight- man and Kistler’s technique did not include this check. Therefore, Hartmann and Wittenberg could decide whether to continue the experiment with the achieved sim- 143 ulation, based on the listener’s feedback. However, Hartmann and Wittenberg could only use complex tones in their experiments. 
As for Wightman and Kistler, although they only used noise bursts, they measured the actual transfer functions (similar to HRTFs). Therefore, with small adjustments, just by convolution with any given signal, Wightman and Kistler's technique could simulate any signal from various locations. Because of the high accuracy of their simulation, Hartmann and Wittenberg could change just one cue at a time, and therefore they could examine localization in more detail. Specifically, with gradual changes in the cues, the experiment could gradually "pull" the image outside a listener's head. However, their experiments only showed good results in the azimuthal plane, and listeners could not localize in the sagittal plane. A possible cause for these failures was that, with headphones, a listener could not correctly use his own pinna cues, which are very important for localization in the sagittal plane.

The technique employed in this chapter, called the virtual reality (VRX) technique, used loudspeakers to present the simulation at the listener's ears, which gave the listener an opportunity to use his own pinna cues to discriminate front and back sound sources. In these experiments on front/back discrimination, level cues and phase cues were varied, and sometimes competing cues were presented. The goal was to examine which cues are important for maintaining a listener's localization perception of sound sources in front and back.

Besides the methods introduced previously, as well as the methods in this chapter, different variations of the individual spectral cues have been used in psychoacoustical research. For example, Asano et al. (1990) applied n-pole/n-zero filters and smoothed out the microscopic structures at high frequencies, and found that only macroscopic patterns at high frequencies are important for front/back judgement. Another example is the experiments by Kulkarni et al. (1999), who modified the phase of the HRTFs to achieve minimum-phase systems, whose phase spectrum is a Hilbert transform of the log-magnitude spectrum, and vice versa. They found that listeners are not sensitive to the phase spectra of HRTFs except for overall ITD cues at low frequencies. Moreover, Zahorik et al. (2006) presented listeners with the HRTFs of another subject, and Hofman et al. (1998) put ear-molds on listeners, which was equivalent to listening through someone else's ears; both studies found improvements after training or adaptation.

4.2 Method

4.2.1 Spatial setup

The experiments were performed in an anechoic room. The VRX technique calculated the transfer matrix for the steady state of signals, and therefore was only valid for the anechoic room. In principle, the experiments could be performed in an ordinary room as well. However, in an ordinary room, the decay after the loudspeaker stops playing could give listeners a cue to distinguish between real and virtual signals. The VRX technique could be accommodated to an ordinary room by calculating the impulse response of the room, or by adding a short burst of white noise masking the ending of the signal. Because the steady state was the focus of this research, the experiments were simply performed in an anechoic room, except for a short test of the VRX technique in a normal room, described in Section 4.4 of this chapter.

The setup in the anechoic room is shown in Figure 4.1. There were four loudspeakers, all Minimus 3.5 (RadioShack) single-driver loudspeakers with a diameter of 6.5 cm.
Using single-driver loudspeakers avoided cancellation between different drivers. The front and back speakers were source speakers, and they were selected to have similar frequency responses, though precise matching was unimportant. The left and right speakers were synthesis speakers, with no requirement of matching frequency responses. The loudspeakers were at the ear level of a listener.

Figure 4.1: Setup of loudspeakers in the anechoic room (front source speaker, left and right synthesis speakers, and back source speaker, with the listener at the center)

The listener was seated at the center of the room, facing the front source speaker. The distance from the front and back source speakers to the listener's ears was always 5 feet (1.524 meters). A vacuum fluorescent display was placed on top of the front source speaker and displayed messages to the listener during the experiments. Two response buttons, marked with green and red colors, were held in the listener's left and right hands, respectively. Using hand-held buttons instead of a response box reduced possible head-motion. During the experiments, the listener would push either of the two buttons for a response, or both of them to quit the run.

4.2.2 Alignment of loudspeakers and chair

For VRX experiments, in order to avoid binaural cues, it was important to align the front and back source speakers and the center of a listener's head in a straight line. There was a bite-bar with a length of 21 inches (53.3 cm) at jaw level on the chair where the listener sat. It was used to eliminate the listener's head-motion. On each end of the bite-bar, a microphone for alignment was attached. When a source speaker played a sine-tone, the outputs from the microphones, passed through a dual-channel pre-amplifier, were sent to a dual-channel oscilloscope outside the anechoic room to compare the relative phases of the two signals. If, after proper adjustments, the relative phase of the signals appeared to be zero, the two microphones were at equal distances from the source speaker, which guaranteed that the line from the source speaker perpendicular to the bite-bar passed through the center of the bite-bar. To achieve high accuracy, a high-frequency sine-tone was preferred. However, a full-cycle error might occur with a high-frequency sine-tone. Hence we started from low frequency and swept gradually to high frequency. First, a sine tone of 1 kHz was played through the front source speaker, and by aligning the front source speaker by eye, one could easily make the relative phase of the signals on the oscilloscope perfectly zero. Then, as the frequency was swept higher and higher, the relative phase, due to the time difference between the two bite-bar microphones, might increase gradually. The loudspeaker location was adjusted when necessary, so that the relative phase observed on the oscilloscope remained zero. The frequency was swept up to 17 kHz. Surprisingly, when the frequency was swept back down to low frequencies, the relative phase was found to have drifted a little as the frequency varied, which must be due to reflections from the back of the chair and from the loudspeakers. However, we observed that the relative timing always varied within 10 µs (3.4 mm). By contrast, there was a clear systematic change if a real distance-change occurred. After aligning the front source speaker, the above procedure was repeated for the back source speaker as well. In experiments, the listener would sit in the center of the chair.
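The relation between the observed phase difference and the microphone path-length mismatch, and the full-cycle ambiguity that motivated sweeping upward from 1 kHz, can be sketched as follows. This is a minimal illustration; the speed of sound and the example numbers are assumptions, not values taken from the alignment runs.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed room value)

def path_mismatch_mm(phase_deg, freq_hz, n_cycles=0):
    """Path-length difference implied by a relative-phase reading.
    n_cycles is the unknown integer number of full cycles: at high
    frequency a reading of 0 degrees could hide a whole-cycle error."""
    delta_t = (phase_deg / 360.0 + n_cycles) / freq_hz  # seconds
    return delta_t * C * 1000.0  # millimeters

# At 1 kHz a full cycle corresponds to 343 mm, so a zero reading is
# unambiguous for any plausible misalignment...
print(path_mismatch_mm(360.0, 1e3))    # ~343 mm per cycle
# ...but at 17 kHz one full cycle is only ~20 mm, hence the sweep.
print(path_mismatch_mm(360.0, 17e3))   # ~20 mm per cycle
# The observed 10-microsecond residual corresponds to ~3.4 mm:
print(10e-6 * C * 1000.0)
```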
There was a dark line marked at the center of the bite-bar. With the help of a hand-mirror, the listener could bite the bar keeping the top incisors around the dark line. If a listener were himself left-right symmetrical, this approach would guarantee that his ears were equally distant from the front and back source speakers. A listener would bite the bar during an entire experimental run.

4.2.3 Signal generating and recording

Figure 4.2 shows a block diagram of the analog signal path used in the experiments. The signals were generated by the digital-to-analog (DA) converters on the DD1 module of a Tucker-Davis System II, with a sampling rate of 50 ksps and a buffer length of 32768. After low-pass filtering at 20 kHz with a roll-off rate of −143 dB/octave, the signals were sent to a two-channel Crown power amplifier. The outputs of the amplifier were then sent to the individual loudspeakers in the anechoic room, by way of computer-controlled relays.

For recording, two Etymotic ER-7C probe-microphones were placed inside the listener's ear-canals.¹ The probes were soft and safe, made from silicone rubber; they were 76 mm long, with an outer diameter of 0.97 mm and an inner diameter of 0.58 mm. Each of the microphones was independently connected to its own preamplifier with frequency-dependent gain (about 25 dB), compensating the frequency response of the tube. The outputs, which had a flat frequency response to the acoustical signal, were then input to a dual-channel preamplifier (AudioBuddy), which added 42 dB of gain, before the signals went out of the anechoic room. The output signals from the preamplifier were low-pass filtered at 18 kHz with a roll-off rate of −143 dB/octave, and then sent to the analog-to-digital (AD) converters on the DD1 module of the Tucker-Davis System II, with a sampling rate of 50 ksps and a buffer length of 32768.

¹In the experiment, the listener wore a velcro band on his head, the two microphones were attached to the velcro band near each ear, and the probes of the microphones were placed inside the listener's ear-canals without touching the ear-drums.

Figure 4.2: Block diagram of signal generating and recording

For the confirmation tests and the front/back discrimination experiments, a raised-cosine window of 100 ms was applied to the generated signal to eliminate clicks at onset and offset.

4.2.4 Stimuli and listeners

The first stimulus used in the experiments was a complex tone with a fundamental frequency of 65.6 Hz and with 250 harmonics. The harmonic amplitudes were chosen by starting with all amplitudes equal to one and then applying the broadband changes of Equations 4.2 and 4.3 to avoid broad hills and valleys in the response at the probe-microphones. The broad valley introduced near 3 kHz avoided a large emphasis from the external-ear resonances. For some listeners, dips were found around 10 kHz. To make sure that there was not too little power in the frequency band around 10 kHz, a +10 dB gain was applied there, which was also included in the gain functions of Equations 4.2 and 4.3. The gain function was applied for all listeners. The level of the stimulus was 80 dB at the listener's ears.
To optimally use the dynamic range of the loudspeakers without clipping, Schroeder phases (Schroeder, 1970), as in Equation 4.1, were used, giving the least variation in the envelope. For a complex tone

x(t) = \sum_{n=1}^{N} C_n \cos(n \cdot 2\pi f_0 t + \phi_n),

where f_0 is the fundamental frequency, C_n is the amplitude of the nth harmonic, and \phi_n is the phase of the nth harmonic, Schroeder proved that if \phi_n satisfies

\phi_n = \phi_1 - \frac{2\pi}{\sigma} \sum_{l=1}^{n-1} (n - l)\, C_l^2, \qquad (4.1)

where \sigma is the total power, the amplitude variation (i.e. the crest factor) is small. The Schroeder phase determined by Equation 4.1 is called "Schroeder−", due to the negative sign before the summation in the formula. Our experiments only used Schroeder−, and the arbitrary constant \phi_1 was set to zero. Schroeder noise, both Schroeder+ and Schroeder−, leads to waveforms with a small peak factor. This enables one to take best advantage of the dynamic range of the equipment, transferring the most power with the least chance of distortion. However, when coupled with the phase shifts caused by cochlear delays, the Schroeder+ condition leads to a pulse-like stimulus at the auditory nerve (Smith et al., 1986; Oxenham and Dau, 2001). The Schroeder− condition employed here leads to a more uniform distribution of power throughout a cycle of the stimulus.

Because the stimulus had components up to the 250th harmonic, the highest frequency was 16.4 kHz. Since eliminating frequencies above 16 kHz does not decrease performance on median-sagittal localization (Hebrank and Wright, 1974b), this frequency range is sufficient for front/back discrimination. The lowest two harmonics (f = 65.6 and 131.2 Hz) were omitted because they were below the loudspeaker range.

To compensate for the mid-frequency peaks around 3 kHz in the recorded spectra caused by the ear-canal resonance, and to shift more power to the high-frequency components around 10 kHz where most front/back spectral cues occur (as described earlier in this section), a gain function

\Delta L_1(f_n) = \begin{cases} -65\,(f_n - 0.9)\exp(-0.6 f_n + 3.5)\ \text{dB} & 16 \le n \le 121 \\ +10\ \text{dB} & 121 < n \le 170 \\ 0\ \text{dB} & \text{otherwise} \end{cases} \qquad (4.2)

where f_n is in kHz and \exp(x) \triangleq e^x, was applied from the 16th harmonic (f \approx 1.05 kHz) to the 170th harmonic (f \approx 11.14 kHz). This stimulus was called the source signal (Figure 4.3). As Professor John Middlebrooks suggested, the gain function of Equation 4.2 was later smoothed around the 121st and 170th components (Figure 4.4): the smoothed gain function \Delta L_2(f_n) (Equation 4.3) is identical to Equation 4.2 except that the abrupt steps at the band edges are replaced by half-cycle sinusoidal ramps of the form 5\{1 + \sin[(f_n - 8.5)\,\pi]\} dB, where f = n f_0 is in kHz and \exp(x) \triangleq e^x.
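A compact sketch of the source-signal construction described above (Schroeder− phases from Equation 4.1 plus the gain function of Equation 4.2) might look as follows. This is an illustrative reconstruction, not the laboratory code; the sampling rate matches the DD1 setting, but the function names and the exact dB-to-amplitude bookkeeping are assumptions.

```python
import numpy as np

F0 = 65.6        # fundamental frequency in Hz
N_HARM = 250     # number of harmonics
FS = 50000       # sampling rate in samples/s (DD1 setting)

def gain_db(n):
    """Broadband gain of Equation 4.2; f_n in kHz."""
    fn = n * F0 / 1000.0
    if 16 <= n <= 121:
        return -65.0 * (fn - 0.9) * np.exp(-0.6 * fn + 3.5)
    if 121 < n <= 170:
        return 10.0
    return 0.0

# Amplitudes: start at unity, apply the gain, omit harmonics 1 and 2
# (below the loudspeaker range).
C = np.array([0.0 if n <= 2 else 10 ** (gain_db(n) / 20.0)
              for n in range(1, N_HARM + 1)])

# Schroeder- phases (Equation 4.1) with phi_1 = 0.
power = np.sum(C ** 2)
phi = np.zeros(N_HARM)
for n in range(2, N_HARM + 1):
    l = np.arange(1, n)
    phi[n - 1] = -(2 * np.pi / power) * np.sum((n - l) * C[l - 1] ** 2)

# One period of the complex tone.
t = np.arange(int(FS / F0)) / FS
x = sum(C[n - 1] * np.cos(n * 2 * np.pi * F0 * t + phi[n - 1])
        for n in range(1, N_HARM + 1))
```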
In what follows, a capital-letter function of frequency, such as X(f), stands for a set of harmonic complex coefficients with amplitudes and phases. Suppose that, while the source signal is presented through the source speaker, the recording through the probe-microphones inside the listener's ears is

\vec{Y}_0(f) = \begin{pmatrix} Y_{0L}(f) \\ Y_{0R}(f) \end{pmatrix},

where Y_{0L}(f) and Y_{0R}(f) are the recordings through the left and right ears, respectively. The source signal X(f) is then played as the calibration signal through the left synthesis speaker (speaker \alpha in Figure 4.1) only, and the recording through the probe-microphones is

\vec{W}_\alpha(f) = \begin{pmatrix} W_{\alpha L}(f) \\ W_{\alpha R}(f) \end{pmatrix}.

Then the source signal X(f) is played again as the calibration signal through the right synthesis speaker (speaker \beta in Figure 4.1) only, and the recording is

\vec{W}_\beta(f) = \begin{pmatrix} W_{\beta L}(f) \\ W_{\beta R}(f) \end{pmatrix}.

Suppose the transfer matrix between the synthesis speakers and the listener's two ears is

H(f) = \begin{pmatrix} H_{\alpha L}(f) & H_{\beta L}(f) \\ H_{\alpha R}(f) & H_{\beta R}(f) \end{pmatrix},

which, by definition, satisfies

\begin{pmatrix} W_{\alpha L}(f) & W_{\beta L}(f) \\ W_{\alpha R}(f) & W_{\beta R}(f) \end{pmatrix} = H(f) \cdot \begin{pmatrix} X(f) & 0 \\ 0 & X(f) \end{pmatrix},

so that

H(f) = \frac{1}{X(f)} \begin{pmatrix} W_{\alpha L}(f) & W_{\beta L}(f) \\ W_{\alpha R}(f) & W_{\beta R}(f) \end{pmatrix}. \qquad (4.4)

The goal is to find a synthesized signal \vec{A}(f) such that, if the synthesis speakers play \vec{A}(f), the recording through the probe-microphones is identical to the recording made when the source signal X(f) is played through the source speaker, i.e. identical to \vec{Y}_0(f). The required \vec{A}(f) is

\vec{A}(f) = \begin{pmatrix} A_\alpha(f) \\ A_\beta(f) \end{pmatrix} = \frac{X(f)}{W_{\alpha L}(f)W_{\beta R}(f) - W_{\alpha R}(f)W_{\beta L}(f)} \begin{pmatrix} W_{\beta R}(f) & -W_{\beta L}(f) \\ -W_{\alpha R}(f) & W_{\alpha L}(f) \end{pmatrix} \begin{pmatrix} Y_{0L}(f) \\ Y_{0R}(f) \end{pmatrix}, \qquad (4.5)

where A_\alpha(f) and A_\beta(f) are the signals to be played through the \alpha and \beta synthesis speakers, respectively. [This result can be tested by checking that \vec{Y}_0(f) = H(f) \cdot \vec{A}(f).]
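The synthesis of Equation 4.5 amounts to inverting a 2×2 complex matrix at every harmonic. A minimal numpy sketch, assuming the recordings are stored as complex spectra on a common frequency grid (the variable names are hypothetical):

```python
import numpy as np

def transaural_synthesis(X, Y0, Wa, Wb):
    """Per-frequency solve of Equation 4.5.

    X  : complex spectrum of the calibration (source) signal
    Y0 : (2, K) recordings of the source speaker, rows = (left, right)
    Wa : (2, K) recordings of the alpha speaker playing X
    Wb : (2, K) recordings of the beta speaker playing X
    Returns (2, K) speaker signals A = (A_alpha, A_beta).
    """
    det = Wa[0] * Wb[1] - Wa[1] * Wb[0]   # W_aL*W_bR - W_aR*W_bL
    A_alpha = X * ( Wb[1] * Y0[0] - Wb[0] * Y0[1]) / det
    A_beta  = X * (-Wa[1] * Y0[0] + Wa[0] * Y0[1]) / det
    return np.vstack([A_alpha, A_beta])

# Sanity check on random data: playing A must reproduce Y0 exactly.
rng = np.random.default_rng(0)
K = 248
X = rng.standard_normal(K) + 1j * rng.standard_normal(K)
H = rng.standard_normal((2, 2, K)) + 1j * rng.standard_normal((2, 2, K))
Wa, Wb = H[:, 0] * X, H[:, 1] * X     # calibration recordings (Eq. 4.4)
Y0 = rng.standard_normal((2, K)) + 1j * rng.standard_normal((2, K))
A = transaural_synthesis(X, Y0, Wa, Wb)
Y = np.einsum('ijk,jk->ik', H, A)     # what the ears would record
assert np.allclose(Y, Y0)
```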
4.2.6 Preliminary experiments

Discrimination of front/back sources

To ensure that listeners in VRX experiments could actually discriminate between front and back sound sources, a preliminary discrimination experiment was performed for each listener. The setup was similar to Figure 4.1, but with only the front and back source speakers. Each experimental run contained 80 trials with the complex tone of 65.6 Hz at a level of 70 dB SPL: 40 trials from the front source speaker and 40 trials from the back source speaker, in random order. The listener's task was to respond whether the sound came from front or back, by pressing the corresponding buttons. Among the 19 listeners in this preliminary experiment, 13 (listeners C, D, E, F, G, L, M, P, R, S, V, W and X) participated in other VRX experiments, and 6 (listeners B, H, K, N, Y and Z) did not. Each listener did two runs. Figure 4.5 (squares) shows the results, expressed as the percentage of correct responses averaged over the two runs, where 50% correct corresponds to guessing. The error-bars show the standard deviations.

We expected that this would be a very easy task. Surprisingly, most listeners found that it was not very easy, and some listeners even felt it was rather difficult, which was confirmed by the low percent correct in Figure 4.5, with a mean of 78.0% across listeners, especially compared with the other two stimuli introduced later. The preliminary experiment was therefore repeated with white noise. This time, most listeners found it a very easy task, and the percent-correct results, shown as circles in Figure 4.5, were much higher, with a mean of 96.5% across listeners. (The only major exception was listener W, who found white noise much more difficult to localize than complex tones. However, his high-frequency hearing was not as good as the other listeners', and his results showed a strong dependence on level as well.) It can be conjectured that the periodicity of the complex tone made the discrimination task difficult.

To test this idea, a new signal, called "pseudo-noise", was generated, simply by offsetting the frequency of each component of the complex tone by a random value within a range of ±15 Hz. The pseudo-noise has the same number of components as the complex tone and the same frequency distribution, except that those components are no longer in a harmonic series. Hence pseudo-noise does not give as clear a pitch sensation as the complex tone.

Figure 4.5: Results of the preliminary front/back experiment (percent correct per listener for white noise, pseudo-noise, and complex tone)

The results (Figure 4.5, triangles) show that most listeners succeeded in this task very well (percent correct of 91.3%, averaged across listeners), which was also confirmed by subjective feedback: most listeners found pseudo-noise much easier than complex tones. Most listeners found localization of the complex tone much more difficult than the other two signals. (For listeners E, L, P and X, although the scores for complex tones were also high, their subjective responses likewise indicated that the complex tones were much more difficult than the other two signals. Major exceptions were listeners W and K, who found pseudo-noise more difficult to localize than complex tones.) In general, most listeners, especially the 13 listeners in the VRX experiments, found pseudo-noise much easier to localize than complex tones. (The exception was listener W, who later participated only in Experiment 8 with complex tones, which he was good at, and in the auxiliary testing.) To summarize the results, Table 4.2 shows the percent correct averaged over all listeners for each stimulus. The scores clearly indicate that the overall performance for white noise and pseudo-noise was much better than the performance for complex tones.

white noise | pseudo-noise | complex tone
   96.5%    |    91.3%     |    78.0%

Table 4.2: Average percent correct over all listeners in the preliminary front/back experiment

The advantage of the pseudo-noise over the complex tone is very clear experimentally, but it is difficult to understand. Both stimuli have discrete components with the same levels and approximately the same spectral spacing. Possibly relevant is the fact that, for equal sound pressure levels of 80 dB, the complex tone sounded louder. Informal loudness-matching experiments found that the level of the complex tone had to be reduced in order to sound as loud as the pseudo-noise. The reduction was 5 dB for listener W and 7 dB for listener X.

The advantage for pseudo-noise may be related to the negative level effect (Hartmann and Rakerd, 1993; Macpherson and Middlebrooks, 2000; Vliegen and Van Opstal, 2004), wherein a brief stimulus, a click or noise burst, is less easily localized with increasing intensity. Neither the complex tone nor the pseudo-noise is a brief burst; they are continuous tones. However, the complex tone, with its large number of intense high harmonics, does have a pulse-like character subjectively, whereas the pseudo-noise sounds much smoother. Possibly an extended form of the negative level effect is at work here.

Because most listeners performed better with pseudo-noise in the discrimination tasks, pseudo-noise was used, instead of complex tones, in the following experiments. The pseudo-noise was frozen, i.e. its spectrum was generated only once, by randomly offsetting each component of the complex tone. The frequency and phase of each component of the pseudo-noise were the same in all of the following experiments. The amplitude shaping described in Equations 4.2 and 4.3 was also applied to the 16th through 170th components of the pseudo-noise. It should be noted that the use of pseudo-noise rather than the complex tone has no effect on the formal generation of the synthetic signals, because the matrix equations above do not depend on the harmonicity of the components.
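A sketch of the frozen pseudo-noise construction just described: each component of the complex tone is shifted by a random amount within ±15 Hz, drawn once and then reused in every run. Illustrative only; the seed value and helper names are assumptions, and the amplitudes C and phases phi stand in for the shaped values of Section 4.2.4.

```python
import numpy as np

F0 = 65.6
N_HARM = 250
FS = 50000

rng = np.random.default_rng(2006)  # frozen: draw the offsets once
offsets = rng.uniform(-15.0, 15.0, N_HARM)
freqs = np.arange(1, N_HARM + 1) * F0 + offsets  # no longer harmonic

def pseudo_noise(t, C, phi):
    """Frozen pseudo-noise with the same component levels (C) and
    phases (phi) as the complex tone, but inharmonic frequencies."""
    return sum(C[k] * np.cos(2 * np.pi * freqs[k] * t + phi[k])
               for k in range(N_HARM))

t = np.arange(FS) / FS                         # one second of signal
C = np.ones(N_HARM); phi = np.zeros(N_HARM)    # stand-in values
x = pseudo_noise(t, C, phi)
```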
4.2.7 Calibration sequence

Before each run, a calibration sequence with the transaural technique was applied to synthesize the signal played through the synthesis speakers. Figures 4.6 through 4.13 show an example of a calibration sequence. In each calibration sequence, the source signal X(f) (Figure 4.6) was first played through the front source speaker, and the recording \vec{Y}_0(f) was made through the two probe-microphones inside the listener's ear-canals (Figure 4.7).

Figure 4.6: Spectrum of the signal sent to the front source speaker
Figure 4.7: Ear-canal spectra for the front speaker playing the source signal X(f)
Figure 4.8: First recording of the α synthesis speaker playing X(f)
Figure 4.9: First recording of the β synthesis speaker playing X(f)
Figure 4.10: Spectra of Simulation 1 sent to the synthesis speakers
Figure 4.11: Ear-canal spectra for Simulation 1 for the front source
Figure 4.12: Left ear-canal spectra for the real and virtual signals
Figure 4.13: Right ear-canal spectra for the real and virtual signals

Then the source signal X(f) was played as the calibration signal through the α synthesis speaker with an extra 18 dB of attenuation. Because the synthesis speakers were closer to the listener's ears, the attenuated calibration signal X(f) achieved approximately the same level at the listener's ears as the source speaker. The recording with α, \vec{W}_\alpha(f), was made through the two probe-microphones (Figure 4.8). The source signal X(f) was played again as the calibration signal through the β synthesis speaker with an extra 18 dB of attenuation, and the recording with β, \vec{W}_\beta(f), was made through the two probe-microphones (Figure 4.9).
With these recordings, the transaural technique (Equation 4.5) was applied to compute the synthesis signal \vec{A}(f) (Figure 4.10). This simulation was called Simulation 1, to distinguish it from the iterative simulation introduced later. The synthesis signal \vec{A}(f) was then played through both synthesis speakers, and the recording \vec{Y}(f) was made through the probe-microphones (Figure 4.11). Figures 4.12 and 4.13 compare the recorded spectra of the real and virtual signals (Figure 4.7 and Figure 4.11). The spectra look almost identical, indicating good signal generation up to this point.

4.2.8 Optimizing the simulation

Locating the synthesis speakers

Theoretically, the transaural technique ought to work with arbitrary positions of the source speaker and the synthesis speakers. Originally, the two synthesis speakers were set in front, on the left and right of the front source speaker. In informal runs, the simulation with low-frequency components was good. However, when the high-frequency components were added to the source signal, the simulation failed, and the listener could easily discriminate the virtual signal (the simulation signal played through the synthesis speakers) from the real signal (the source signal played through the source speaker). The failure could also be seen as a discrepancy between the real and virtual signals by comparing the recorded spectra. We assumed that this discrepancy was due to some motion of the listener's head, because there was very little discrepancy in the recorded spectra with the KEMAR (a dummy head, which was perfectly stable during the run). Motion affects the high-frequency components more because their wavelengths are smaller, and hence a small head-motion causes a larger change in those components. Therefore we built a bite-bar on the listener's chair and asked the listener to bite it during the entire experimental run. This greatly improved the simulation; however, it was not yet ideal, because the listener could still easily distinguish the real and virtual signals. The usual cue for the listener was a timbre change between the signals. Many times, a certain component could be so strong at the listener's ears that it popped up as a separate tone above the stimulus background. Possibly this was due to motion of the whole body and chair on the wire-grid floor of the anechoic room.

To improve the quality of the simulation, we changed the setup and put the two synthesis speakers directly on the left and right of the listener's head, a short distance (about 1.2 feet) away from the ears, reasoning that the head shadow would block the two synthesis speakers very efficiently, leading to larger ILDs at high frequencies. Therefore, at high frequencies, the listener's left ear mainly heard the α synthesis speaker, and his right ear mainly heard the β synthesis speaker. It can be shown, as follows, that compared with the synthesis speakers in front, the new setup creates a virtual signal that is less sensitive to motion.

Consider a simplified model with an ideally symmetrical head and two identical synthesis speakers set up symmetrically on the left and right in front of the listener (Figure 4.14).
When the source signal is played as the calibration signal through the synthesis speakers, the recording is

W(f) = \begin{pmatrix} W_{\alpha L}(f) & W_{\beta L}(f) \\ W_{\alpha R}(f) & W_{\beta R}(f) \end{pmatrix}.

Figure 4.14: Simplified model with a symmetrical head and synthesis speakers set up symmetrically

Given the symmetrical condition, at each frequency f,

W_{\alpha L} = W_{\beta R} = W_0, \qquad W_{\beta L} = W_{\alpha R} = W_0 \cdot C e^{i\phi},

where W_0 is the direct-path recording and C e^{i\phi} characterizes the cross-head path relative to it. According to Equation 4.5, the simulation signal is

\begin{pmatrix} A_\alpha \\ A_\beta \end{pmatrix} = \frac{X}{W_0 (1 - C^2 e^{i2\phi})} \begin{pmatrix} 1 & -C e^{i\phi} \\ -C e^{i\phi} & 1 \end{pmatrix} \begin{pmatrix} Y_{0L} \\ Y_{0R} \end{pmatrix}.

Suppose the listener makes a very small motion, which leads to an extra phase in both ears and almost no level change. Then, according to Equation 4.4, the transfer matrix becomes

H' = \frac{1}{X} W' = \frac{1}{X} \begin{pmatrix} W'_{\alpha L} & W'_{\beta L} \\ W'_{\alpha R} & W'_{\beta R} \end{pmatrix},

where

W'_{\alpha L} = W'_{\beta R} = W_0, \qquad W'_{\beta L} = W'_{\alpha R} = W_0 \cdot C e^{i(\phi + \Delta\phi)}.

Therefore, the recording of the simulation signal is

\vec{Y}_0' = \begin{pmatrix} Y'_{0L} \\ Y'_{0R} \end{pmatrix} = H' \cdot \vec{A} = \begin{pmatrix} \dfrac{1 - C^2 e^{i(2\phi+\Delta\phi)}}{1 - C^2 e^{i2\phi}} & \dfrac{C(e^{i(\phi+\Delta\phi)} - e^{i\phi})}{1 - C^2 e^{i2\phi}} \\ \dfrac{C(e^{i(\phi+\Delta\phi)} - e^{i\phi})}{1 - C^2 e^{i2\phi}} & \dfrac{1 - C^2 e^{i(2\phi+\Delta\phi)}}{1 - C^2 e^{i2\phi}} \end{pmatrix} \begin{pmatrix} Y_{0L} \\ Y_{0R} \end{pmatrix} \triangleq T' \cdot \vec{Y}_0. \qquad (4.6)

For the ideal case with no motion, i.e. \Delta\phi = 0, T' reduces to the identity matrix and the recording equals \vec{Y}_0. When there is motion, i.e. \Delta\phi \neq 0, the result depends on the value of C. When the synthesis speakers are placed on the left and right of the listener, close to the ears, the ILD is about 20 dB at high frequencies, which corresponds to C \approx 0.1. When C is small, from Equation 4.6,

T' \approx \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \qquad (4.7)

Therefore the recording is approximately the same as in the ideal case with no motion. However, when the synthesis speakers are placed in front, C has a larger value, i.e. C > 0.1. Then, from Equation 4.6, T' is not as close to the ideal of Equation 4.7; in particular, it has cross terms. When C \approx 1, the whole system is very sensitive to the value of \phi. In one special case, when C = 1 and \phi = \pi, the four elements of the matrix T' can diverge. Therefore, under certain conditions, one frequency component might be so strong that it pops up above the stimulus background. It is this altered T' that leads to a timbre change. In contrast, for the real signal, i.e. when the front source speaker plays the source signal, the small motion only puts an extra phase into both ears, and thus the amplitude spectra are approximately the same as in the ideal case with no head-motion. Therefore, by comparing the real and virtual spectra, the listener could notice the timbre change.

In general, placing the synthesis speakers on the left and right, close to the listener's ears, makes the system less sensitive to motion, which leads to better simulation.
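A quick numeric check of Equation 4.6 makes the argument concrete: for a small motion-induced phase Δφ, the error matrix T' stays near the identity when C is small (speakers at the sides) but not when C is near one (speakers in front). A minimal sketch; the specific values of φ and Δφ are arbitrary choices for illustration.

```python
import numpy as np

def T_prime(C, phi, dphi):
    """Error matrix of Equation 4.6 relating the recorded simulation
    to the intended recording, after a motion-induced phase dphi."""
    den = 1 - C**2 * np.exp(2j * phi)
    diag = (1 - C**2 * np.exp(1j * (2 * phi + dphi))) / den
    cross = C * (np.exp(1j * (phi + dphi)) - np.exp(1j * phi)) / den
    return np.array([[diag, cross], [cross, diag]])

dphi = 0.2                      # small motion-induced phase (rad)
for C in (0.1, 0.9):            # sides (~20 dB ILD) vs. front
    worst = max(np.linalg.norm(T_prime(C, phi, dphi) - np.eye(2))
                for phi in np.linspace(0, np.pi, 181))
    print(f"C = {C}: worst-case ||T' - I|| = {worst:.3f}")
# C = 0.1 gives errors of a few percent; C = 0.9 gives errors of
# order one, and C -> 1 with phi near pi diverges.
```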
Iterative calibration

To optimize the simulation, an iterative calibration sequence was applied. In the above calibration sequence, the signal used to calibrate the synthesis speakers was the same as the source signal, X(f). This signal is certain to be very different from the final simulation signal sent to these speakers, namely A_\alpha(f) and A_\beta(f). To eliminate the influence of distortion in the synthesis speakers, it would be better if the calibration signal played through the synthesis speakers were similar to the final simulation, i.e. to the synthesized signal \vec{A}(f). Therefore an iterative calibration was applied after the original calibration, replacing X(f) (Figure 4.6) with \vec{A}(f) (Figure 4.10) as the calibration signal sent to the synthesis speakers. In the iterative calibration, the α-speaker first played the signal A_\alpha(f), and the recording \vec{W}'_\alpha(f) was made through the probe-microphones (Figure 4.15). Then the β-speaker played the signal A_\beta(f), and the recording \vec{W}'_\beta(f) was made through the probe-microphones (Figure 4.16). Applying the transaural technique with the new calibration signals, the iterative synthesis signal was calculated as in Equation 4.8 (Figure 4.17).

Figure 4.15: Second recording of the α synthesis speaker playing Simulation 1 A_\alpha(f)
Figure 4.16: Second recording of the β synthesis speaker playing Simulation 1 A_\beta(f)

\vec{A}'(f) = \begin{pmatrix} A'_\alpha(f) \\ A'_\beta(f) \end{pmatrix} = \frac{1}{W'_{\alpha L}(f)W'_{\beta R}(f) - W'_{\alpha R}(f)W'_{\beta L}(f)} \begin{pmatrix} A_\alpha(f)W'_{\beta R}(f) & -A_\alpha(f)W'_{\beta L}(f) \\ -A_\beta(f)W'_{\alpha R}(f) & A_\beta(f)W'_{\alpha L}(f) \end{pmatrix} \begin{pmatrix} Y_{0L}(f) \\ Y_{0R}(f) \end{pmatrix} \qquad (4.8)

In Equation 4.8, the signals that calibrate the two synthesis speakers, namely A_\alpha(f) and A_\beta(f), are different. The iterative simulation (Simulation 2) was then played through both synthesis speakers, and the recording \vec{Y}'(f) was made through the probe-microphones (Figure 4.18). The large peak in the β synthesis signal near 10 kHz in Figure 4.17 (as well as in the earlier Figure 4.10) is not typical, but it demonstrates the kind of effect that can occur, due to the head and spatial geometry and the interference between the α and β signals, in order to create a good synthesis as shown in Figure 4.18.

Figure 4.17: Spectra of Simulation 2 sent to the synthesis speakers
Figure 4.18: Ear-canal spectra for Simulation 2 for the front source

The advantage of the iterative calibration is that the spectrum of the calibration signal is similar to the spectrum of the result. However, the disadvantage is that at some frequencies the amplitudes in the calibration signal are rather weak, which leads to large calculation errors. The next step, therefore, was to select the better simulation between the two syntheses, \vec{A}(f) and \vec{A}'(f). For each frequency component f, the errors between the recording of the simulation and the recording of the original source were compared for Simulation 1 and Simulation 2, i.e. comparing

Error(f) \triangleq \max\!\big(\,\big|\,|Y_L(f)| - |Y_{0L}(f)|\,\big|,\ \big|\,|Y_R(f)| - |Y_{0R}(f)|\,\big|\,\big)

and

Error'(f) \triangleq \max\!\big(\,\big|\,|Y'_L(f)| - |Y_{0L}(f)|\,\big|,\ \big|\,|Y'_R(f)| - |Y_{0R}(f)|\,\big|\,\big).

The simulation that led to the smaller error was selected as the final simulation at that frequency.
For example, if Error(f) < Error'(f), then Simulation 1, i.e. \vec{A}(f), was selected; otherwise Simulation 2, i.e. \vec{A}'(f), was selected. The new simulation signal was called \vec{A}''(f) (Figure 4.19), and it is a combination of \vec{A}(f) and \vec{A}'(f). To further optimize the simulation, \vec{A}''(f) was played through the synthesis speakers, and the recording \vec{Y}''(f) was made through the probe-microphones (Figure 4.20). Then the percent error of the amplitude spectrum between the recording of the simulation and the recording of the source was evaluated as

Error_L(f) \triangleq \frac{\big|\,|Y''_L(f)| - |Y_{0L}(f)|\,\big|}{|Y_{0L}(f)|} \times 100\%,

Error_R(f) \triangleq \frac{\big|\,|Y''_R(f)| - |Y_{0R}(f)|\,\big|}{|Y_{0R}(f)|} \times 100\%.

The components that deviated from the recorded spectra of the source by more than 50% (either Error_L(f) > 50% or Error_R(f) > 50%) were eliminated, which corresponds to errors beyond −6 dB or +3.5 dB. Those frequency components, which often gathered in certain bands that differed among listeners, were eliminated from the experimental run that followed the calibration. The eliminated components were usually very few. If more than 20 out of the 248 components were eliminated, the calibration was re-started from the very beginning. Figure 4.21 shows an example of a recording in the right ear from one experimental run. In the figure, one component, marked with an oval, was eliminated. The eliminated components are plotted at 0 dB in the following figures. Sometimes eliminated components came in clusters, leading to spectral gaps. No study was made of the distribution of eliminated components. Instead, the runs for any given experiment were not all done successively, a procedural element that should randomize the distribution of eliminated components.

Figure 4.19: Spectra of the penultimate simulation sent to the synthesis speakers
Figure 4.20: Ear-canal spectra for the penultimate simulation for the front source
Figure 4.21: Real vs. virtual recording in the right ear with one eliminated component

The ultimate simulation signals to be played through the synthesis speakers, i.e. \vec{A}''(f) with some components eliminated, are called the baseline syntheses, \vec{A}'''(f). When \vec{A}'''(f) is playing, the recorded spectra \vec{Y}'''(f) are called the baseline spectra. Figures 4.22 through 4.24 show the recorded spectra of the real and virtual signals, \vec{Y}_0(f) and \vec{Y}'''(f), for an excellent simulation with no component eliminated. All of the decisions, such as selecting between simulations and eliminating components, were based on amplitudes only, which are more important than phases at the high frequencies where front/back cues appear. However, phase was also part of the calculation. Therefore a plot showing the phase differences between the recorded spectra of the real and virtual signals is also included, in Figure 4.24.
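The per-component selection and elimination logic lends itself to a few lines of numpy. A sketch under the same definitions as above (Error, Error', and the 50% criterion); the array names are hypothetical, and in the actual procedure the combined signal A'' was replayed and freshly recorded before the percent-error test, whereas this sketch reuses the stored recordings.

```python
import numpy as np

def pick_and_prune(Y0, Y1, Y2, A1, A2, max_err=0.5):
    """Per-component choice between Simulation 1 and 2, then
    elimination of components still in error by more than 50%.

    Y0     : (2, K) recording of the real source (left, right)
    Y1, Y2 : (2, K) recordings of Simulations 1 and 2
    A1, A2 : (2, K) speaker signals for Simulations 1 and 2
    """
    err1 = np.max(np.abs(np.abs(Y1) - np.abs(Y0)), axis=0)  # Error(f)
    err2 = np.max(np.abs(np.abs(Y2) - np.abs(Y0)), axis=0)  # Error'(f)
    use1 = err1 < err2
    A = np.where(use1, A1, A2)            # penultimate simulation A''
    Ybest = np.where(use1, Y1, Y2)

    # Percent-error test against the source recording, per ear.
    rel = np.abs(np.abs(Ybest) - np.abs(Y0)) / np.abs(Y0)
    keep = np.all(rel <= max_err, axis=0)  # 50% ~ (-6 dB, +3.5 dB)
    A[:, ~keep] = 0.0                      # eliminated components
    return A, keep
```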
Figures 4.22 through 4.24 demonstrate that both the amplitudes and the phases were simulated very well. The flow diagram of the complete calibration sequence is shown in Figure 4.25.

Figure 4.22: Left ear-canal amplitude spectra for the real and virtual signals
Figure 4.23: Right ear-canal amplitude spectra for the real and virtual signals
Figure 4.24: Ear-canal phase differences for the front real and virtual signals. The right-ear phase differences are displaced by −90°.

4.2.9 Confirmation test

After the calibration for the front source speaker, a confirmation test was applied to examine whether the listener could distinguish the real and virtual signals. In the confirmation test, as in the experiments to follow, the signal was given an envelope with raised-cosine onsets and offsets, with 100 ms separating the 10% and 90% amplitude points. This test contained 20 trials (10 real and 10 virtual, in random order). In each trial, the listener pressed the corresponding response button after hearing a real/virtual interval. If the number of correct responses, N_C, was 5 < N_C < 15 (i.e. 25% to 75%), it was confirmed that the listener could not distinguish between the real and virtual signals, and the experiment continued; otherwise the calibration sequence started again from the very beginning.
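The acceptance window 5 < N_C < 15 can be checked against the binomial statistics of guessing. The sketch below (illustrative; scipy is assumed available) computes the probability that a listener who truly cannot distinguish real from virtual would nevertheless fail the test.

```python
from scipy.stats import binom

n, p = 20, 0.5  # 20 trials, guessing
# Fail the confirmation test if NC <= 5 or NC >= 15.
p_fail = binom.cdf(5, n, p) + (1 - binom.cdf(14, n, p))
print(f"false-rejection probability under guessing: {p_fail:.3f}")
# ~0.041, so a genuinely indistinguishable synthesis is accepted
# about 96% of the time.
```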
If the synthesis passed the confirmation test, the whole calibration sequence was repeated for the back source speaker. If the back-source synthesis did not pass the confirmation test, the experiment re-started from the very beginning, i.e. it was re-calibrated for the front source speaker as well. If the synthesis for the back speaker also passed the confirmation test, one of the following front/back discrimination experiments would follow. The duration of the calibration sequences and the confirmation tests was approximately 2.5 minutes.

Figure 4.25: Flow diagram of the calibration sequence for the front source speaker

Original calibration:
1. Play the source signal through the front source speaker and record through the probe-microphones in the ear canals.
2. Play the source signal through the α synthesis speaker with an extra 18 dB attenuation and record through the probe-microphones in the ear canals.
3. Play the source signal through the β synthesis speaker with an extra 18 dB attenuation and record through the probe-microphones in the ear canals.
4. Matrix-calculation with the source signal as the calibration signal to achieve the α and β simulation signals.
5. Play the simulation signals through both synthesis speakers and record through the probe-microphones in the ear canals.
6. Calculate and record the error between the recorded spectra of the simulation and the recorded spectra of the source.

Iterative calibration:
7. Play the α simulation signal through the α synthesis speaker and record through the probe-microphones in the ear canals.
8. Play the β simulation signal through the β synthesis speaker and record through the probe-microphones in the ear canals.
9. Matrix-calculation with the simulation signals as the calibration signals to achieve the new α and β simulation signals.
10. Play the new simulation signals through both synthesis speakers and record through the probe-microphones in the ear canals.
11. Calculate and record the error between the recorded spectra of the new simulation and the recorded spectra of the source.

Selection of optimum simulation:
12. Decide the best baseline synthesis between the two simulations.
13. Play the baseline synthesis through both synthesis speakers and record through the probe-microphones in the ear canals.
14. Eliminate the components, from both the source signal and the baseline synthesis, whose amplitudes in the recording of the simulation signal deviate from the recording of the source signal by more than 50%.

The confirmation run provided direct subjective evidence from the listener of a good synthesis. On the other hand, the number of components taken out is an objective measure of how good the synthesis is. As a standard, we allowed 20 or fewer components (out of the 248) to be eliminated. In our experience, when this condition was satisfied, no listener could ever discriminate real and virtual signals. Therefore, to make the runs shorter and more convenient for the listener, we eliminated confirmation runs during some of the VRX experimental runs, and used only the number of eliminated components to monitor the synthesis. Confirmation runs were added randomly as a double check, approximately every tenth run.

4.2.10 Hearing level

Because vertical-plane and front/back localization depend on high-frequency components, it was necessary to see whether the listeners could actually hear all of the components, or at least most of them. The hearing-test procedure employed is described below. Figure 4.26 shows the signal sent to the loudspeakers in the VRX experiments. The level scale on the vertical axis is established with an arbitrary reference. In order to find out how well the listener hears the signal at a certain frequency, the signal level needs to be compared with the hearing threshold, i.e. one needs to determine the hearing level.

Figure 4.26: Source signal sent to the front loudspeaker
The "hearing level" is defined as the level of the signal being played relative to the hearing threshold level. To measure the hearing thresholds, Bekesy tracking was performed for every listener (one measurement for each ear) with the front loudspeaker. (In the VRX experiments, the level of the back loudspeaker was the same as the level of the front loudspeaker, and the loudspeakers were specially selected to have similar frequency responses. Therefore it was adequate to measure with just the front loudspeaker.) The block diagram of the signal generation is shown in Figure 4.27. The signal was a pulsed sine tone, generated by a computer-controlled frequency generator (WG2). The frequency of the tone increased linearly from 200 Hz to 16 kHz over about 8 minutes. The level of the tone was varied by a computer-controlled attenuator (PA4) over a range between 0 dB and 100 dB of attenuation. The level was calibrated with a pure tone at 1 kHz and with the PA4 set to 0 dB attenuation. The measured level of the calibration tone was 80 dB at a position close to the listener's ears.

Figure 4.27: Block diagram of Bekesy tracking (WG2 → PA4 → power amplifier → loudspeaker, monitored with a VU meter)

During the measurements, the listener sat in the chair as in the VRX experiments, but with one ear plugged, to test the open ear only. If the listener heard the tone, he pressed the button on the response box, and the level of the tone decreased gradually as the attenuation on the PA4 increased. The listener did not release the button until he could not hear the tone; then the attenuation on the PA4 decreased again, raising the level. This whole process continued over the complete range of frequency, with the listener pressing and releasing the response button throughout. Because the level of the tone generated by the WG2 was the same for all frequencies, the signal level sent to the loudspeaker was just the calibration level minus the attenuation reading on the PA4. The average level of adjacent turning points, where the listener changed his response (i.e. where he started or stopped pressing the button), gives the hearing threshold of the ear on the scale of the signal level sent to the loudspeaker, referenced to the 80-dB, 1000-Hz calibration tone.

Because the frequency response of the loudspeaker was not ideally flat, the measured level is exact only at the frequency of the calibration tone, 1 kHz. The solid and dashed curves in Figures 4.28 through 4.39 show the measured threshold levels, which are affected by the frequency response of the loudspeaker and the transfer function between the loudspeaker and the listener's ears. To achieve better detail at the calibration frequency, 1 kHz, each listener did another 2-minute run of Bekesy tracking for each ear, covering a smaller frequency range between 200 Hz and 1200 Hz.
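A sketch of how a threshold track of this kind reduces to data: find the turning points of the level trace (where the listener pressed or released the button) and average adjacent pairs. Illustrative only; the track below is simulated, not measured data.

```python
import numpy as np

def bekesy_threshold(level_track):
    """Average adjacent turning points of a Bekesy level track.
    level_track: signal level (dB) over time; it ramps down while
    the button is pressed and up while it is released."""
    d = np.sign(np.diff(level_track))
    turns = np.where(np.diff(d) != 0)[0] + 1   # indices of reversals
    peaks = level_track[turns]
    return 0.5 * (peaks[:-1] + peaks[1:])      # pairwise averages

# Simulated track oscillating around a 35 dB threshold.
t = np.linspace(0, 60, 2401)
track = 35 + 4 * np.abs((t % 8) - 4) - 8       # triangular wander
est = bekesy_threshold(track)
print(np.mean(est))   # close to the simulated 35 dB threshold
```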
First the source signal (Figure 4.26) was played and its level was set to be the level in the VRX experiments (80 dB at the position close to the listener’s ears). Then all the components of the source signal were eliminated except for the component at 1034 Hz, close to the 1-kHz calibration tone used in Bekesy tracking. Because it was too weak to measure, the power amplifier gain was increased by 20 dB, and the level was measured as 66 dB close to the listener’s ears. Therefore the level of this component alone with the original gain was 66 dB — 20 dB = 46 dB. Then the level spectrum in Figure 4.26 was translated upward to set the level of the 1034—Hz component equal to 46 dB, and the hearing thresholds of the listener were plotted on the same plot. In Figures 4.28 through Figure 4.39, the level difference between the source signal (open circles) and the sine tone in Bekesy tracking (solid and dashed curves, for left and right ears respectively) is the hearing level, which shows the VRX signal level with respect to hearing threshold at each frequency. 4.2.11 Accuracy of synthesis at the ear-drums The transaural technique discussed in this chapter controls the signal being played through the synthesis loudspeakers so that the recorded spectra at the probe—tips (the tips of the probe-microphones) inside the listener’s ear-canals were the same as the recorded spectra for the real source signal. Because the tips did not move during the entire run, if we further assume that the transfer function between the tip and the 182 Level (dB SPL) Level (dB SPL) 60 A O N O 60 A O N O I . l I l f I 1 l I I I I j l I - Listener A - P - Threshold . solid line: left ear _q I dashed line: right ear ‘ . C « . > H 0 .1 ,'. —" ’l‘ ‘ '\ ' \‘ ’l\‘ I" -i - .-- ',l “ ' .’l " \‘ '1 I q r l "“ ” |\’l . l- ‘ I‘ ~ ' - h I -l l I j I l I 1 I l I l I J— I L 0 2 4 6 8 IO 12 I4 16 Frequency (kHz) Figure 4.28: Source signal with listener A’s hearing thresholds I . l I ‘ I I r T f l I I U 1 t - Listener C - P - Threshold 3- ", .. solid line: left ear '9. _ dashed line: right ear ‘ " . l l I l I l l L l l 6 8 10 I2 14 16 Frequency (kHz) Figure 4.29: Source signal with listener C’s hearing thresholds 183 Level (dB SPL) Level (dB SPL) 60 60 4O 20 I l I I I I I I T I I I I I I Listener D I Threshold sofid finezleft eor dashed line: right eor \ , C .—-—-—r I 6 8 l 0 12 14 16 Frequency (kHz) Figure 4.30: Source signal with listener D’s hearing thresholds I I I I I l I I I I I T I F I l- Listener E - Threshold . solid line: left ear _ dashed line: right ear _ I > In % f l .1 *- q — " ‘| -\ "\ s I “0"," L I ' .I’ ""\‘ I, ‘ ‘\‘ - h‘ ' I D ...; “‘ r."" ’ ‘\ ‘1' " 0 " v -1, l V 1 I l, n I n L L I n J l 2 -' 4 6 8 10 12 14 16 Frequency (kHz) Figure 4.31: Source signal with listener E’s hearing thresholds 184 Level (dB SPL) Level (dB SPL) 60 40 20 60 40 20 I I I l I l I I I I I I I r I - Listener F - 0' —1 Threshold 5 * a, I . solid line: left ear 3 ’r. ' _ dashed line: right ear Ms " l . .. ., P t ‘ I \ d 0 0 #» C‘s“ ‘ ‘ ~ I d p '0. “C ’ ‘\ I, d ‘r' " \‘ a, P; ‘ § ' ‘ "’ ‘ \ ’ .1 l- : I , “‘ ,' . l— I ' \\ ,’ _- I ' , \ I- ' ,s a - \,‘ \‘ . ‘ ' — ' \ s ‘ —l \ I‘ Q, h ‘ '0 \ q - 1 I I I I I I I J I I I I I I I 0 2 4 6 8 10 12 14 16 Frequency (kHz) Figure 4.32: Source signal with listener F’s hearing thresholds I I r l I I I I I I I r I l I - Listener L d I— and Threshold 5» ‘ g I solid line: left ear 1:. t. ‘ dashed line: right ear _ " .. . . I . > ‘t“ I 0 P‘ u“ l ‘ '|' I- ... c“ I, ... 
Figure 4.33: Source signal with listener L's hearing thresholds
Figure 4.34: Source signal with listener M's hearing thresholds
Figure 4.35: Source signal with listener P's hearing thresholds
Figure 4.36: Source signal with listener R's hearing thresholds
Figure 4.37: Source signal with listener S's hearing thresholds
Figure 4.38: Source signal with listener V's hearing thresholds
Figure 4.39: Source signal with listener X's hearing thresholds

4.2.11 Accuracy of synthesis at the ear-drums

The transaural technique discussed in this chapter controls the signal played through the synthesis loudspeakers so that the recorded spectra at the probe-tips (the tips of the probe-microphones) inside the listener's ear-canals are the same as the recorded spectra for the real source signal. Because the tips did not move during the entire run, if we further assume that the transfer function between the tip and the ear-drum remained the same, we would conclude that the ear-drum received identical signals whenever the probe-microphone received identical signals at the tip. However, this might not be true in the ear-canal, because the incident sound wave and the sound wave reflected by the ear-drum establish a standing wave inside the ear canal. In the case of a standing wave, the recording is very sensitive to the position of the probe-tips. For instance, if the tip position happens to be at a node for certain frequencies, the recorded level of those frequency components is very low, which leads to large errors in the synthesis.

To check the accuracy of the VRX technique, further testing was performed with KEMAR ears. The KEMAR has artificial ear-canals. An Etymotic ER-11 microphone (called the "KEMAR microphone" in the following text, to be distinguished from the probe-microphone) is built in at the end of each ear-canal. During the test, the KEMAR was placed in the anechoic room, and the probe-microphones were inserted as in the VRX experiments with human subjects. The probe-microphones were inserted so deep that their tips touched the ends of the ear-canals. Then the tips were pulled out by about 1 mm. A complete calibration sequence was performed, and recordings were made through both the probe-microphones and the KEMAR microphones. Then the probes were pulled out by about 1.5 mm at a time, and spectra were recorded each time. The recorded spectra in the right ear are shown in Figure 4.40.
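The sensitivity to probe-tip position follows from the standing-wave geometry. A small sketch of the relevant arithmetic (the speed of sound is an assumed round number):

```python
C = 343e3  # speed of sound in mm/s

for f in (8e3, 14e3, 16e3):
    wavelength = C / f          # in mm
    node_spacing = wavelength / 2
    print(f"{f/1e3:.0f} kHz: pressure nodes every {node_spacing:.1f} mm")
# At 14 kHz nodes are ~12 mm apart, so moving the tip 1.5 mm sweeps
# through an appreciable fraction of the node-to-antinode distance
# (~6 mm), enough to change the recorded level by several dB.
```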
These spectra show that the recordings at the probe-tip (top curves) had large variation. However, when presented with the syntheses that had been calculated by the VRX method based on those three different recordings, the KEMAR microphone (analogous to the human ear-drum) received very similar spectra with very little variation (bottom curves). For example, at 14 kHz, the recorded level at different tip positions could vary by as much as 9 dB, while the recorded level at the KEMAR microphone varied by only 1 dB. The recorded spectra in the left ear show a similar result, although they are not presented here due to limited space. This result indicates that the VRX technique is valid in providing good synthesis at the listeners' ear-drums. The synthesis is good because the matrix calculation of the VRX method takes out the factor of the probe-tip position in the calibration sequences. As long as the probe-tip position does not change during the experimental run, the synthesis remains good.

Figure 4.40: Variation of ear-canal spectra with different probe-tip positions: probe-microphone recordings of the source at tip distances of approximately 1.0, 2.5 and 4.0 mm from the KEMAR microphone (top curves), and the corresponding KEMAR-microphone recordings of the synthesis (bottom curves).

4.3 Experiments

The following 11 experiments focus on various cues for front/back discrimination. The VRX technique was always used, except in Experiment 11. Each listener participated in a certain subset of the experiments. To minimize possible variance among different days, the same experiments were performed on the same day when possible. If the experiments ran too long, the remaining runs were performed on the next scheduled date. On each day, if there were parameters in the experiments, the parameters were varied between adjacent runs. When there was no parameter to vary, the runs were alternated with other short experiments. In general, the experimental runs dedicated to a particular stimulus were not all done in succession.

Before introducing the VRX experiments, to simplify the language, it is convenient to introduce the concepts of front/back spectral level difference (FBSLD) and front/back spectral phase difference (FBSPD). In this chapter, FBSLD is defined as the level of the recording of the front source minus the level of the recording of the back source, at each frequency and at each ear (Equation 4.9), and FBSPD is defined as the phase of the recording of the front source minus the phase of the recording of the back source, at each frequency and at each ear (Equation 4.10). Both FBSLD and FBSPD are functions of frequency:

FBSLD_L(f) = L_{FL}(f) - L_{BL}(f), \qquad FBSLD_R(f) = L_{FR}(f) - L_{BR}(f) \qquad (4.9)

FBSPD_L(f) = \phi_{FL}(f) - \phi_{BL}(f), \qquad FBSPD_R(f) = \phi_{FR}(f) - \phi_{BR}(f) \qquad (4.10)
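Computing these quantities from the ear-canal recordings is direct. A minimal sketch, assuming the front- and back-source recordings are stored as complex spectra (the array names are hypothetical):

```python
import numpy as np

def fbsld_fbspd(front, back):
    """Front/back spectral level and phase differences (Eqs. 4.9-4.10).
    front, back: (2, K) complex ear-canal spectra, rows = (left, right).
    Returns FBSLD in dB and FBSPD in radians, each of shape (2, K)."""
    fbsld = 20 * np.log10(np.abs(front) / np.abs(back))
    fbspd = np.angle(front / back)   # wrapped to (-pi, pi]
    return fbsld, fbspd
```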
Changing spectra to determine the relevant spectral region for localization is not new. Hebrank and Wright (1974b) used all of the high-pass, low-pass, band-pass and band-reject stimuli, parallel to the flattening experiments (Experiments 1 through 4) in the following text, in their localization experiments. However, the flattening experiments, unlike the filtering technique that Hebrank and Wright used, do not remove power from any spectral region. Thus the flattening experiments are better because:

1. No extra spectral gradient was introduced, which might itself be a localization cue (Macpherson and Middlebrooks, 2003).

2. Listeners cannot immediately distinguish flattened spectra from spectra with complete information. By contrast, if the signals are filtered in some way, listeners know that they are being given less information, and they can only direct their attention to the available band.

3. The overall level is unchanged. In filtering experiments, as the available information is reduced to a smaller bandwidth, keeping the spectral level unchanged means that the overall signal level decreases, which might affect listeners' performance.

Langendijk and Bronkhorst (2002) performed localization experiments with DTFs flattened in various frequency bands, very similar to the method of the flattening experiments in this chapter. The difference, however, is that they flattened the DTF by taking the average of the amplitude spectrum for each source, whereas in the flattening experiments the average was taken between the front and back sources. Thus Langendijk and Bronkhorst removed the local spectral structure within certain bands, whereas the flattening experiments in this chapter removed the spectral difference between the front and back sources.

The necessary band(s) concept

A necessary band(s) model is proposed to describe the contribution across various frequency bands to successful front/back discrimination. The necessary band(s) model says that there exists a necessary frequency band, or there exist necessary, non-contiguous frequency bands. Band(s) are necessary if the information in every portion of every band is necessary for front/back discrimination. It follows that, if the information in any part of any necessary band is missing, the listener will fail to distinguish between front and back. Here failure means performance below threshold, defined as 75% correct.

The alternative is a multiple bands model, which states that there is no necessary band, and that, given sufficient information outside a given band, a listener can successfully localize sound sources.

Experiments 1 through 4 in the following were designed to test between the necessary band(s) model and the multiple bands model. It is worth noting that the terms "necessary" and "sufficient" have been used in articles on spectral cues for front/back. Experiments by Asano et al. (1990) found macroscopic patterns at high frequencies necessary for front/back judgement and found that, when present, microscopic spectral cues below 2 kHz are necessary, although not sufficient.

Experiment 1: Flatten below

Experiment 1 examined whether the FBSLD cues were important at low frequencies. In Experiment 1, the amplitudes of components below and including the nth component in the baseline spectra were flattened (replaced by the root-mean-square average) for both front and back sources, at each ear independently (Equation 4.11); the components above the nth component in the baseline spectra were unchanged. The frequency of the nth component, f_n, is called "the boundary frequency" in the following text.
The adjusted phase spectra were identical to baseline.

\[ |Y^{+}_{FL}(f_i)| = |Y^{+}_{BL}(f_i)| = \sqrt{\frac{1}{2(n-m)} \sum_{j=1}^{n} \left[ |Y_{FL}(f_j)|^2 + |Y_{BL}(f_j)|^2 \right]}\,, \qquad i \le n \tag{4.11} \]

and similarly for the right ear, where m is the number of eliminated components (to be discussed in the next paragraph). The new spectra are called adjusted spectra, Y+(f), and these were the spectra that we wanted the listener to hear. By applying the transaural technique as in Equation 4.12, the adjusted syntheses to be played through the synthesis speakers, A+(f), could be derived:

\[ \begin{pmatrix} A^{+}_{L}(f) \\ A^{+}_{R}(f) \end{pmatrix} = \frac{1}{W''_{LL}(f)\,W''_{RR}(f) - W''_{LR}(f)\,W''_{RL}(f)} \begin{pmatrix} W''_{RR}(f) & -W''_{RL}(f) \\ -W''_{LR}(f) & W''_{LL}(f) \end{pmatrix} \begin{pmatrix} Y^{+}_{L}(f) \\ Y^{+}_{R}(f) \end{pmatrix} \tag{4.12} \]

where the matrix

\[ W''(f) = \begin{pmatrix} W''_{LL}(f) & W''_{LR}(f) \\ W''_{RL}(f) & W''_{RR}(f) \end{pmatrix} \]

is a combination of the matrices W(f) and W'(f), in such a way that at frequency f, if A(f) was selected as the component in A''(f), then W(f) was selected as the matrix in W''(f); otherwise, if A'(f) was selected as the component in A''(f), then W'(f) was selected as the matrix in W''(f).

When A+(f) was played through the synthesis speakers, the recording Y+'(f) was supposed to be identical to the adjusted spectra Y+(f). The frequency components of Y+'(f) that deviated from Y+(f) by more than 50% (corresponding to an error larger than -6 dB or +3.5 dB), i.e. the components satisfying

\[ \frac{\big|\,|Y^{+\prime}_{L}(f)| - |Y^{+}_{L}(f)|\,\big|}{|Y^{+}_{L}(f)|} \times 100\% > 50\% \quad \text{or} \quad \frac{\big|\,|Y^{+\prime}_{R}(f)| - |Y^{+}_{R}(f)|\,\big|}{|Y^{+}_{R}(f)|} \times 100\% > 50\%, \]

were eliminated. The number of eliminated components is defined as m. Overall, the eliminated components were very few, and we set a standard that if more than 20 components were eliminated (including those eliminated in the calibration sequence), it would be considered a bad simulation, and the whole calibration sequence would be repeated. In Figures 4.41 and 4.42 (as well as in all the figures of amplitude spectra in Experiments 2 through 11), the data points at 0 dB represent eliminated components.

In this way, while keeping the overall power unchanged, the amplitude spectrum below the boundary frequency was flattened. Hence the listener could not get useful spectral information to discriminate front and back from the nth component and below. Figures 4.41 and 4.42 show the baseline and the adjusted syntheses for the right ear for f_n = 8026 Hz as an example. The left-ear spectra are similar.

In each run of this experiment, the adjusted syntheses for the front and back sources were presented to the listener in a random order for 20 trials (10 for the front source and 10 for the back source). The listener's task was to respond whether the sound came from front or back, by pressing the corresponding buttons. Besides these 20 trials, 8 trials of baseline synthesis (4 for the front source and 4 for the back source) were added randomly, to check whether the listener could still do the discrimination task. If the listener could not succeed in discriminating the baseline synthesis (more than one of those 8 baseline trials incorrect), it meant that something had gone wrong, and the data from that run were eliminated. The procedure described in this paragraph was practiced in all of the following experiments except for Experiment 11.

[Figure 4.41: Amplitude spectra for the front source in the right ear in Experiment 1, flattened below 8 kHz. Baseline (open circles) and adjusted (solid circles) levels versus frequency, listener M.]

[Figure 4.42: Amplitude spectra for the back source in the right ear in Experiment 1, flattened below 8 kHz.]
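The flattening operation itself is simple. The sketch below is a minimal illustration of Equation 4.11; the function name flatten_band and the array names are hypothetical. With a different band mask, the same sketch covers the "flatten above", "flatten outside", and "flatten inside" adjustments of Experiments 2 through 4 below.

```python
import numpy as np

def flatten_band(Y_front, Y_back, band, keep=None):
    """Flattening adjustment of Equation 4.11 for one ear.

    band: boolean mask over components selecting the region to flatten
    (for Experiment 1, everything at or below the boundary frequency);
    keep: mask of non-eliminated components, the complement of the m
    eliminated ones. Amplitudes in the band are replaced by one common
    RMS of the front and back baselines, so the power in the band is
    preserved while all front/back level structure there is removed.
    Phases are left untouched.
    """
    if keep is None:
        keep = np.ones(len(Y_front), dtype=bool)
    sel = band & keep
    rms = np.sqrt(np.mean(np.abs(np.r_[Y_front[sel], Y_back[sel]]) ** 2))
    def flatten(Y):
        out = Y.copy()
        out[sel] = rms * np.exp(1j * np.angle(Y[sel]))  # keep baseline phase
        return out
    return flatten(Y_front), flatten(Y_back)
```

Because the replacement value is the RMS over the band, the band's total power is unchanged, which is the point of flattening rather than filtering.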
Eight listeners (A, D, E, F, L, M, R, and X) participated in Experiment 1. The open circles in Figures 4.45 and 4.46 show the results of Experiment 1, in the form of percent correct on front/back judgement as a function of boundary frequency. Each listener did 4 runs for each condition. Hence each data point in the figures is a mean of 4 runs, and the error-bar is the standard deviation over the 4 runs.

The figures show individual differences. For example, the data of listener R (the third panel in Figure 4.46) show that presenting information between 6 kHz and 16 kHz is adequate for her to discriminate front and back, but she failed the task with information between 8 kHz and 16 kHz only. The testing range of boundary frequencies was chosen for each listener so that the performance of this listener decreased from almost perfect (100%) to close to the 50%-limit.² Performance with an even lower boundary frequency was assumed to be perfect, and performance with a higher boundary frequency was assumed to be close to the 50%-limit. These assumptions were confirmed by testing runs with those boundary frequencies for some listeners. Given the individual differences, listeners might have different testing ranges of boundary frequencies.

²The 50%-limit can be approached in different ways. Sometimes, listeners heard sound images that were either diffuse or in the center of the head. Sometimes, they found that they could hear the sound images both ways, and could choose to perceive them from front or from back. For these two conditions, the 50%-limit corresponded to random guessing. However, at other times, listeners perceived all the sound images from one direction, either all clearly from front, or all clearly from back. For this condition, the 50%-limit corresponds to responses all from one direction. An example of this condition is the data point at 8 kHz for listener X (the bottom panel in Figure 4.46). Besides those differences, it is always true that for runs with scores close to the 50%-limit, listeners could not find a valid localization cue to discriminate front and back sound sources. Thus this chapter will not distinguish among these conditions, and will simply note them as "the 50%-limit".

In Figures 4.45 and 4.46, all listeners showed decreasing performance in Experiment 1 as the boundary frequency increased, which is reasonable because useful front/back cues were eliminated for high boundary frequencies. Besides this general tendency, however, listeners demonstrated large individual differences. The change in performance happened at different boundary frequencies for different listeners, and the ranges were also different. In the figures, as the boundary frequency increased, the performance of listeners A, L, R and X dropped very sharply within a range of only 2 kHz around center frequencies (A around 9 kHz, L around 11 kHz, and R and X around 7 kHz), while the performance of listeners D, E, F and M decreased slowly over a much wider range.

90% can be taken as a threshold of percent correct, where the performance starts to decrease from perfect, and the boundary frequency for the 90%-threshold in Experiment 1 is noted as bf90,1. When the boundary frequency was below bf90,1, the performance was either perfect or close to perfect.
Therefore, if there is any necessary band(s), it has to be above bf90,1, because, by definition, eliminating information from any portion of the necessary band(s) would lead to a failure in performance (below the 75%-threshold).

Experiment 2: Flatten above

Experiment 2 examined whether FBSLD cues were important at high frequencies. It was similar to Experiment 1, except that this time, to make the adjusted synthesis from the baseline synthesis, it was the frequency components above and including the nth component whose amplitudes were flattened (Equation 4.13), and the frequency components below the nth component were unchanged. Similar to Experiment 1, the frequency of the nth component, f_n, is called "the boundary frequency". The adjusted phase spectra were identical to the baseline. Figures 4.43 and 4.44 show the baseline and the adjusted syntheses for f_n = 6038 Hz for the left ear as an example. The right-ear spectra are similar.

\[ |Y^{+}_{FR}(f_i)| = |Y^{+}_{BR}(f_i)| = \sqrt{\frac{1}{2(251-n-m)} \sum_{j=n}^{250} \left[ |Y_{FR}(f_j)|^2 + |Y_{BR}(f_j)|^2 \right]} \tag{4.13} \]

where f_i >= f_n, m is the number of eliminated components above the nth component, and the left ear was flattened in the same way.

The same eight listeners as in Experiment 1 participated in Experiment 2. The solid circles in Figures 4.45 and 4.46 show their results. For example, the data of listener R (the third panel in Figure 4.46) show that she could successfully discriminate front and back sources with information below 14 kHz, but she failed the task with only information below 10 kHz. Similar to Experiment 1, the testing range of boundary frequencies was chosen for each listener so that the performance of this listener decreased from almost perfect to near the 50%-limit. Performance with even higher boundary frequencies was assumed to be perfect, and performance with lower boundary frequencies was assumed to be close to the 50%-limit. These assumptions were confirmed by testing runs with those boundary frequencies for some listeners.

Figures 4.45 and 4.46 also show the two features parallel to Experiment 1. First, all listeners showed decreased performance as the boundary frequency decreased. This was expected because useful front/back cues were eliminated with lower boundary frequencies. Second, there were large individual differences among listeners. In Figures 4.45 and 4.46, the performance of listeners E, F, L, M, R and X dropped sharply within a frequency band of 4 kHz, around different center frequencies (E, F and X around 8 kHz, L and M around 4 kHz, and R around 12 kHz), while the performance of listeners A and D decreased very slowly over a much broader frequency range.

Parallel to Experiment 1, taking 90% as a threshold of percent correct, the boundary frequency for the 90%-threshold in Experiment 2 is noted as bf90,2. When the boundary frequency was above bf90,2, the performance was either perfect or close to perfect. Hence, if there is any necessary band(s), it has to be below bf90,2.

Listeners A, D, L, and M scored greater than 80% when presented with information only below 4 kHz, which is consistent with Blauert (1983), who found significant cues for front/back localization around 500 and 1000 Hz.

[Figure 4.43: Amplitude spectra for the front source in the left ear in Experiment 2 (baseline and adjusted), listener M.]

[Figure 4.44: Amplitude spectra for the back source in the left ear in Experiment 2.]
[Figure 4.45: Results of Experiments 1 and 2 (part 1). Percent correct versus boundary frequency for listeners A, D, E, and F. Open circles: Experiment 1 (flatten below boundary); solid circles: Experiment 2 (flatten above boundary).]

[Figure 4.46: Results of Experiments 1 and 2 (part 2), for listeners L, M, R, and X.]

Both Experiment 2 and Blauert's experiment show that it is not necessary to have cues above 4 kHz to successfully discriminate front from back. Moreover, Asano et al. (1990) found that listeners' front/back judgements were good when the spectra were smoothed, i.e. when the detailed structure in the spectra was eliminated, above 3 kHz, and that listeners failed the task when smoothing below 2 kHz. This suggests that the information above 3 kHz is adequate for front/back judgement, which agrees with the flattening experiments in this chapter.

On the other hand, Algazi et al. (2001) calculated the correlation between the azimuthal angle that listeners reported and the angle of the DTF, and found that with low-pass filtering below 3 kHz the correlation was about 0.3 to 0.6, whereas in the median sagittal plane the correlation was much less (0.1 to 0.2). This result suggests that the information above 3 kHz is critical for front/back judgement, which tends in the opposite direction to the results of the flattening experiments.

Furthermore, Hebrank and Wright (1974b) found that the spectrum above 11 kHz was required for localization in the median sagittal plane, which clearly disagrees with the results of all listeners (except listeners A and R) in Experiment 2. However, their loudspeaker did not pass energy below 2.5 kHz, whereas these bands were included in Experiment 2. As Blauert stated, the frequency bands around 500 and 1000 Hz, both below 2.5 kHz, contain useful information for front/back localization. Therefore whether or not the spectrum below 2.5 kHz is presented should explain the difference between the results in Experiment 2 and the results by Hebrank and Wright.

Discussion on Experiments 1 and 2

Experiments 1 and 2 identify, for each listener, frequency bands within which the performance drops from close to perfect to close to the 50%-limit. These bands are called "performance-changing bands" (PCB) in the following text. The boundary frequencies of the PCBs were decided by thresholds of 90% (10% from perfect) and 60% (10% above the 50%-limit) in percent correct.
Comparing the relative positions of the PCBs from Experiments 1 and 2, the listeners can be categorized into three groups:

1. V-shape (listener R): The PCB from Experiment 1 (open circles) is to the left of the PCB from Experiment 2 (solid circles).

2. X-shape (listeners A, F and X): The PCB from Experiment 1 overlaps with the PCB from Experiment 2.

3. A-shape (listeners D, E, L and M): The PCB from Experiment 1 is to the right of the PCB from Experiment 2.

According to the necessary band(s) model, there is a necessary frequency band(s) that has to be present for successful front/back discrimination. For the V-shape listener (R) and the X-shape listeners (A, F and X), this necessary band(s) has to be between two frequency boundaries, bf90,1 and bf90,2. The frequency band between bf90,1 and bf90,2 is defined as "the central band" in the following text. The PCBs and the central bands are shown in Table 4.3. The results from Experiments 1 and 2 (Figures 4.45 and 4.46) confirm that when this band is gradually removed, the performance of the V-shape and X-shape listeners degenerates.

However, there is evidence against the necessary band(s) model, namely the A-shape listeners (D, E, L and M). For these A-shape listeners, bf90,1 is higher than bf90,2, and therefore no frequency band is absolutely necessary. For instance, in Experiment 1, listener L (the top panel in Figure 4.46) could successfully discriminate the front and back sources with information above 10 kHz (up to 16 kHz); in Experiment 2, she could also discriminate them perfectly with information below 6 kHz. The two bands do not overlap. Therefore for listener L, as well as for the other A-shape listeners, no frequency band is absolutely required, and these listeners can use available information in various frequency bands, either high-frequency or low-frequency, to discriminate front and back sources. For comparison purposes, the PCBs of the A-shape listeners are also included in Table 4.3.

category   listener   PCB from Expt. 1 (kHz)   PCB from Expt. 2 (kHz)   central band (kHz)
V-shape    R          6-8                      10-13                    6-13
X-shape    A          8-10                     8-14                     8-14
           F          2-12                     6-10                     2-12
           X          6-8                      6-9                      6-9
A-shape    D          8-14                     0-6                      (none)
           E          8-14                     6-10                     4-9 *
           L          10-12                    3-6                      6-10 *
           M          6-12                     2-8                      (none)

Table 4.3: Bands for each listener

To further test the necessary band(s) model for the V-shape and X-shape listeners, the following two experiments were performed. Only some of the V-shape and X-shape listeners in Experiments 1 and 2 participated in Experiments 3 and 4. In addition, two A-shape listeners participated in Experiments 3 and 4 as well, and their central bands were chosen as indicated in the right-most column in Table 4.3, marked with stars because their central bands were not defined by a lower boundary of bf90,1 and a higher boundary of bf90,2, as for the V-shape and X-shape listeners.

Experiment 3: Flatten outside

Experiment 3 was similar to Experiments 1 and 2, except that the frequency components outside the central band were flattened. The higher and lower boundary frequencies for each listener were determined from Experiments 1 and 2, so that the central band includes the possible necessary band(s). An example of the baseline and the adjusted syntheses for the left ear is shown in Figures 4.47 and 4.48. This experiment was designed to test whether the central band including all the necessary band(s) is a sufficient band, i.e. whether listeners could successfully discriminate front/back sources with only information within the central band.
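In terms of the hypothetical flatten_band sketch given under Experiment 1, the manipulations of Experiment 3 and its reverse (Experiment 4, below) differ only in the band mask. Whether one RMS was shared across both outside segments or each segment got its own is not stated; the usage snippet below assumes a single RMS, and it also assumes the flatten_band function and baseline spectra Yf, Yb from the earlier sketch are in scope.

```python
import numpy as np

# Hypothetical component frequencies and central-band edges.
f = np.arange(1, 251) * 64.0     # assumed 64-Hz spacing up to 16 kHz
f_lo, f_hi = 6.0e3, 13.0e3       # e.g. listener R's central band (Table 4.3)

# Experiment 3 ("flatten outside"): only the central band keeps its cues.
Yf3, Yb3 = flatten_band(Yf, Yb, band=(f < f_lo) | (f > f_hi))

# Experiment 4 ("flatten inside"): everything but the central band keeps its cues.
Yf4, Yb4 = flatten_band(Yf, Yb, band=(f >= f_lo) & (f <= f_hi))
```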
[Figure 4.47: Amplitude spectra for the front source in the left ear in Experiment 3 (baseline and adjusted), listener E.]

[Figure 4.48: Amplitude spectra for the back source in the left ear in Experiment 3.]

Five of the eight listeners in Experiments 1 and 2 (A, E, L, R and X) participated in Experiment 3. Three of the five listeners (A, R and X) were V-shape or X-shape listeners, for whom this experiment was designed. Listeners E and L participated but with no evident prediction. For the V-shape and X-shape listeners (A, R and X), the central band for each listener can be found in the right-most column in Table 4.3. For the A-shape listeners E and L, the central bands were chosen as follows: 4-9 kHz for listener E, and 6-10 kHz for listener L.

The results of Experiment 3 are shown as open circles in Figure 4.51.³ Unfortunately, none of the listeners did well in this experiment; actually, all of their scores were close to the 50%-limit. (For listeners E, L, R and X, the results were at exactly 50% with no error-bar. This was because they always heard the adjusted synthesis coming from one direction, either front or back, for the entire Experiment 3.) One-tail t-tests show that the percent correct for listener A was significantly below the 75%-threshold (indicating success or failure) at the 0.1-level, and the scores of percent correct for the other three listeners were significantly below the 75%-threshold at the 0.002-level.

³Listeners E and L are marked with parentheses in Figure 4.51 because Experiment 3 (as well as Experiment 4) was not designed for these two A-shape listeners.

The poor performance suggests that the central band was not a sufficient band, and listeners would need information beyond the central band for successful front/back judgement. Because the necessary band(s) was included in the central band, it can further be said that the necessary band(s) was not a sufficient band(s), either. However, this result is not evidence against the necessary band(s) model, because a band that is insufficient can still be necessary. Testing the necessary band(s) model for the V-shape and X-shape listeners is the motivation for Experiment 4.

Experiment 4: Flatten inside

Experiment 4 was simply the reverse of Experiment 3. In Experiment 4, the spectrum within the central band was flattened, and the frequency components outside the band were unchanged, i.e. identical to the baseline synthesis. Figures 4.49 and 4.50 show an example of the baseline and the adjusted syntheses for the left ear. The same five listeners as in Experiment 3 (A, E, L, R and X) participated in this experiment. The results are shown as open squares in Figure 4.51.

For the three V-shape and X-shape listeners (A, R and X), for whom this experiment was designed, the necessary band(s) model predicts that performance should be poor because the necessary band(s) was taken away. The experimental results in Figure 4.51 show that the only V-shape listener in this experiment, listener R, did perform poorly (i.e.
significantly below the 75%-threshold with a p-value of zero), as predicted. However, listener X got a score close to perfect (significantly above the 75%-threshold at the 0.002-level), and listener A also got a score above 75%, although not significantly. The results of listeners X and A were clearly in disagreement with the prediction of the necessary band(s) model.

The two A-shape listeners E and L got perfect scores, which was not surprising: listener E could discriminate front/back sources with all the components below 10 kHz flattened, and in Experiment 4 only the band between 4 and 9 kHz was flattened, so her performance should not be worse than that in Experiment 1; similarly for listener L. Meanwhile, this was a good confirmation that listeners E and L gave consistent results.

Experiment 4A: Flatten inside with wider central band

Listeners L and X had fairly narrow central bands in Experiment 4; thus flattening within those bands eliminated very little front/back information, and both listeners did very well in Experiment 4. The purpose of Experiment 4A was to test whether listeners L and X could still succeed at the task with even wider flattened central bands. The results are shown as solid squares in Figure 4.51.

[Figure 4.49: Amplitude spectra for the front source in the left ear in Experiment 4 (baseline and adjusted), listener X.]

[Figure 4.50: Amplitude spectra for the back source in the left ear in Experiment 4.]

[Figure 4.51: Results of Experiments 3, 4 and 4A. Percent correct for listeners A, (E), (L), R, and X. Open circles: Experiment 3 (flatten outside); open squares: Experiment 4 (flatten inside); solid squares: Experiment 4A (flatten inside with wider central band).]

For listener L, the central band used in Experiment 4 did not include both of the PCBs from Experiments 1 and 2. By contrast, in Experiment 4A, in order to include those PCBs, the central band for her was chosen to be from 3 to 12 kHz. The figure shows that with this wider central band, listener L could still discriminate front and back stimuli perfectly, which further confirmed that the information in the PCBs from Experiments 1 and 2 was not absolutely necessary for her, and that the necessary band(s) model does not apply to her. It is worth noting that the flattened central band between 3 and 12 kHz was very wide for listener L, and yet she still succeeded at the front/back discrimination task. This result favors Blauert's finding that the front/back cues below 1 kHz are significant.

For listener X, the wider central band was simply chosen to be 2 kHz wider at both the high and low boundaries (in total the central band was 4 kHz wider), i.e. from 8 to 12 kHz. In Figure 4.51, it is clear that even with flattening over this wider central band, listener X could still discriminate front and back stimuli very well (significantly above the 75%-threshold at the 0.001-level).
This result strengthens the result of listener X from Experiment 4, i.e. the information in the central band including the PCBs from Experiments 1 and 2 was not necessary for listener X.

Summary of Experiments 1 through 4

In Experiments 1 through 4, spectral patterns in various frequency bands (i.e. high-frequency, low-frequency, band-reject, or band-pass), bearing information for front/back discrimination, were eliminated by means of flattening the amplitude spectrum. The intention was to discover whether, for a given listener, there is a necessary band(s) that is essential for successful front/back discrimination. The results for seven out of the eight listeners (six of whom showed significant results) showed, however, that there is no such necessary band. On the contrary, the results supported the following idea, called the "multiple comparison model": There are several frequency bands that give a listener front/back information. When presented with a given sound, the auditory system compares among those frequency bands, and makes a judgement on front/back based on the comparison. Therefore, all those frequency bands contribute, but no frequency band is absolutely dominant. This mechanism is practical because a sound in nature might lack frequency components within a given band(s). If that band(s) happened to be the necessary band(s), the animal might be in danger.

It should be noted that, according to the design of Experiments 1 through 4, the negative results are more meaningful. In other words, for the exceptional listener R, who did not show evidence against the necessary band(s) model, the results were not a direct support of the model, either. On the other hand, the multiple comparison model is consistent with the results, i.e. none of the results in Experiments 1 through 4 demonstrated evidence against the multiple comparison model.
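The multiple comparison model is stated only verbally above. The following toy sketch is my own caricature of it, not a model from the dissertation: each band compares the observed level spectrum with stored front and back templates and casts one vote, and the pooled vote decides, so no single band is indispensable. All names and the template idea are illustrative assumptions.

```python
import numpy as np

def front_back_vote(obs_level, tpl_front, tpl_back, band_masks):
    """Toy pooled decision in the spirit of the multiple comparison model.

    obs_level: observed level spectrum (dB); tpl_front / tpl_back: stored
    level templates (dB) for front and back; band_masks: list of boolean
    masks, one per contributing frequency band.
    """
    votes = 0.0
    for mask in band_masks:
        if not mask.any():
            continue  # a band with no components simply abstains
        err_front = np.mean((obs_level[mask] - tpl_front[mask]) ** 2)
        err_back = np.mean((obs_level[mask] - tpl_back[mask]) ** 2)
        votes += np.sign(err_back - err_front)  # +1 favours "front"
    return "front" if votes >= 0 else "back"
```

Flattening any one band merely silences that band's vote; the judgement can still be carried by the remaining bands, which is the behavior the A-shape listeners showed.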
Experiments 5 and 6 cut the peaks or dips in the vital range, and examined how much worse would the listener discriminate front and back simulations, and whether eliminating the peak- or dip-information would dramatically decrease the listener’s performance. listener IA L R X boundary freq. (kHz) | 8 3 10 6 Table 4.4: Boundary frequency in Experiments 5 and 6 Experiment 5: Peaks only In Experiment 5, dips in the baseline spectra were cut, and only peaks were left. To cut the dips, the RMS amplitude as in Equation 4.14 was first calculated from the 212 baseline spectra above the boundary frequency, 2 + Yaw.) ] (4.14) . 1 2'0 Ax ,' , . , = 7) I ’1’ i R [S Amplitude “251 _ b _ m 21:1; [ YPR(f) where b is the order number of the component at the boundary frequency, and m is the number of eliminated components. Then the amplitude of the components above the boundary frequency whose ampli- tude was less than the RMS amplitude was set to be equal to the RMS amplitude. An example of resulting spectra in the right ear is shown in Figures 4.52 and 4.53. It is clear that the dips were cut. The open circles in Figure 4.56 show the results of Experiment 5. The scores of all of the four listeners were somewhere between perfect (100%) and the 50%-limit. Experiment 6: Dips only Experiment 6 was similar to Experiment 5, except that it was peaks that were cut, and dips in the baseline spectra were preserved on adjusted spectra. This adjust- ment was achieved by setting the amplitude of the components above the boundary frequency whose amplitude was greater than the RMS amplitude to the RMS ampli- tude. Figures 4.54 and 4.55 show an example of the right-ear spectra in which the peaks were cut. The solid circles in Figure 4.56 show the results of Experiment 6. Three out of the four listeners performed close to perfect (100%). Compared with the results from Experiment 5 (open circles), all listeners showed better performance in Experiment 6 with dips only, and one—tail t-tests showed that, for three out of the four listeners, the performance in Experiment 6 was significantly better (for listeners A and X, significant at the 0.05-level; for listener L, significant at 0.1-level). 213 Level(dB) 80 20 F I T I I * I I I I I I I I I r "l n =- 0 Baseline ' Listener: X 3 3 l- . Adjusted - - ‘ ‘ (a ' ” “ I? Q . mmw~&g fl. , , _ V fifm as g: 9 ’ o a 30.5 0. .3 ° ._ : «a? _ _g. 628 . Recording simulations in right ear (front) =1 I I I, 1.1...1 1.1 “I. *1... I .I 1h: 0 5K 10K 15K Frequency (Hz) Figure 4.52: Amplitude spectra for the front source in right ear in Experiment 5 Level(dB) 8O 60 4o 20 a.” ”flap V\e”'”§$ ? =='l I r‘ r ‘1“ I 'l “l '1’ I l‘ I” l“ I 'l r1 =' 0 Baseline . Listener: X i ‘ ; — 0 Adjusted . .. . . .. , a L _ — o ... 6% — Recording Simulations in right ear (back) - =1 I II I I I I I I ll. II I.” 0 5K 10K 15K Frequency (Hz) Figure 4.53: Amplitude spectra for the back source in right ear in Experiment 5 214 80= I I I I I I "1 I I I I I I II 0 Baseline : Listener: R i — 0 Adjusted ‘ Level(dB) on it , ago, 40 a c, I? h 9&0 °9 ea — 3": ~ we ‘ ‘2 3% , 7° : o 20 _ . . n , ': 00-7 ._ . 3 g?» i? . Recording simulatiOns in right earl-(front) V 0&1 I 1 I I II I VI-Jn_h_-.LQI~IJI~-J 0 5K 10K 15K Frequency (Hz) '5 Figure 4.54: Amplitude spectra for the front source in right ear in Experiment 6 80T= I* I' l‘ I‘“l' I ”l ‘l I I ”1‘ l“ T‘ I I “l =’ 0 Baseline A . g . Listener: R - 0 Adjusted ‘ 2 tr , .. _ .q 60 -- Level(dB) Recording simulations in right.eari(back)1 = I I I I I I I I I mi . 
i I 0 5K 10K 15K Frequency (Hz) 3 Figure 4.55: Amplitude spectra for the back source in right ear in Experiment 6 215 100 i JT'. ‘ _ ‘. . A 90 ' " . N l O . 4— 80 D # 0 j 8 70- i S? « L ’ d— __ 1 8 60- . U ' . .._ 50 ------------------------- 4.. .............. .. s * 4 8 40 C ‘1 (it, 30 ‘ 0 Experiment 5: peaks only ' 20: 0 Experiment 6: dips only A L R X Listener Figure 4.56: Result of Experiments 5 and 6 Summary of Experiments 5 and 6 Blauert (1969/70) first suggested a frequency band model on front / back localiza- tion based on spectral peaks only. However, he (1972, cited from Blauert, 1983, page 310) and Meller (1973, cited from Blauert, 1983, page 310) later hypothesized that spectral dips are important as well. Hebrank and Wright (1974b) considered both peaks and dips in their studies. The intention of Experiments 5 and 6 was to compare the performance by each listener with stimuli eliminating dips or peaks in spectra. Interestingly, all the four listeners performed better with only dips preserved than with only peaks preserved. This result suggests that dips might be more important cues for front /back localization for most listeners, different from the boosted bands that Blauert initially suggested. On the other hand, one has to admit that the validity of these experiments obviously depend on the definition of peak and dip as deviation from RMS value. to H Ob 4.3.3 Monaural and binaural cues Unlike localization in the horizontal plane, localization in the sagittal plane, such as front / back discrimination, has to utilize spectral cues. Various models have been suggested to account for sagittal plane localization. The following two experiments tested two of them. Experiment 7: Flatten right ear Since spectral cues are important for front / back discrimination, one might expect that a listener could detect the characteristic peaks and dips in spectrum with just one ear. Experiment 7 tested this hypothesis by flattening the right ear spectrum (Equation 4.15) while leaving the left ear spectrum identical to baseline. The adjusted phase spectrum in the right ear was identical to baseline. Figures 4.57 and 4.58 show the baseline and the adjusted syntheses for right ear. The baseline and the adjusted syntheses for left ear were identical. 2 p’iflfi) + Ygl?(fi) 2] (4.15) IY;R(f)‘ = IYgR(f)l = “@232“ where m is the number of eliminated components. It is worth comparing the method in Experiment 7 with the method of simply plugging the right ear. Localization tests using real sources by Morimoto (2001) revealed that the far ear stopped contributing for elevation judgements when azimuth was above 60°. This finding suggested that listeners could succeed in a front /back discrimination task with one ear plugged. However, by plugging the right ear, the sound image moves to the extreme left, and therefore the front / back discrimination experiment would force listeners to rely on percepts other than localization (Blauert, 1983, page 305). 
It was confirmed by informal listening that listeners with one ear plugged found the task to be meaningless in a sense that all the images were on one 217 80 I I I I I I I 0 Baseline 1 I I I l l Listener: E — 0 Adjusted 60 63 33 3 40 > a) _l 20 - Recording simulations in right ear (front) J l I I I 5K I I I 10K Frequency (Hz) L l l l J I 15K Figure 4.57: Amplitude spectra for the front source in right ear in Experiment 7 Level(dB) 80 I I I I I I I I I I I I I I I I - 0 Baseline Listener: E - 0 Adjusted - 60 - _ o 40 °- 20 - _ Recording simulations in right ear (back) - 0 l L l I I I l l l l I I I I I I n 0 5K 10K 15K Frequency (Hz) Figure 4.58: Amplitude spectra for the back source in right ear in Experiment 7 218 side, and there was no front /back cue. However, with flattened spectra in the right ear, the overall ILD cue was not pointing to the left ear. This method has the same spirit as the improvement by Hebrank and Wright (1974a). Instead of completely plugging one ear, Hebrank and Wright filled the concha to eliminate reliable pinna cues. However their technique still included other features of directional filtering, e.g. the diffraction due to head, neck and torso. Nowadays, with digital technology, the better method of completely flattening a spectrum in one ear, as applied in Experiment 7, becomes easy. In general, the method and stimuli used in Experiment 7 was better than plugging the right ear or filling the right concha. Seven listeners (A, E, F, L, M, R, and X) participated in Experiment 7, and their results are shown as circles in Figure 4.59. Except for listener E, all the other six listeners performed poorly (below 75%) on this experiment, suggesting that monaural cues are not adequate for most listeners for successful front / back judgement. Listener X’s result was right at the 50%-limit with no error-bar, because he heard adjusted signals all from the back. Squares in Figure 4.59 were performances with baseline for comparison. Open squares were runs with baseline stimuli. However three listeners did not do complete baseline runs. Their baseline scores were calculated from the 80 baseline trials in the first ten continuous runs. The scores for these three listeners are presented with solid squares on the figure. Ideally, results with baseline should be perfect. It was clear on the figure, and confirmed by one-tail t-tests at the 0.05-level, that performance with flattened spectra in the right ear was significantly worse than that with baselines for all listeners. Although previous works have shown that listeners can improve localization per- formance on front and back after training and adaptation to a new set of HRTFS (Hofman et al., 1998; Zahorik et al., 2006), it is hard to imagine that listeners can be trained to use monaural localization cues, which they do not normally use in the real world. 219 Percent Correct (:5) 100 90 7o 50 50 4o 30 20 100 90 80 70 60 50 4o 30 20 80 . o l f 4 t i F l l I l ‘ A E F L - '- ‘ID ID I D WT 4 I ‘l' 4 - ‘1 ’ > 1 f o 9 , P a ““““““ l """"" 0 ““““““““ ‘j . 0 Experiment 7: flatten right ear I ’ D Performance for baseline runs ‘ ’_ I Performance for baseline trials j M R X Listener Figure 4.59: Results of Experiment 7 220 W'hen listening to an unusual stimulus as in this experiment, listeners usually found the image to be very diffuse or inside the head. Some listener sometimes perceived split images from different spatial locations. 
Other listeners reported a compact image in space, but they were usually unable to discriminate the adjusted stimuli for front and back, either.

Experiment 8: Interaural spectral level difference

Theoretically, there is an intrinsic problem in using spectral cues for front/back localization: how would listeners know that the peaks and dips at certain characteristic frequencies were due to directional filtering? Maybe the spectrum of the original sound source already had peaks and dips at those characteristic frequencies. One way to solve this problem is to use interaural spectral level differences (ISLD), instead of the original level spectrum, as front/back cues. ISLD is defined as the interaural level difference between the left and right ears at each frequency. The advantage of ISLD is that peaks and dips in ISLD do not depend on the spectrum of the original source, and encode information on directional filtering (Duda, 1997; Algazi et al., 2001). Experiment 8 was designed to discover whether ISLD cues were adequate for front/back discrimination.

In Experiment 8, the adjusted spectra in the right ear were flattened over all frequencies for both front and back sources, in the same way as in Experiment 7 (Equation 4.15). The adjusted spectra in the left ear had amplitude spectra with ISLD identical to the ISLD of baseline (Equation 4.16), for front and back sources independently. Figures 4.60 and 4.61 show an example of baseline and adjusted syntheses for the front source at the left and right ears. Those for the back source are similar.

\[ |Y^{+}_{FL}(f)| = |Y^{+}_{FR}(f)| \cdot \frac{|Y_{FL}(f)|}{|Y_{FR}(f)|}, \qquad |Y^{+}_{BL}(f)| = |Y^{+}_{BR}(f)| \cdot \frac{|Y_{BL}(f)|}{|Y_{BR}(f)|} \tag{4.16} \]

(where |Y+_FR(f)| = |Y+_BR(f)| = constant, the flattened right-ear amplitude of Equation 4.15).

[Figure 4.60: Amplitude spectra for the front source in the right ear in Experiment 8 (baseline and adjusted), listener E.]

[Figure 4.61: ISLD for the front source in Experiment 8 (baseline and adjusted), listener E.]
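A minimal sketch of the Equation 4.16 adjustment for one source (front or back), assuming the baseline spectra at the two ears and the flattened right-ear spectrum of Equation 4.15 are available; the names are hypothetical.

```python
import numpy as np

def match_baseline_isld(Y_left, Y_right, Y_right_flat):
    """Experiment 8 in sketch form, for one source.

    Y_left, Y_right: baseline spectra at the two ears; Y_right_flat: the
    flattened right-ear spectrum (constant amplitude, Equation 4.15).
    Returns the adjusted left-ear spectrum of Equation 4.16, whose level
    difference against Y_right_flat reproduces the baseline ISLD.
    """
    isld_ratio = np.abs(Y_left) / np.abs(Y_right)   # baseline ISLD as a ratio
    new_mag = np.abs(Y_right_flat) * isld_ratio
    return new_mag * np.exp(1j * np.angle(Y_left))  # keep the left-ear phase
```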
close to perfect, either. Listeners A and X always heard stimuli in one direction, either front or back, which led to the the score right at the 50%-limit without error-bars on the figure. Squares in Figure 4.62 were results with baseline. Open squares were runs with baseline stimuli. For the three listeners who did not do complete baseline runs, their baseline scores, plotted as solid squares on the figure, were based on the 80 baseline trials from the first 10 continuous runs. When compared with baselines, the scores for all listeners in Experiment 8 were significantly worse (by one-tailed t-tests at the 0.05-level), indicating that ISLD is not a valid cue for 4The symmetry of synthesis speakers was not checked because the transaural technique does not require the two loudspeakers to be symmetrical about the median sagittal plane. 223 100 90 80 70 60 50 40 30 20 100 90 8O 7O 60 50 40 30 Percent Correct (Z) 20- front / back discrimination. This result agrees with Hartmann and Wittenberg (1996) who concluded that ISLD is not an adequate cue to provide externalization of sound images. . a Cl . . . . . . . . . . . . P d . . . a . . .-0 .......... a .......................... . . l’ q t 1 I- It . I b '4 l l 1 I A D F I 1 T T .. __ D I C] a . '1 F I D d I 1 . <5 . > I _ O I I _b d I- d . ...-j -------------------------------- O---- . 0 Experiment 8: ISLD Cl Performance for baseline runs I Performance for baseline trials 1 l l l L M R X Listener Figure 4.62: Result of Experiment 8 Discussion of Experiments 7 and 8 Experiments 7 and 8 show that the monaural cues and the ISLD cues are not adequate for front/back discrimination. This result agrees with Jin et al. (2004) who also found that these cues are not sufficient. Their experiments covered more 224 locations in the median sagittal plane, i.e. besides front and back sources, they also included elevated sources. The major difference between their experiments and the VRX experiments appears in technique. While Jin. et al. presented through ear- phones the broadband noise filtered by directional transfer functions (DTFs) derived from head-related transfer functions (HRTFs) (Middlebrooks and Green, 1990), the VRX experiments used the transaural technology, free of error in DTF measurements and free of error due to various earphone positions. In general, Experiments 7 and 8 support Morimoto (2001), who found that both ears contribute to localization in median sagittal plane; and Experiments 7 and 8 disagree with Hebrank and Wright (1974a), whose results suggests that listeners used monaural cues for localization in median sagittal plane. A possible cause for this disagreement is that, in the exper- iments by Hebrank and Wright, listeners had training with feedback, and therefore they might have learned the monaural timbre cues to discriminate among different locations in the median sagittal plane, instead of using localization cues; Whereas in Experiments 7 and 8 listeners were always instructed to use localization cues only. 4.3.4 Varying stimuli Experiment 9: Sharpening It is believed that the frequencies of peaks and dips in amplitude spectra give listener sensation of front/ back localization (Shaw and Teranishi, 1968; Blauert, 1969/70; Hebrank and Wright, 1974b). On the other hand, Zakarauskas and Cynader (1993) suggested an algorithm using the patterns of the first or second derivatives of level spectrum with respect to frequency to predict localization. 
Their calculations showed that the second derivative would be more robust than the first derivative, and either derivative was more robust than the original spectrum. Experiment 9 was designed to test these ideas. The adjusted spectra were the baseline spectra convolved in the frequency domain with a normalized function in a 225 shape like a Mexican-hat (Table 4.5). Equation 4.17 shows the formula for calculating the adjusted spectrum in the left ear for the front source. The right ear spectrum and the spectra for the back source are similar. This algorithm sharpened the baseline spectra by increasing the level difference between the peaks and the dips (with nega- tive elements in the convolution function), and smoothed the curves between adjacent components as well (with positive elements in the convolution function). 1 NIin(250,n+4) lLf‘LUn) =6:- 2 “UiSi—n [IQ‘IL(fi) (4-17) 11 i=Ma:c(3,n—4) where LFL(f.,,) is the level of YFL(fn) in decibels, ’F”L( n) is the level of Y1?“ n), the discrete function 3, is given in Table 4.5, 77,- is given by Equation 4.18, and the weighting function 0,, is given by Equation 4.19. j | :l:4 i3 :l:2 :tl 0 S,|—0.2 —0.5 0.2 0.5 1 Table 4.5: Value of the convolution function 5']- 1 if the 13th component is not eliminated m- : (4.18) 0 if the 2“ component is eliminated Min(250,n+4) 0,, = Z n,- 5.-., (4.19) i=Mam(3,n—4) The width of the convolution function 53- was chosen based on the information of peaks and valleys of the original level spectra and was to emphasize local structure. Figures 4.63 and 4.64 show an example of baseline and adjusted syntheses for the front source at left and right ears. Those for the back source are similar. It can be seen on these figures that peaks and dips were at the same frequencies as in the original spectra, but the level differences between the peaks and dips were magnified. 226 Level(dB) 80 l I I I I T l l l l l 0 Baseline — 0 Adjusted Listener: Recording simulations in left ear (front) 0 I I I I I I I I I a] lull I h 5K 10K Frequency (Hz) 15K Figure 4.63: Amplitude spectra for the front source in left ear in Experiment 9 Level(dB) 80 I I I I I I l l l I l l I l 0 Baseline Listener: X — 0 Adjusted Recording simulations in right ear (front) 0 I I I I I I I I I d In] J I. 10K Frequency (Hz) h 15K Figure 4.64: Amplitude spectra for the front source in right ear in Experiment 9 227 Thus the peaks and dips of the first and second derivatives of the level spectra with respect to frequency were not preserved. Nine listeners (A, D, G, L, P, R, V, X, and W) were in Experiment 9. Besides pseudo-noise that had been used in the previous experiments, the original stimulus of complex tone was also used (except for listener W). This was because many listeners did poorly on complex tone, and therefore they might gain improved performance after sharpening the spectra; whereas for pseudo-noise, performance for most listeners was already close to perfect, and sharpening the spectra would not let listener do better than perfect, and hence would not show much effect of improvement. Altogether there were 17 experimental conditions (9 listeners for pseudo-noise, plus 8 listeners for complex tone). The results for the sharpened pseudo-noise and sharpened complex tone are shown as open circles in Figure 4.65, compared with the results for the corresponding baseline without sharpening, shown as solid circles. In Figure 4.65, among the 17 experimental conditions, 14 of them showed that performance with sharpened stimuli was not worse than baseline. 
Two of the 14 conditions, namely listener W with pseudo-noise, and listener D with complex tone, showed much better performance for sharpened stimuli. For the remaining three cases in which performance with sharpened stimuli was worse than baseline, namely listener G with pseudo—noise, and listeners A and R with complex tone, the difference between sharpened stimuli and baseline was very subtle, especially when compared with error-bars. In summary, when presented with sharpened stimuli, listeners performed equally well or even better than baseline stimuli. This result agrees with Sabin et al. (2005), who found that increasing contrast of the magnitude of DTF up to 4 times did not impair performance. Because sharpened spectra did not preserve the first and second derivatives of the level spectrum (as shown in Figures 4.66 and 4.67), this result disfavors the computational model that Zakarauskas and Cynader suggested. For 228 Percent Correct (2) 100 90 80 70 60 50 40 100 90 70 50 100 90 80 70 60 50 40 100 90 80 70 60 50 40 I :- O: 0. %'§ 0'. Or; 80 60 L 40 l 1 ~ 1 L (a) pseudo-noise A D G L P 0'. O'. O'. ét T - t (b) pseudo-noise R V X W " 7" TI § i -I ' <> 9 f 1 -------------------------- or ------------- L (c) complex tone : A D G L P I 1 IT I I .’ § 5 %} §f 1 f O sharpened spectra “ O baseline ‘ l (d) complex tone R V X Listener Figure 4.65: Result of Experiment 9 [\D [0 CD 2IIIIIIIIIIIIIIII Listener:X — Adjusted - 1 _ 17 I _ _ \ 93 v 0 I—- .— q. 13 B - _l — aselIne - '0 _2 _ 4 I I I I I I I I I I I I I I I 0 5K 10K 15K Frequency (Hz) Figure 4.66: First derivative of level spectra for the front source in left ear in Exper— iment 9 24 I I I I I I I I I I I I I I I l“ Listener:X N 3'; Adjusted \ _ an 3 0 _ N u— '0 \ - _ (\TJ . '0 _12 BaselIne _ -24_llllllllllllllll 5K 10K 15K Frequency (Hz) Figure 4.67: Second derivative of level spectra for the front source in left ear in Experiment 9 230 characteristic—frequency models, on the other hand, the frequencies of the peaks and dips are important. For these models, the frequencies of the peaks are preserved by the sharpening process. Actually because sharpening increases the relative height between the peaks and dips, one might expect that performance with sharpened stimuli should be better than baseline in some cases. This prediction agrees with the results of this experiment. In general, the results of Experiment 9 favors the characteristic frequency models, and disfavors the computational model suggested by Zakarauskas and Cynader. Experiment 10: Advance right ear It is widely believed that interaural time difference (ITD) cues are most important for localization in the azimuthal plane (Wightman and Kistler, 1989b). In the sagit— tal plane, the spectral cues are most important. Experiment 10 examined whether interaural delay would affect front /back discrimination. With this experiment, our intention was to examine whether the spectral cues for front and back sources are orthogonal to, i.e. independent of, the ITD cues. Bloom (1977a, b) and Watkins (1978) claimed such a high degree of independence that spectral cues to elevation maintain their effectiveness even when the sound image is far off to one side due to monaural presentation. In Experiment 10, the adjusted spectra were achieved by advancing the right-ear baseline spectra by a certain amount of time. The advance (inverse—delay) was added by subtracting an extra phase that increased linearly with increasing frequency with certain slope. 
Figures 4.68 and 4.69 show an example of phase differences between the baseline spectrum and the adjusted spectrum for each ear for an advance of 100 us. When the delay changes, the slope in Figures 4.68 and 4.69 changes accordingly. The adjusted amplitude spectra were identical to baseline. Five listeners (D, E, M, X, and R) participated in Experiment 10. During the 231 180F= I I I r ‘P I I I I I I I I I .IvIJ-t ...} 0 Left ear Listener: E 5 f3. ' — ,‘ 0 Right ear 2 x: - (D O wAdjusted—‘pBaseline (deg) ’90 " ' :53: ' ‘1 _ i 5."? q . Phase difference _in, y .c‘ recordedgsimulations (front) , -130: I I I..I..Ig"'I l.l. l...l. I. .1 1.1.1331: 0 5K 10K 15K Frequency (Hz) Figure 4.68: Phase differences for the front source in Experiment 10 180I= I 11‘ T'fi" I r I t I I ‘1""l"‘l' Jy'l‘fi ' p 0 Left ear Listener: E 1" ~ - ‘0‘. - ~ ~. Right) ear - » ~ ,3 - J" .r‘ I w 90 - :3 ‘ ' "‘ )oAdjusted-lpBaseline (deg) '90 ‘- rafi' ; a _ 43.- Phase difference in ;.._ ...,‘o" recorded‘simulations (back) .3. -180 = I I I I If. I . I I I I I I I J I I I4, 0 5K 10K 15K Frequency (Hz) Figure 4.69: Phase differences for the back source in Experiment 10 232 experimental runs, listeners heard sound images moved to the right side. The task was to discriminate front and back sources. Results of Experiment 10 are shown in Figure 4.70. Five values of delay time were used: 200, 400, 600, 800 and 1000 as, except for listener R, who also did runs with 50 and 100 as. For those people who did complete runs with baseline, the baseline results are shown on the figure at 0 us. In Figure 4.70, the five listeners showed large individual differences. Performance of listeners E and X dropped below 75% at around 600 as, which is close to the physiological range of the human head. However listener R’s performance dropped below 75% at 200 us, much less than the human physiological range. On the other hand, listeners D and M had scores above 75% even at 1000 as. Especially, listener D responded almost perfectly up to 1000 as. Besides those individual differences, there are two things in common: 1. All listeners successfully discriminated front from back with ITD less than 200 ,us. 2. Performance by most listeners (except for listener D) decreased as ITD increases. These tendencies suggest that spectral cues for front / back localization and ITD cues for horizontal localization were independent, especially with ITD less than 200 [1.8; however when ITD was too large (400 to 800 ps, depending on the listener), perfor- mance would degrade, except for listener D. It is known that the spectral cues for elevation from the HRTFs are different for different azimuths (Algazi et al., 2001). Listeners can be expected to apply their experience with these differing sets of cues depending on their knowledge of azimuth. Consequently, the stimuli of Experiment 10 presented conflicting cues in that spectral cues appropriate to zero azimuth were accompanied by azimuth cues indicating 15°, or 31°, or 52° to the right. Conflicting cues often lead to a diffuse image instead of 233 Percent Correct (X) 100 30 20 100 90 so ' 7o 60 40 30 20’ l 00 90 80 70 60 50 40 30 20 istener D istener E IE] I—I- 0 200 400 600 800 l 000 50' W I L O Listener M O Listener X 0 200 400 600 800 l 000 t A—A— -'- 7 T ' .l t 1 .’ 1 ~-----------__- ------ a a a Ann: L .l L A Listener R , O 200 400 600 800 l 000 Interaural time difference (us) Figure 4.70: Result of Experiment 10 234 a compact image and a lower externalization score. 
Therefore, it was of interest to measure externalization scores for Experiment 10. Externalization scores between 0 (inside the head) and 3 (perfectly externalized) were recorded for listeners D, R, and X. Listener D always reported 3 for all conditions. Listener R reported scores above 2.8 for all conditions except for an ITD of 200 µs, where the score was about 2; thus she perceived a less externalized sound image at an ITD of 200 µs. Listener X reported scores above 2 for all conditions except for front sources with ITDs of 200 and 400 µs. Similar to listener R, listener X also perceived a less externalized image at the small ITDs of 200 and 400 µs. All of listeners D, R, and X gave the perfect externalization score of 3 for the baseline stimuli (with zero ITD). In general, externalization was good even at an ITD of 1000 µs. The fact that inconsistency between azimuthal and elevation information does not lead to markedly reduced externalization may be further evidence of orthogonality between binaural and spectral cues.

It needs to be mentioned that Macpherson et al. (2004) found that, when applying an ILD of 10 dB or an ITD of 300 µs, the spectral cues in the listener's ipsilateral ear, with respect to the perceived lateral source position, dominated the elevation judgement, as though listeners were monaural. In Experiment 10, all listeners except listener D showed decreased performance for ITDs above 300 µs, which is consistent with the findings of Macpherson et al. However, listener D's good performance at ITDs up to 1000 µs either contradicts the idea that listeners used monaural cues for ITDs greater than 300 µs, as found by Macpherson et al., or contradicts the finding from Experiment 7, i.e. that listeners could not discriminate front/back using monaural cues. In general, although the monaural cues in the near ear might carry more weight for front/back judgement, at least for ITDs less than 300 µs the results of Experiment 10 suggest that ITD cues and front/back cues are relatively independent.

4.3.5 Competing cues

Experiment 11: High-frequency cues vs. low-frequency cues

In Experiments 1 and 2, the listener was presented with either high-frequency cues or low-frequency cues for front and back sources. Experiment 11 presented the listener with both high-frequency and low-frequency cues, and examined how the listener dealt with competing cues. Experiment 11 was done in a simpler way, as suggested by Professor Brad Rakerd: it used only two source speakers, without the VRX technique or calibration sequence. There were 20 trials in each run, 10 for each of two types of intervals. For a Type I interval, the front speaker played the frequency components up through the nth component, and, simultaneously, the back speaker played the frequency components from the (n+1)th component and above. A Type II interval reversed a Type I interval, i.e. the back speaker played up through the nth component, and the front speaker played the (n+1)th component and above. Type I and Type II intervals were presented to the listener in random order, and the listener responded whether he heard the interval from front or back. The boundary frequency in this experiment is defined as the frequency of the nth component.

Six listeners (R, X, F, A, L, and D) participated in this experiment, and their results are shown as solid circles in Figures 4.71 and 4.72. The vertical axis of the figures is the percentage score demonstrating how well the listener followed the low-frequency cues.
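A minimal sketch of the Type I/Type II stimulus split follows. The fundamental frequency, number of components, and equal-amplitude sine components are illustrative assumptions; the dissertation's actual component amplitudes and phases are not reproduced here.

```python
import numpy as np

def split_complex_tone(f0, n_components, n_boundary, interval_type,
                       fs=44100, dur=1.0):
    """Return (front, back) speaker signals for a Type I or II interval.

    Type I:  front speaker plays components 1..n; back plays n+1 and up.
    Type II: the reverse assignment.
    """
    t = np.arange(int(fs * dur)) / fs
    low = sum(np.sin(2 * np.pi * k * f0 * t)
              for k in range(1, n_boundary + 1))
    high = sum(np.sin(2 * np.pi * k * f0 * t)
               for k in range(n_boundary + 1, n_components + 1))
    return (low, high) if interval_type == "I" else (high, low)

# The boundary frequency is that of the nth component: with f0 = 200 Hz
# (an assumed value) and n = 20, the boundary falls at 4 kHz.
front_sig, back_sig = split_complex_tone(200.0, 80, 20, "I")
```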
Results from Experiments 1 and 2 were also included in the figures for comparison, because these experiments lead to an alternative view of a listener's use of low-frequency and high-frequency information. In Experiment 2, high-frequency cues were flattened, and hence only low-frequency cues existed. Therefore the correct comparison from Experiment 2 is just the percentage score tracking low-frequency cues, which can be directly plotted in Figures 4.71 and 4.72 (squares).

[Figure 4.71: Results of Experiments 1, 2, and 11 (part 1). Percentage tracking low-frequency cues vs. boundary frequency (kHz) for listeners R, X, and F. Solid circles: Experiment 11 (low-frequency vs. high-frequency cues); triangles: Experiment 1 (flatten low-freq), plotted directly and reversed; squares: Experiment 2 (flatten high-freq).]

[Figure 4.72: Results of Experiments 1, 2, and 11 (part 2). Same format as Figure 4.71, for listeners A, L, and D.]

In Experiment 1, low-frequency cues were flattened. Therefore the correct comparison from Experiment 1 represents the percentage score tracking high-frequency cues. However, the vertical axis in Figures 4.71 and 4.72 is the percentage score tracking low-frequency cues, which, in Experiment 11, is exactly 100% minus the percentage score tracking high-frequency cues. Therefore the results from Experiment 1 were flipped upside down and plotted as triangles connected by small dashed lines in Figures 4.71 and 4.72.

In Experiment 11, every listener's score increased from 0% to 100% as the boundary frequency increased over the complete frequency range. This common feature is not surprising, because as the boundary frequency increased, more front/back information was played through the low-frequency speaker, which led the listener to track low-frequency cues more easily.

As in Experiments 1 and 2, listeners showed great individual variance. The shapes of the curves, and the frequency band at which a listener flipped from tracking low-frequency cues to tracking high-frequency cues, were quite different for each listener. However, when compared with the results for the same listener from Experiments 1 and 2 (Figures 4.71 and 4.72), there were some similarities. For example, for listeners R (a V-shape listener) and X (an X-shape listener, but very close to A-shape because the overlapping band is very narrow), at high boundary frequencies their results in Experiment 11 (solid circles) coincide with the results from Experiment 2 (squares); at low boundary frequencies, their results coincide with the flipped results from Experiment 1 (triangles). This result is not difficult to understand. For listeners R and X, the transition from Experiment 1 (triangles) was on the lower-frequency side of the transition from Experiment 2 (squares), which means that they were "insensitive" listeners, i.e. they needed a lot of information for successful localization (for R, 0-13 kHz or 7-16 kHz; for X, 0-8 kHz or 6-16 kHz).
Therefore, when one of the two speakers presented sufficient front/back cues, the other speaker could not present adequate front/back cues for these insensitive listeners. Thus their results in Experiment 11 should agree with their results from Experiments 1 and 2, where there was no competition.

On the other hand, for listeners L and D (A-shape listeners), the results from Experiment 1 (triangles) were on the higher-frequency side of the results from Experiment 2 (squares), which means that they were sensitive listeners, i.e. they could localize front/back sources with very little information (for L, 0-5 kHz or 11-16 kHz; for D, 0-5 kHz or 12-16 kHz). Therefore, when the boundary frequency was in the mid-frequency range, both speakers presented adequate front/back cues for these sensitive listeners. Thus they were conflicted in voting for front or back, and the result might depend on which frequency band they happened to pay attention to. That is why their results from Experiment 11 were quite different from their results from Experiments 1 and 2. For listener D, there is still some similarity between his results from Experiment 11 (solid circles) and Experiment 1 (triangles): apparently, listener D chose to follow the high-frequency cues until the boundary had risen so high that there was hardly any power in the high-frequency speaker.

For listeners F and A (X-shape listeners), the PCBs from Experiment 1 (triangles) and Experiment 2 (squares) overlapped. Hence they were between sensitive and insensitive listeners, and their results from Experiment 11 roughly followed their flipped results from Experiment 1 (triangles) and their results from Experiment 2 (squares), but not as well as for the insensitive listeners.

4.4 Auxiliary testing in ordinary room

Experiments with the VRX technique were normally performed in the anechoic room, where there were no reflections from the walls and the listener's localization perception was optimal. However, the VRX technique might be valuable for other people who do not have access to an anechoic room. Furthermore, in the anechoic room the chair was on a wire-grid floor, and there might be error due to movement of the chair; to reduce that error, the listener sometimes had to wait a long time for the chair to stop waggling. Hence one advantage of the ordinary room is that the chair on the hard floor was very stable, and there was no error due to the movement of the chair. Therefore it is worthwhile to test the technique in an ordinary room. Table 4.6 shows the reverberation time of the room at various frequencies, as measured by Hartmann et al. (2005). The average reverberation time was about 0.8 seconds.

frequency (kHz)           0.25   0.5   1     2     5     8     16
reverberation time (s)    0.9    0.8   0.8   0.9   0.8   0.7   0.4

Table 4.6: Reverberation time of Room 10B (ordinary room)

4.4.1 Source speakers at 5 feet

In the first test, the setup of loudspeakers and chair was exactly the same as in the anechoic room. The front and back speakers were 5 feet from the listener, and the simulation loudspeakers on the sides were close to the listener, about 1.2 feet away. The top block of Table 4.7 shows the scores in the confirmation runs. Two runs were performed for each condition and each of the front and back sources, and there were 20 trials in each confirmation run. The average and the absolute error, on a percentage scale, were calculated.
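The entries in Table 4.7 (below) are consistent with pooling the two runs (total correct over total trials) and quoting as the absolute error the deviation of the first run's percentage from the pooled value; for equal-length runs this is half the difference between the two runs, e.g. runs of 19/20 and 20/20 give 97.5 ± 2.5%. A small sketch under that reverse-engineered assumption:

```python
def confirmation_score(c1, n1, c2, n2):
    """Pooled percent score of two confirmation runs. The absolute error
    is assumed here to be the deviation of run #1's percentage from the
    pooled percentage, matching the arithmetic of Table 4.7."""
    pooled = 100.0 * (c1 + c2) / (n1 + n2)
    err = abs(100.0 * c1 / n1 - pooled)
    return pooled, err

print(confirmation_score(19, 20, 20, 20))  # (97.5, 2.5)
print(confirmation_score(9, 15, 6, 13))    # ~ (53.6, 6.4), as in Table 4.7
```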
When the score was between 25% and 75%, it was inferred that the listener could not discriminate real and virtual signals, and the synthesis passed the confirmation; otherwise the confirmation failed. Listener W found that the real signal decayed more slowly than the virtual signal, and hence successfully discriminated the real and virtual signals; thus the synthesis did not pass the confirmation. Listener X was also able to discriminate real and virtual signals. He agreed with listener W's observation, and found that real signals had some ringing in the tail, whereas virtual signals were more dead and stopped more quickly after the steady state. Listener A also reported that the decay of the real signals was longer. However, listener Q reported that the virtual signals had the longer decay, differing from all the other listeners. This report by listener Q was consistent throughout his runs, including the following experiments, and his close-to-perfect correct scores and his report on externalization (he responded that the virtual signals were more diffuse and close to the head, which was normally a common response for virtual signals) confirmed that he was not confused between the real and virtual signals. Meanwhile, despite these subtle differences, all four listeners agreed that they could not discriminate real and virtual signals by listening only to the steady state, without the tail.

When asked for an externalization score (0 for totally inside the head, 3 for ideally externalized) for the simulation (i.e. the virtual signals), listeners A and X rated 3, and listeners W and Q rated above 2, except for listener Q for the back speaker, who rated 1.5, about half of the perfect score. Hence the externalization of the simulation signals was in general fairly good. In conclusion, the stimuli in the ordinary room failed the confirmation test.

4.4.2 Source speakers at 1.2 feet

A possible cause of the difference in the tails of the real and virtual signals is that the source speakers were farther from the listener than the synthesis speakers, so the signal level at the source speakers was higher than at the synthesis speakers, which caused more reverberation in the room.[5] To test this hypothesis, the front source speaker was moved to 1.2 feet from the listener, about the same distance as the synthesis speakers. The results are shown in the middle block of Table 4.7. All listeners found that the differences between real and virtual signals were much more subtle, and the task was harder to do.

[5] This explanation works for listeners A, W, and X, but fails for listener Q, who heard a longer decay in the virtual signals. It is really puzzling. Maybe the sensation of a longer decay for listener Q was due to some pitch contour.
listener   distance (feet)       source   run #1   run #2   percentage of virtual trials   externalization
A          5                     front    19/20    20/20    97.5 ± 2.5 %                   3
A          5                     back     20/20    20/20    100.0 ± 0.0 %                  3
Q          5                     front    17/20    19/20    90.0 ± 5.0 %                   2
Q          5                     back     17/20    19/20    90.0 ± 5.0 %                   1.5
W          5                     front    20/20    20/20    100.0 ± 0.0 %                  2.5
W          5                     back     18/20    17/20    87.5 ± 2.5 %                   2
X          5                     front    18/20    20/20    95.0 ± 5.0 %                   3
X          5                     back     20/20    20/20    100.0 ± 0.0 %                  3
A          1.2                   front    20/20    18/20    95.0 ± 5.0 %                   3
A          1.2                   back     20/20    20/20    100.0 ± 0.0 %                  3
Q          1.2                   front    20/20    17/20    92.5 ± 7.5 %                   3
Q          1.2                   back     19/20    18/20    92.5 ± 2.5 %                   3
W          1.2                   front    6/13     6/15     42.9 ± 3.3 %                   *
W          1.2                   back     6/15     7/13     46.4 ± 6.4 %                   *
X          1.2                   front    16/20    17/20    82.5 ± 2.5 %                   *
X          1.2                   back     20/20    20/20    100.0 ± 0.0 %                  *
A          1.2, longer window    front    9/15     6/13     53.6 ± 6.4 %                   3
A          1.2, longer window    back     18/20    6/12     75.0 ± 15.0 %                  3
Q          1.2, longer window    front    12/18    11/17    65.7 ± 1.0 %                   3
Q          1.2, longer window    back     7/13     6/12     52.0 ± 1.8 %                   3
X          1.2, longer window    front    18/20    20/20    95.0 ± 5.0 %                   *
X          1.2, longer window    back     20/20    20/20    100.0 ± 0.0 %                  *

Table 4.7: Correct score and externalization (0 to 3) of confirmation runs in Room 10B (ordinary room). Externalization was not measured for the runs marked with a star for listeners W and X. However, W and X did not report less externalization for those runs compared with previous runs; therefore the externalization scores for those two listeners in the top block can be taken as approximate externalization scores for the bottom two blocks.

However, three out of the four listeners (A, Q, and X) could still succeed at the task. Listeners A and X heard a pitch contour in the tail as a cue for the virtual signal. Listener A found that the tail of the virtual signals had a high, increasing pitch, whereas the real signals did not. Listener X found that, in the tail, the real signal had a pitch going up and the virtual signal had a pitch going down. These pitch-contour cues can be understood as follows. Different speakers at various positions excited different modes, having different frequencies, in the room. Those modes had different decay rates, and therefore, as time passed, the frequency might change in a characteristic way.
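The room-mode account just given can be illustrated with a toy simulation: two decaying modes at nearby frequencies are excited together, and the short-time spectral centroid of the tail drifts toward the slower-decaying mode, which would be heard as a pitch-like contour. The mode frequencies and decay times below are arbitrary illustrative values, not measurements of Room 10B.

```python
import numpy as np

fs = 44100
t = np.arange(int(0.5 * fs)) / fs

# Two hypothetical room modes ringing after the stimulus offset:
# 500 Hz decaying quickly, 560 Hz decaying slowly.
tail = (np.exp(-t / 0.05) * np.sin(2 * np.pi * 500 * t) +
        np.exp(-t / 0.25) * np.sin(2 * np.pi * 560 * t))

# Track the spectral centroid in short frames; it climbs toward 560 Hz
# as the faster-decaying mode dies away.
frame = 2048
for start in range(0, len(tail) - frame, frame):
    seg = tail[start:start + frame] * np.hanning(frame)
    mag = np.abs(np.fft.rfft(seg))
    freqs = np.fft.rfftfreq(frame, 1.0 / fs)
    print(f"t = {start / fs:.3f} s  "
          f"centroid = {np.sum(freqs * mag) / np.sum(mag):.0f} Hz")
```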
Different from listeners A, Q, and X, who did not let the synthesis pass the confirmation, listener W was the only one whose percent-correct score fell below the 75% threshold, and therefore for him the synthesis passed the confirmation. Listener W's subjective responses confirmed that, with the stimuli in this testing, he could not discriminate real and virtual signals at all.

4.4.3 Signal with slow onset and offset

In the previous testing, three out of four listeners could successfully discriminate real and virtual signals. A possible improvement to the virtual signal is to make the stimulus decay more slowly, so that the reverberation from the room is less evident. The onset and offset were therefore increased from 100 ms, the normal setup of VRX experiments, to 320 ms, which was noticeably slower.

Three of the four listeners from the previous testing, namely A, Q, and X, participated in this testing with the longer decay time. Since listener W could not discriminate real and virtual signals even with the short window, it was not meaningful for him to participate. The results of this testing are shown in the bottom block of Table 4.7.

For listeners A and Q, the percent correct fell below the 75% threshold, indicating a failure to discriminate real and virtual signals. (Listener A had one successful run, namely the first run for the back speaker, which boosted the average score for the back speaker to 75%. But most of the time she failed the task. Listener Q always failed the task. In general, the percent correct was never greater than the 75% threshold.) Both A and Q found that the cues they had used in the previous testing were very weak, and therefore the task was very hard. For both listeners, the signals were externalized perfectly, with a score of 3.

For listener X, however, the synthesis still could not pass the confirmation, with a close-to-perfect score. This time, although the decays of both real and virtual signals were of similar duration and no frequency contour was detected, the timbres were nevertheless different: the tails of the real signals had a ringing tone, whereas the tails of the virtual signals did not. This timbre difference might be due to interference between the room modes and the played stimuli during the decaying tail.

So in general, although the simulation in an ordinary room by the VRX technique was very similar to the real signal, there was a noticeable difference in the tail, due to the subtle difference in the reverberation of the room. The simulation could be improved by putting the source speaker closer to the listener and by slow windowing; however, there was still one listener (listener X) who could discriminate real and virtual signals. On the other hand, all four listeners agreed that, without listening to the tail, the simulation was excellent, and the real and virtual signals could not be discriminated by listening to the steady state alone. For most (three out of four) listeners, the cues occurred in the tail. Therefore, to make a perfect simulation as in the anechoic room, it seems that more detailed impulse responses must be included, or else the reverberation tail must be masked with noise.

4.4.4 Front/back discrimination

With such close-to-perfect simulation in the ordinary room, one signal was tested for front/back discrimination. The front and back source speakers were 5 feet from the listener, the same setup as in the anechoic room, and the signal was the same as in Experiment 10: the right-ear channel was advanced by an ITD. The same four listeners, A, Q, W, and X, from the previous testing in the ordinary room participated in this experiment. Signals with two different ITDs, 200 and 600 µs, were presented, and the results are shown with circles and squares, respectively, in Figure 4.73. Listeners A and Q did four complete runs for each ITD, as in Experiment 10 in the anechoic room; their data are shown with open symbols, and the error bars show the standard deviations. The data of listeners W and X are shown as solid symbols, because they did not complete four runs for each condition. Instead, listener W did two runs for each ITD, so his error bars show the absolute error; listener X did only one run, for an ITD of 200 µs, and hence there is no error bar.

Figure 4.73 shows that, except for listener W with an ITD of 600 µs, all the results were above the 75% threshold, i.e. the listeners could discriminate front/back sources in an ordinary room with normal reverberation using localization cues. Listener X did the experiment with an ITD of 200 µs and scored perfectly. In the anechoic room, he had found that, compared with baseline, the adjusted signal for the front source sounded more diffuse and less well externalized, and was offset to the right, whereas the adjusted signal for the back source still sounded compact and well externalized, and was also offset to the right.
In the ordinary room, he found that the adjusted signal for the front source sounded compact and well externalized, with an externalization score of 3 on a scale of 0 (totally diffuse or inside the head) to 3 (perfectly externalized). He also found that the adjusted signal was localized clearly in the center, where the baseline was localized. The adjusted signal for the back source sounded compact and well externalized, and was offset to the right, though by much less distance than in the anechoic room.

[Figure 4.73: Result of Experiment 10 in the ordinary room (Room 10B). Percent correct for listeners A, Q, W, and X with ITDs of 200 µs (circles) and 600 µs (squares). An open symbol shows the mean and standard deviation of four runs. Solid symbols are results for listeners W and X, who did not complete four runs for this experiment; listener W did two runs for each ITD, and the error bars are absolute errors. Listener X did only one run, for an ITD of 200 µs, and therefore there is no error bar for him.]

When listener W did these experiments, he scored almost perfectly (missing 1 out of 40 trials), and he found that both baseline and adjusted signals were externalized fairly well (with an externalization score of 2.5 for the front source and 2 for the back source). For him, the baseline and adjusted signals sounded identical. For the front source, both the baseline and adjusted signals were elevated and offset to the right; for the back source, both were offset to the left. When listener W did experiments with an ITD of 600 µs, close to the physiological range for humans, he could easily discriminate baseline and adjusted signals, and found that the adjusted signal was more diffuse and less externalized (1 for the externalization score) than the baseline. He tended to hear the adjusted signals in the back (34 out of 40, 85%). The percent correct for 600 µs was much worse than that for 200 µs, which is not surprising because the externalization for 600 µs was poorer and therefore the listener was more confused.

With an ITD of 200 µs, listener A found that the sound image of the adjusted signal was perfectly externalized, with an externalization score of 3. The front adjusted signal was displaced to the right by about 1 foot, and the back adjusted signal was also displaced to the right, but only slightly. These results are expected because the right-ear signal was advanced, and they also agree with the results by other listeners in the anechoic room. With an ITD of 600 µs, listener A found the task harder to do, and the images were on the right but very close to the head; therefore the externalization of the image was less good, with a score of 2.5. In general, listener A found the localization cues very strong, and the results on percent correct were always above 84%. Listener A also demonstrated the same trend as listener W, i.e. the percent correct for 200 µs was better than that for 600 µs, although the difference was not as large.

For listener Q, the results were similar to those of listener A. For an ITD of 200 µs, listener Q rated the front signals with an externalization of 2, and found the images about 4 feet away from the head. For the back signals, he rated the externalization as 1.0 and reported that the images were about 2 feet away from the head. For both front and back signals, he found the signals to be 30° to the right.
For an ITD of 600 µs, listener Q gave an externalization score of 1 and reported that the images were very close to his head, at a distance of less than a foot. The images were about 45° to the right, farther to the right than the images for 200 µs. In general, his feedback on externalization and location was very similar to the feedback from the other listeners in Experiment 10 in the anechoic room. His performance on front/back discrimination was very good, and his performance for 200 µs was better than that for 600 µs, agreeing with the results for listeners W and A.

In summary, with this specific signal, i.e. advancing the right-ear channel, listeners could localize the synthesis in an ordinary room with normal reverberation and succeed in the task of front/back discrimination using localization cues for an ITD of 200 µs; for an ITD of 600 µs, two of the three listeners could discriminate front/back with a high score (above 84%). Because only listener X did this experiment in both the ordinary room and the anechoic room, it might not be fair to compare the results in the two rooms, especially given the large individual differences found in VRX experiments. However, there seems to be a general tendency that can be summarized. Listener X found the displacement to the right in the ordinary room to be much less than in the anechoic room. Although listeners A and Q did not do Experiment 10 in the anechoic room, the displacement that they reported in the ordinary room was very small, i.e. at a distance of 5 feet, the displacement to the right was never more than a foot, which is smaller than the results in the anechoic room by other listeners. In summary, three out of four listeners (all except listener W) found that the displacement was indeed to the right, as in the anechoic room. Listener W, however, found the sound image to be elevated, and sometimes even displaced to the left, which was never reported by the other listeners who participated in Experiment 10 in the anechoic room. The difference between the anechoic room and the ordinary room must be due to the standing waves established in the ordinary room, which might even add unwanted ILD cues. On the other hand, the good performance by the listeners has confirmed that the spectral information in the simulation was well preserved, indicating that the VRX technique can be applied in an ordinary room. For signals to which the room does not add strong cues that destroy the original localization cues, e.g. advancing the right-ear channel by a small ITD of 200 µs, listeners could successfully discriminate front/back signals, as in the anechoic room.

4.5 Conclusion

The VRX technique was developed to simulate external complex sound sources using transaural synthesis in an anechoic room with two synthesis loudspeakers. Simulation was good up to 16 kHz. Sound images were externalized very well, and listeners could not discriminate real and virtual signals. When tested in an ordinary room with normal reverberation time, listeners could discriminate real and virtual signals by subtle differences in the decay. Apart from this subtle difference, the steady states of real and virtual signals sounded identical, and the externalization of the simulation was normally good. The VRX technique is a very good tool for presenting any desired spectra to listeners' ears while giving them the opportunity to use their own pinna cues. The technique is expected to be more accurate than any method using headphones (Kulkarni and Colburn, 2000).
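The heart of transaural synthesis, on which the VRX technique is based, is crosstalk cancellation: given the four transfer functions from the two synthesis loudspeakers to the two ears, one inverts a 2-by-2 matrix at each frequency so that each ear receives only its intended signal. The sketch below shows that inversion in its most basic form; the VRX calibration sequence is not reproduced here, the transfer-function array H is a placeholder, and practical systems add regularization where H is ill-conditioned.

```python
import numpy as np

def crosstalk_cancel(desired_left, desired_right, H):
    """Basic frequency-domain crosstalk cancellation.

    desired_left, desired_right : desired ear spectra, shape (n_bins,)
    H : speaker-to-ear transfer functions, shape (n_bins, 2, 2), where
        H[k] = [[H_L1, H_L2], [H_R1, H_R2]] and H_Ej is the path from
        speaker j to ear E at frequency bin k.
    Returns the spectra to send to speakers 1 and 2.
    """
    ears = np.stack([desired_left, desired_right], axis=-1)[..., None]
    # Solve H[k] @ S[k] = ears[k] for every frequency bin at once.
    speakers = np.linalg.solve(H, ears)[..., 0]
    return speakers[:, 0], speakers[:, 1]
```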
Eleven experiments, ten of which used the VRX technique, were performed on front/back discrimination to discover the importance of various front/back cues. There were large individual differences among listeners, suggesting that listeners have learned to use their own ears and have developed quite different strategies for localizing front and back sound sources. The large individual differences in performance must be due to the large individual differences in head-related transfer functions, or specifically in the directional transfer functions, which have been found to be highly correlated with geometric properties of listeners' ears and heads (Middlebrooks, 1999).

In testing different models for the use of spectral cues, the experiments showed evidence supporting a multiple-band model. For any given listener, no frequency band is absolutely necessary; he can use information in the available bands, or even compare across the various available bands, to make decisions on front/back localization (Experiments 1 through 4).

In examining peaks and dips in front/back cues, our experiments found that dips were more important than peaks for accurate front/back localization (Experiments 5 and 6). Monaural cues and interaural spectral level difference cues were not sufficient for correct front/back judgement for most of our listeners (Experiments 7 and 8).

Applying an interaural time delay of up to 200 µs did not ruin listeners' front/back judgement, although the sound image did offset to one side (Experiment 10). As the ITD increased, most listeners' performance decreased, with individual differences, and most listeners could not follow front/back cues for ITDs of more than 800 µs.

To evaluate Blauert's frequency-band model and the derivative model by Zakarauskas and Cynader, sharpened spectra were presented to listeners (Experiment 9). The experiments showed that listeners discriminated sharpened spectra better than, or as well as, baseline spectra. This result supports Blauert's model and disfavors the computational model by Zakarauskas and Cynader.

In VRX experiments, some listeners appeared to be insensitive while others were sensitive. Sensitive listeners are capable of making correct front/back decisions based on little information, whereas insensitive listeners require more information. In an experiment with competing cues (Experiment 11), insensitive listeners showed similarity to analogous experiments with non-competing cues (Experiments 1 and 2) because, when presented with competing cues, these listeners were sensitive to at most one of the cues, and the competition did not actually arise. For sensitive listeners, however, because little information was adequate for them, the cues could compete in the auditory system, and their results for competing cues were quite different from their results for non-competing cues.

Appendix A

References
A.4 References for Chapter 4

Algazi, V.R., Avendano, C., and Duda, R.O. (2001) “Elevation localization and head-related transfer function analysis at low frequencies,” J. Acoust. Soc. Am. 109, 1110-1122.

Asano, F., Suzuki, Y., and Sone, T. (1990) “Role of spectral cues in median plane localization,” J. Acoust. Soc. Am. 88, 159-168.

Blauert, J. (1969/70) “Sound localization in the median plane,” Acustica 22, 205-213.

Blauert, J. (1983) Spatial Hearing: The Psychophysics of Human Sound Localization, MIT Press, Cambridge, MA.

Bloom, P.J. (1977a) “Determination of monaural sensitivity changes due to the pinna by use of minimum-audible-field measurements in the lateral vertical plane,” J. Acoust. Soc. Am. 61, 820-828.

Bloom, P.J. (1977b) “Creating source elevation illusions by spectral manipulation,” J. Audio Engr. Soc. 25, 560-565.

Hartmann, W.M. and Rakerd, B. (1993) “Auditory spectral discrimination and the localization of clicks in the sagittal plane,” J. Acoust. Soc. Am. 94, 2083-2092.

Hartmann, W.M. and Wittenberg, A. (1996) “On the externalization of sound images,” J. Acoust. Soc. Am. 99, 3678-3688.

Hartmann, W.M., Rakerd, B., and Koller, A. (2005) “Binaural coherence in rooms,” Acta Acustica united with Acustica 91, 451-462.

Hebrank, J. and Wright, D. (1974a) “Are two ears necessary for localization of sound sources on the median plane?” J. Acoust. Soc. Am. 56, 935-938.
Hebrank, J. and Wright, D. (1974b) “Spectral cues used in the localization of sound sources on the median plane,” J. Acoust. Soc. Am. 56, 1829-1834.

Hofman, P.M., Van Riswick, J.G.A., and Van Opstal, A.J. (1998) “Relearning sound localization with new ears,” Nature Neuroscience 1, 417-421.

Jin, C., Corderoy, A., Carlile, S., and van Schaik, A. (2004) “Contrasting monaural and interaural spectral cues for human sound localization,” J. Acoust. Soc. Am. 115, 3124-3141.

Kulkarni, A., Isabelle, S.K., and Colburn, H.S. (1999) “Sensitivity of human subjects to head-related transfer-function phase spectra,” J. Acoust. Soc. Am. 105, 2821-2840.

Kulkarni, A. and Colburn, H.S. (2000) “Variability in the characterization of the headphone transfer-function,” J. Acoust. Soc. Am. 107, 1071-1074.

Langendijk, E.H.A. and Bronkhorst, A.W. (2002) “Contribution of spectral cues to human sound localization,” J. Acoust. Soc. Am. 112, 1583-1596.

Macpherson, E.A. and Middlebrooks, J.C. (1999) “Sound localization illusions produced by source spectrum discontinuities,” Assoc. Res. Otolaryngology Abstracts.

Macpherson, E.A. and Middlebrooks, J.C. (2003) “Vertical-plane sound localization probed with ripple-spectrum noise,” J. Acoust. Soc. Am. 114, 430-445.

Macpherson, E.A., Sabin, A.T., and Middlebrooks, J.C. (2004) “Binaural weighting of monaural spectral cues for sound localization,” Assoc. Res. Otolaryngology Abstracts.

Mellert, V. (1971) “Directional hearing in the median plane and diffraction of sound around the head,” thesis, Göttingen, cited by Blauert (1983).

Middlebrooks, J.C. (1992) “Narrow-band sound localization related to external ear acoustics,” J. Acoust. Soc. Am. 92, 2607-2624.

Middlebrooks, J.C. (1999) “Individual differences in external-ear transfer functions reduced by scaling in frequency,” J. Acoust. Soc. Am. 106, 1480-1492.

Middlebrooks, J.C. and Green, D.M. (1990) “Directional dependence of interaural envelope delays,” J. Acoust. Soc. Am. 87, 2149-2162.

Morimoto, M. (2001) “The contribution of two ears to the perception of vertical angle in sagittal planes,” J. Acoust. Soc. Am. 109, 1596-1603.

Musicant, A.D. and Butler, R.A. (1984) “The influence of pinnae-based spectral cues on sound localization,” J. Acoust. Soc. Am. 75, 1195-1200.

Oxenham, A.J. and Dau, T. (2001) “Reconciling frequency selectivity and phase effects in masking,” J. Acoust. Soc. Am. 110, 1525-1538.

Sabin, A.T., Macpherson, E.A., and Middlebrooks, J.C. (2005) “Vertical-plane localization of sounds with distorted spectral cues,” Assoc. Res. Otolaryngology Abstracts.

Schroeder, M.R. (1970) “Synthesis of low-peak-factor signals and binary sequences with low autocorrelation,” IEEE Trans. on Information Theory IT-16, 85-89.

Schroeder, M.R. and Atal, B.S. (1963) “Computer simulation of sound transmission in rooms,” IEEE Intl. Conv. Rec. 11, 150-155.

Shaw, E.A.G. and Teranishi, R. (1968) “Sound pressure generated in an external-ear replica and real human ears by a nearby point source,” J. Acoust. Soc. Am. 44, 240-249.

Smith, B.K., Sieben, U.K., Kohlrausch, A., and Schroeder, M.R. (1986) “Phase effects in masking related to dispersion in the inner ear,” J. Acoust. Soc. Am. 80, 1631-1637.

Vliegen, J. and Van Opstal, A.J. (2004) “The influence of duration and level on human sound localization,” J. Acoust. Soc. Am. 115, 1705-1713.

Watkins, A.J. (1978) “Psychoacoustical aspects of synthesized vertical locale cues,” J. Acoust. Soc. Am. 63, 1152-1165.
Wightman, F.L. and Kistler, D.J. (1989a) “Headphone simulation of free-field listening. I: Stimulus synthesis,” J. Acoust. Soc. Am. 85, 858-867.

Wightman, F.L. and Kistler, D.J. (1989b) “Headphone simulation of free-field listening. II: Psychophysical validation,” J. Acoust. Soc. Am. 85, 868-878.

Zahorik, P., Bangayan, P., Sundareswaran, V., Wang, K., and Tam, C. (2006) “Perceptual recalibration in human sound localization: Learning to remediate front-back reversals,” J. Acoust. Soc. Am. 120, 343-359.

Zakarauskas, P. and Cynader, M.S. (1993) “A computational theory of spectral cue localization,” J. Acoust. Soc. Am. 94, 1323-1331.