Ph.D. dissertation available (Brian Lykkegaard Karlsen)

Subject: Ph.D. dissertation available
From: Brian Lykkegaard Karlsen <blyk(at)ACOUSTICS.AUC.DK>
Date: Wed, 3 May 2000 08:54:45 +0200

Dear list,

Since Al Bregman would like to see some more dissertation announcements,
I've decided to let you know where you can get a copy of my dissertation.
The final version of it was submitted in December 1999 to the Faculty of
Science and Engineering at Aalborg University, Denmark.

The title is "Spatial localization of speech segments" and it concerns
localization of speech segments under adverse (speech-shaped) noise
conditions.

This is where you can get it:

http://www.acoustics.auc.dk/~blyk/karlsen_phd.zip

The file format is zipped PostScript (9.8 MB). I also have a limited
number of hardcopies, which I would be happy to send to you if you prefer
this; just send me an email at mailto:blyk(at)acoustics.auc.dk .

Abstract and table of contents follow.

-----------------------------------------------------------------------

Spatial Localization of Speech Segments

Brian L. Karlsen
Center for PersonKommunikation (now: Department of Acoustics)
Aalborg University
Denmark

Much is known about human localization of simple stimuli like sinusoids,
clicks, broadband noise and narrowband noise in quiet. Less is known
about human localization in noise. Even less is known about localization
of speech, and very few previous studies have reported data from
localization of speech in noise. This study attempts to answer the
question: "Are there certain features of speech which have an impact on
the human ability to determine the spatial location of a speaker in the
horizontal plane under adverse noise conditions?"

The study consists of an extensive literature survey on Psychoacoustics,
Physiology of Hearing and Computational Hearing, looking at both normal
hearing and hearing impairment, as well as a psychoacoustical
localization experiment in the horizontal plane and a localization model
built in an attempt to explain some of the processes involved when humans
perform the experimental task.

The psychoacoustical experiment used naturally spoken Danish
consonant-vowel combinations as targets, presented in diffuse
speech-shaped noise at a peak SNR of -10 dB. The subjects were
normal-hearing persons. The experiment took place in an anechoic chamber
where eight loudspeakers were suspended so that they surrounded the
subjects in the horizontal plane. The subjects were required to push a
button on a pad indicating where in the horizontal plane they had
localized the target. The response pad had twelve buttons arranged
uniformly in a circle and two further buttons so that the subjects could
indicate if they had not heard the target, or if they had heard it but
could not localize it.

The model consists of three overall parts. One part assigns directional
estimates to time-frequency components on the basis of interaural time
difference and front/back templates. Another part does grouping by
weighting the time-frequency components of the target on the basis of the
partial specific loudness. Finally, the information from these two parts
is combined, integrated across time, converted to azimuth angles and
integrated across frequency to yield a probability distribution over the
azimuth angles from which the target is likely to have originated. The
model is trained on the experimental data.

On the basis of the experimental results, it is concluded that the human
ability to localize speech segments in adverse noise depends on the
speech segment as well as its point of origin in space.
A comparison between the experimental data and the data produced by the
model for the same stimuli reveals that the model is capable of
reproducing the overall structure of the experimental data. This may
indicate that the overall structure of the model is on the right track.

----------------------------------------------------------------------

Table of Contents

Abstract
Dansk Resume
Acknowledgments
Preface

1 Introduction
  1.1 The field
  1.2 This study
2 Literature Survey
  2.1 Normal hearing persons
    2.1.1 Psychoacoustics
      2.1.1.1 Auditory periphery
      2.1.1.2 Localization
      2.1.1.3 Other binaural phenomena
    2.1.2 Physiology
      2.1.2.1 Auditory periphery
      2.1.2.2 Localization
    2.1.3 Models
      2.1.3.1 Auditory periphery
      2.1.3.2 Localization
  2.2 Hearing impaired persons
    2.2.1 Pathology
    2.2.2 Psychoacoustics
      2.2.2.1 Auditory filters
      2.2.2.2 Binaural processing
      2.2.2.3 Speech perception
    2.2.3 Models
  2.3 Summary
    2.3.1 Psychoacoustics
      2.3.1.1 Auditory periphery
      2.3.1.2 Grouping
      2.3.1.3 Localization
    2.3.2 Models
      2.3.2.1 Auditory periphery
      2.3.2.2 Localization
3 Psychoacoustical Experiments
  3.1 Introduction
  3.2 Method
    3.2.1 Design
    3.2.2 Subjects
    3.2.3 Stimuli
    3.2.4 Apparatus
    3.2.5 Procedure
  3.3 Results
    3.3.1 Audibility
    3.3.2 Localization
      3.3.2.1 Localized/not localized
      3.3.2.2 Correctly/mistakenly localized
      3.3.2.3 Mistakenly localized
    3.3.3 Summary
4 Model Building
  4.1 Introduction
  4.2 Design
    4.2.1 Structure
    4.2.2 Alternative models
      4.2.2.1 Peripheral localization pathway
      4.2.2.2 Component localization
      4.2.2.3 Grouping
      4.2.2.4 Cue combination
      4.2.2.5 Assignment of direction
    4.2.3 Detailed model description
      4.2.3.1 Peripheral localization pathway
      4.2.3.2 Component localization
      4.2.3.3 Grouping
      4.2.3.4 Cue combination
      4.2.3.5 Assignment of direction
      4.2.3.6 Example: /su/ from 180 degrees azimuth
5 Model Verification and Analysis
  5.1 Verification of model
    5.1.1 Static model components
      5.1.1.1 Peripheral localization pathway
      5.1.1.2 Interaural time difference
      5.1.1.3 Grouping
    5.1.2 Model behaviour
      5.1.2.1 Audibility
      5.1.2.2 Localized/not localized
      5.1.2.3 Correctly/mistakenly localized
      5.1.2.4 FBC localization errors
      5.1.2.5 Non-FBC localization errors
      5.1.2.6 Overall performance summary
  5.2 Analysis of neural network weights
    5.2.1 ITD to azimuth angle
    5.2.2 Integration of frequency
6 Discussion and Conclusions
  6.1 Discussion
    6.1.1 Psychoacoustical data
    6.1.2 Model data
    6.1.3 Overall discussion
  6.2 Conclusions
  6.3 Pointers for future research

Appendices

A A Psychoacoustical Auditory Model
  A.1 Motivation
  A.2 Preliminary work
    A.2.1 Model structure
    A.2.2 Bootstrapping
  A.3 Model description
  A.4 Parameter fitting
  A.5 Results
    A.5.1 Linear results
    A.5.2 Nonlinear results
  A.6 Summary
B Spectrograms of Target Stimuli
C Detailed Statistics of Psychoacoustical Results
  C.1 Audibility
  C.2 Localization
    C.2.1 Localized/not localized
    C.2.2 Correctly/mistakenly localized
    C.2.3 Mistakenly localized
      C.2.3.1 FBC localization mistakes
      C.2.3.2 Non-FBC localization mistakes
D Input-to-Hidden Layer Neural Network Weights
E The Assessment of a Ph.D.-thesis
  E.1 Assessment committee
  E.2 The task of the assessment committee
  E.3 Acceptance of the Ph.D.-thesis for public defense
  E.4 The Ph.D.-defense
  E.5 Recommendation

--
_______________________________________________________________________
Brian Lykkegaard Karlsen        phone:  (+45) 9635 9885
Assistant Research Professor    e-mail: mailto:blyk(at)acoustics.auc.dk
                                http://www.acoustics.auc.dk/~blyk/
Department of Acoustics         phone:  (+45) 9635 8710
Aalborg University              fax:    (+45) 9815 2144
Fredrik Bajersvej 7 B4          e-mail: acoustics(at)acoustics.auc.dk
DK-9220 Aalborg Ø, Denmark      http://www.acoustics.auc.dk/
_______________________________________________________________________

This message came from the mail archive
http://www.auditory.org/postings/2000/
maintained by:

Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University