Ph.D. dissertation available
Dear list,
Since Al Bregman would like to see some more dissertation announcements, I've
decided to let you know where you can get a copy of my dissertation. The final
version of it was submitted in December 1999 to the Faculty of Science and
Engineering at Aalborg University, Denmark.
The title is "Spatial localization of speech segments" and it concerns
localization of speech segments under adverse (speech-shaped) noise conditions.
This is where you can get it:
http://www.acoustics.auc.dk/~blyk/karlsen_phd.zip
The file is zipped PostScript (9.8 MB). I also have a limited number of
hard copies, which I would be happy to send you if you prefer; just
send me an email at mailto:blyk@acoustics.auc.dk . The abstract and table of
contents follow.
-----------------------------------------------------------------------
Spatial Localization of Speech Segments
Brian L. Karlsen
Center for PersonKommunikation (now: Department of Acoustics)
Aalborg University
Denmark
Much is known about human localization of simple stimuli like
sinusoids, clicks, broadband noise and narrowband noise in quiet. Less
is known about human localization in noise. Even less is known about
localization of speech and very few previous studies have reported
data from localization of speech in noise.
This study attempts to answer the question: "Are there certain
features of speech which have an impact on the human ability to
determine the spatial location of a speaker in the horizontal plane
under adverse noise conditions?" The study consists of an extensive
literature survey on psychoacoustics, the physiology of hearing and
computational hearing, covering both normal hearing and hearing
impairment, as well as a psychoacoustical localization experiment in
the horizontal plane and a localization model built to explain some of
the processes involved when humans perform the experimental task.
The psychoacoustical experiment used naturally spoken Danish
consonant-vowel combinations as targets presented in diffuse
speech-shaped noise at a peak SNR of -10 dB. The subjects were
normal-hearing persons. The experiment took place in an anechoic chamber
where eight loudspeakers were suspended so that they surrounded the
subjects in the horizontal plane. The subjects were required to push a
button on a pad indicating where in the horizontal plane they had
localized the target. The response pad had twelve buttons arranged
uniformly in a circle and two further buttons so that the subjects
could indicate that they had not heard the target or that they had
heard it but could not localize it.
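As a rough illustration of how responses from such a pad could be
scored, here is a minimal Python sketch. It is not taken from the
dissertation: the button numbering (button 0 straight ahead), the
azimuth convention (0 degrees in front, 180 degrees behind) and all
function names are assumptions made for the example. Twelve uniformly
spaced buttons give a spacing of 360/12 = 30 degrees.

```python
def button_azimuth(button, n_buttons=12):
    """Azimuth in degrees of a response button (assumed: button 0 = front)."""
    return (button % n_buttons) * 360.0 / n_buttons


def angular_error(response_deg, target_deg):
    """Signed difference response - target, wrapped into [-180, 180)."""
    return (response_deg - target_deg + 180.0) % 360.0 - 180.0


def is_front_back_confusion(response_deg, target_deg, tol=1e-9):
    """True if the response is the target mirrored about the interaural axis.

    Assumes 0 degrees = front, so the mirror image of theta is 180 - theta.
    Targets on the interaural axis (90 and 270 degrees) mirror onto
    themselves and are therefore never counted as confusions.
    """
    mirrored = (180.0 - target_deg) % 360.0
    hits_mirror = abs(angular_error(response_deg, mirrored)) <= tol
    hits_target = abs(angular_error(response_deg, target_deg)) <= tol
    return hits_mirror and not hits_target
```

Under these assumptions, a target at 30 degrees answered with the
button at 150 degrees would count as a front-back confusion, while an
answer at 330 degrees would be a 60-degree error across the front.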
The model consists of three main parts. One part assigns
directional estimates to time-frequency components on the basis of
interaural time difference and front/back templates. Another part
performs grouping by weighting the time-frequency components of the
target on the basis of the partial specific loudness. Finally, the
information from these two parts is combined, integrated across time,
converted to azimuth angles and integrated across frequency to yield a
probability distribution over the azimuth angles from which the target
is likely to have originated. The model is trained on the experimental
data.
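To make the final combination step more concrete, here is a heavily
simplified Python sketch. It is not the model from the dissertation:
it assumes each time-frequency component already carries an azimuth
estimate and a non-negative weight (standing in for the loudness-based
grouping weight), and it replaces the trained conversion and
integration stages with a plain weighted histogram over twelve
30-degree bins. The function name and the input layout are
hypothetical.

```python
def azimuth_distribution(components, n_bins=12):
    """Weighted histogram over azimuth, normalized to sum to 1.

    components: iterable of (azimuth_deg, weight) pairs, one pair per
    time-frequency component (assumed input format, for illustration).
    """
    bin_width = 360.0 / n_bins
    hist = [0.0] * n_bins
    for azimuth_deg, weight in components:
        # Assign to the nearest bin centre, wrapping around the circle.
        idx = int(round((azimuth_deg % 360.0) / bin_width)) % n_bins
        hist[idx] += weight
    total = sum(hist)
    if total == 0:
        # No evidence at all: fall back to a uniform distribution.
        return [1.0 / n_bins] * n_bins
    return [h / total for h in hist]


# Two weak components near the front and one strong component behind.
dist = azimuth_distribution([(0.0, 1.0), (5.0, 1.0), (180.0, 2.0)])
print(dist[0], dist[6])
```

In this toy example the front bin (0 degrees) and the rear bin (180
degrees) each end up with half of the probability mass, which shows
how the weighting lets a single strong component balance several weak
ones.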
On the basis of the experimental results, it is concluded that the
human ability to localize speech segments in adverse noise depends on
the speech segment as well as its point of origin in space. A
comparison between the experimental data and the data produced by the
model for the same stimuli reveals that the model is capable of
reproducing the overall structure of the experimental data. This may
indicate that the overall structure of the model is on the right
track.
----------------------------------------------------------------------
Table of Contents
Abstract v
Dansk Resume vii
Acknowledgments ix
Preface xi
1 Introduction 1
1.1 The field 1
1.2 This study 2
2 Literature Survey 5
2.1 Normal hearing persons 5
2.1.1 Psychoacoustics 5
2.1.1.1 Auditory periphery 5
2.1.1.2 Localization 13
2.1.1.3 Other binaural phenomena 15
2.1.2 Physiology 17
2.1.2.1 Auditory periphery 17
2.1.2.2 Localization 26
2.1.3 Models 27
2.1.3.1 Auditory periphery 28
2.1.3.2 Localization 35
2.2 Hearing impaired persons 41
2.2.1 Pathology 41
2.2.2 Psychoacoustics 42
2.2.2.1 Auditory filters 42
2.2.2.2 Binaural processing 42
2.2.2.3 Speech perception 45
2.2.3 Models 47
2.3 Summary 48
2.3.1 Psychoacoustics 48
2.3.1.1 Auditory periphery 48
2.3.1.2 Grouping 49
2.3.1.3 Localization 50
2.3.2 Models 51
2.3.2.1 Auditory periphery 51
2.3.2.2 Localization 52
3 Psychoacoustical Experiments 53
3.1 Introduction 53
3.2 Method 54
3.2.1 Design 54
3.2.2 Subjects 55
3.2.3 Stimuli 55
3.2.4 Apparatus 60
3.2.5 Procedure 61
3.3 Results 62
3.3.1 Audibility 63
3.3.2 Localization 66
3.3.2.1 Localized/not localized 66
3.3.2.2 Correctly/mistakenly localized 66
3.3.2.3 Mistakenly localized 68
3.3.3 Summary 76
4 Model Building 79
4.1 Introduction 79
4.2 Design 80
4.2.1 Structure 80
4.2.2 Alternative models 82
4.2.2.1 Peripheral localization pathway 82
4.2.2.2 Component localization 83
4.2.2.3 Grouping 84
4.2.2.4 Cue combination 84
4.2.2.5 Assignment of direction 85
4.2.3 Detailed model description 85
4.2.3.1 Peripheral localization pathway 86
4.2.3.2 Component localization 90
4.2.3.3 Grouping 93
4.2.3.4 Cue combination 98
4.2.3.5 Assignment of direction 99
4.2.3.6 Example: /su/ from 180 degrees azimuth 103
5 Model Verification and Analysis 111
5.1 Verification of model 111
5.1.1 Static model components 111
5.1.1.1 Peripheral localization pathway 111
5.1.1.2 Interaural time difference 115
5.1.1.3 Grouping 118
5.1.2 Model behaviour 119
5.1.2.1 Audibility 120
5.1.2.2 Localized/not localized 120
5.1.2.3 Correctly/mistakenly localized 121
5.1.2.4 FBC localization errors 121
5.1.2.5 Non-FBC localization errors 123
5.1.2.6 Overall performance summary 125
5.2 Analysis of neural network weights 127
5.2.1 ITD to azimuth angle 128
5.2.2 Integration of frequency 129
6 Discussion and Conclusions 131
6.1 Discussion 131
6.1.1 Psychoacoustical data 131
6.1.2 Model data 132
6.1.3 Overall discussion 133
6.2 Conclusions 134
6.3 Pointers for future research 135
Appendices 137
A A Psychoacoustical Auditory Model 139
A.1 Motivation 139
A.2 Preliminary work 140
A.2.1 Model structure 140
A.2.2 Bootstrapping 141
A.3 Model description 143
A.4 Parameter fitting 144
A.5 Results 144
A.5.1 Linear results 145
A.5.2 Nonlinear results 146
A.6 Summary 149
B Spectrograms of Target Stimuli 151
C Detailed Statistics of Psychoacoustical Results 159
C.1 Audibility 159
C.2 Localization 162
C.2.1 Localized/not localized 162
C.2.2 Correctly/mistakenly localized 165
C.2.3 Mistakenly localized 169
C.2.3.1 FBC localization mistakes 169
C.2.3.2 Non-FBC localization mistakes 171
D Input-to-Hidden Layer Neural Network Weights 173
E The Assessment of a Ph.D.-thesis 179
E.1 Assessment committee 179
E.2 The task of the assessment committee 179
E.3 Acceptance of the Ph.D.-thesis for public defense 180
E.4 The Ph.D.-defense 180
E.5 Recommendation 181
--
_______________________________________________________________________
Brian Lykkegaard Karlsen phone: (+45) 9635 9885
Assistant Research Professor e-mail: mailto:blyk@acoustics.auc.dk
http://www.acoustics.auc.dk/~blyk/
Department of Acoustics phone: (+45) 9635 8710
Aalborg University fax: (+45) 9815 2144
Fredrik Bajersvej 7 B4 e-mail: acoustics@acoustics.auc.dk
DK-9220 Aalborg Ø, Denmark http://www.acoustics.auc.dk/
_______________________________________________________________________