
Ph.D. dissertation available



Dear list,

Since Al Bregman would like to see some more dissertation announcements, I've
decided to let you know where you can get a copy of my dissertation. The final
version of it was submitted in December 1999 to the Faculty of Science and
Engineering at Aalborg University, Denmark.

The title is "Spatial localization of speech segments" and it concerns
localization of speech segments under adverse (speech-shaped) noise conditions.
This is where you can get it:

        http://www.acoustics.auc.dk/~blyk/karlsen_phd.zip

The file is zipped PostScript (9.8 MB). I also have a limited number of
hard copies, which I would be happy to send to you if you prefer; just
send me an email at mailto:blyk@acoustics.auc.dk . The abstract and table of
contents follow.

-----------------------------------------------------------------------
Spatial Localization of Speech Segments
Brian L. Karlsen
Center for PersonKommunikation (now: Department of Acoustics)
Aalborg University
Denmark

Much is known about human localization of simple stimuli like
sinusoids, clicks, broadband noise and narrowband noise in quiet. Less
is known about human localization in noise. Even less is known about
localization of speech and very few previous studies have reported
data from localization of speech in noise.

This study attempts to answer the question: "Are there certain
features of speech which have an impact on the human ability to
determine the spatial location of a speaker in the horizontal plane
under adverse noise conditions?" The study consists of an extensive
literature survey on psychoacoustics, physiology of hearing and
computational hearing, covering both normal hearing and hearing
impairment; a psychoacoustical localization experiment in the
horizontal plane; and a localization model built to explain some of
the processes involved when humans perform the experimental task.

The psychoacoustical experiment used naturally-spoken Danish
consonant-vowel combinations as targets presented in diffuse
speech-shaped noise at a peak SNR of -10 dB. The subjects were
normal-hearing persons. The experiment took place in an anechoic
chamber where eight loudspeakers were suspended so that they
surrounded the subjects in the horizontal plane. The subjects were
required to push a button on a response pad indicating where in the
horizontal plane they had localized the target. The response pad had
twelve buttons arranged uniformly in a circle and two further buttons
so that the subjects could indicate that they had not heard the
target, or that they had heard it but could not localize it.
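
To illustrate the response coding described above, here is a minimal
Python sketch. Twelve buttons arranged uniformly in a circle are 30
degrees apart. The button numbering (0-11 for the direction buttons,
starting straight ahead; 12 and 13 for the two extra buttons) and all
function names are assumptions made for this example and are not taken
from the dissertation.

```python
# Hypothetical response coding for a 14-button pad:
# 12 direction buttons on a circle plus two "no localization" buttons.
N_DIRECTION_BUTTONS = 12
NOT_HEARD = 12       # assumed index: target was not heard
NOT_LOCALIZED = 13   # assumed index: target heard but not localized


def button_to_azimuth(button):
    """Map a direction button (0..11) to an azimuth in degrees.

    Button 0 is assumed to point straight ahead (0 degrees); the
    remaining buttons follow in 30-degree steps around the circle.
    """
    if not 0 <= button < N_DIRECTION_BUTTONS:
        raise ValueError("not a direction button: %r" % (button,))
    return button * 360.0 / N_DIRECTION_BUTTONS


def classify_response(button):
    """Return (category, azimuth) for one button press.

    The azimuth is None for the two non-direction buttons.
    """
    if button == NOT_HEARD:
        return ("not heard", None)
    if button == NOT_LOCALIZED:
        return ("heard, not localized", None)
    return ("localized", button_to_azimuth(button))
```

For example, classify_response(6) would code a response directly
behind the subject under the assumed numbering.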

The model consists of three overall parts. One part assigns
directional estimates to time-frequency components on the basis of
interaural time difference and front/back templates. Another part does
grouping by weighting the time-frequency components of the target on
the basis of the partial specific loudness. Finally, the information
from these two parts is combined, integrated across time, converted to
azimuth angles and integrated across frequency to yield a probability
distribution over the azimuth angles from which the target is likely
to have originated. The model is trained on the experimental data.
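
The combination step can be pictured with a small Python sketch. This
is a deliberately simplified stand-in, not the model from the
dissertation: it assumes each time-frequency component already carries
an azimuth estimate in degrees and a non-negative weight, and it simply
accumulates the weights into azimuth bins and normalizes. The function
name, the input layout and the choice of twelve bins are all
assumptions made for illustration.

```python
def azimuth_distribution(estimates, weights, n_bins=12):
    """Weighted histogram over azimuth, normalized to sum to 1.

    estimates[t][f] -- assumed azimuth estimate (degrees) for the
                       component at time frame t and frequency band f
    weights[t][f]   -- assumed non-negative weight for that component
                       (a stand-in for loudness-based grouping)
    """
    bin_width = 360.0 / n_bins
    hist = [0.0] * n_bins
    # Summing over both loops integrates across time and frequency.
    for est_row, w_row in zip(estimates, weights):
        for azimuth, weight in zip(est_row, w_row):
            # Assign the component to the nearest azimuth bin.
            idx = int(round((azimuth % 360.0) / bin_width)) % n_bins
            hist[idx] += weight
    total = sum(hist)
    if total == 0.0:
        # No evidence at all: fall back to a uniform distribution.
        return [1.0 / n_bins] * n_bins
    return [h / total for h in hist]


# Two time frames, two frequency bands (made-up numbers).
example = azimuth_distribution(
    estimates=[[0.0, 180.0], [0.0, 30.0]],
    weights=[[1.0, 0.5], [1.0, 1.5]],
)
```

In this toy example half of the total weight lands in the 0-degree
bin, so that direction receives the highest probability.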

On the basis of the experimental results, it is concluded that the
human ability to localize speech segments in adverse noise depends on
the speech segment as well as its point of origin in space. A
comparison between the experimental data and the data produced by the
model for the same stimuli reveals that the model is capable of
reproducing the overall structure of the experimental data. This may
indicate that the overall structure of the model is on the right
track.

----------------------------------------------------------------------
Table of Contents

Abstract  v


Dansk Resume  vii


Acknowledgments  ix


Preface  xi


1 Introduction  1

1.1 The field  1
1.2 This study  2


2 Literature Survey  5

2.1 Normal hearing persons  5
2.1.1 Psychoacoustics  5
2.1.1.1 Auditory periphery  5
2.1.1.2 Localization  13
2.1.1.3 Other binaural phenomena  15
2.1.2 Physiology  17
2.1.2.1 Auditory periphery  17
2.1.2.2 Localization  26
2.1.3 Models  27
2.1.3.1 Auditory periphery  28
2.1.3.2 Localization  35

2.2 Hearing impaired persons  41
2.2.1 Pathology  41
2.2.2 Psychoacoustics  42
2.2.2.1 Auditory filters  42
2.2.2.2 Binaural processing  42
2.2.2.3 Speech perception  45
2.2.3 Models  47

2.3 Summary  48
2.3.1 Psychoacoustics  48
2.3.1.1 Auditory periphery  48
2.3.1.2 Grouping  49
2.3.1.3 Localization  50
2.3.2 Models  51
2.3.2.1 Auditory periphery  51
2.3.2.2 Localization  52


3 Psychoacoustical Experiments  53

3.1 Introduction  53

3.2 Method  54
3.2.1 Design  54
3.2.2 Subjects  55
3.2.3 Stimuli  55
3.2.4 Apparatus  60
3.2.5 Procedure  61

3.3 Results  62
3.3.1 Audibility  63
3.3.2 Localization  66
3.3.2.1 Localized/not localized  66
3.3.2.2 Correctly/mistakenly localized  66
3.3.2.3 Mistakenly localized  68
3.3.3 Summary  76


4 Model Building  79

4.1 Introduction  79

4.2 Design  80
4.2.1 Structure  80
4.2.2 Alternative models  82
4.2.2.1 Peripheral localization pathway  82
4.2.2.2 Component localization  83
4.2.2.3 Grouping  84
4.2.2.4 Cue combination  84
4.2.2.5 Assignment of direction  85
4.2.3 Detailed model description  85
4.2.3.1 Peripheral localization pathway  86
4.2.3.2 Component localization  90
4.2.3.3 Grouping  93
4.2.3.4 Cue combination  98
4.2.3.5 Assignment of direction  99
4.2.3.6 Example: /su/ from 180 degrees azimuth  103


5 Model Verification and Analysis  111

5.1 Verification of model  111
5.1.1 Static model components  111
5.1.1.1 Peripheral localization pathway  111
5.1.1.2 Interaural time difference  115
5.1.1.3 Grouping  118
5.1.2 Model behaviour  119
5.1.2.1 Audibility  120
5.1.2.2 Localized/not localized  120
5.1.2.3 Correctly/mistakenly localized  121
5.1.2.4 FBC localization errors  121
5.1.2.5 Non-FBC localization errors  123
5.1.2.6 Overall performance summary  125

5.2 Analysis of neural network weights  127
5.2.1 ITD to azimuth angle  128
5.2.2 Integration of frequency  129


6 Discussion and Conclusions  131

6.1 Discussion  131
6.1.1 Psychoacoustical data  131
6.1.2 Model data  132
6.1.3 Overall discussion  133

6.2 Conclusions  134

6.3 Pointers for future research  135


Appendices  137


A   A Psychoacoustical Auditory Model  139

A.1 Motivation  139

A.2 Preliminary work  140
A.2.1 Model structure  140
A.2.2 Bootstrapping  141

A.3 Model description  143

A.4 Parameter fitting  144

A.5 Results  144
A.5.1 Linear results  145
A.5.2 Nonlinear results  146

A.6 Summary  149


B Spectrograms of Target Stimuli  151


C Detailed Statistics of Psychoacoustical Results  159

C.1 Audibility  159

C.2 Localization  162
C.2.1 Localized/not localized  162
C.2.2 Correctly/mistakenly localized  165
C.2.3 Mistakenly localized  169
C.2.3.1 FBC localization mistakes  169
C.2.3.2 Non-FBC localization mistakes  171


D Input-to-Hidden Layer Neural Network Weights  173


E The Assessment of a Ph.D.-thesis  179

E.1 Assessment committee  179

E.2 The task of the assessment committee  179

E.3 Acceptance of the Ph.D.-thesis for public defense  180

E.4 The Ph.D.-defense  180

E.5 Recommendation  181


--
_______________________________________________________________________

Brian Lykkegaard Karlsen        phone:  (+45) 9635 9885
Assistant Research Professor    e-mail: mailto:blyk@acoustics.auc.dk
                                http://www.acoustics.auc.dk/~blyk/

Department of Acoustics         phone:  (+45) 9635 8710
Aalborg University              fax:    (+45) 9815 2144
Fredrik Bajersvej 7 B4          e-mail: acoustics@acoustics.auc.dk
DK-9220 Aalborg Ø, Denmark      http://www.acoustics.auc.dk/
_______________________________________________________________________