Tech report on location-based segregation
Dear Colleagues,
It is my pleasure to announce the availability of the following
technical report.
Thanks for your attention,
Nicoleta Roman
************************************
"Speech segregation based on sound localization", Technical Report #16,
June 2002.
Department of Computer and Information Science
The Ohio State University
Nicoleta Roman, The Ohio State University
DeLiang Wang, The Ohio State University
Guy J. Brown, University of Sheffield
*************************************
Abstract
---------
At a cocktail party, we can selectively attend to a single voice and
filter out all other acoustic interference. Simulating this perceptual
ability computationally remains a great challenge. This paper describes a
novel machine learning approach to speech segregation, in which a target
speech signal is separated from interfering sounds using spatial
location cues: interaural time differences (ITD) and interaural
intensity differences (IID). The auditory masking effect motivates the
notion of an “ideal” time-frequency binary mask, which selects the
target if it is stronger than the interference in a local time-frequency
(T-F) unit. We observe that within a narrow frequency band, changes in
the relative strength of the target source with respect to the
interference produce systematic deviations in ITD and IID. For a given
spatial configuration, this interaction produces characteristic
clustering in the binaural feature space. Consequently, we estimate ideal
binary masks by pattern classification in this feature space. A
systematic evaluation shows that the resulting system produces masks
very close to ideal binary ones, and gives a significant improvement in
performance over an existing approach, as quantified by changes in
signal-to-noise ratio before and after segregation.
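The ideal binary mask and the two binaural cues named in the abstract can
be illustrated with a small sketch. It assumes that each T-F unit is
already available either as a pair of target and interference energies
(for the mask) or as short left/right-ear frames from one frequency
channel (for ITD and IID). The function names, the sign conventions, and
the brute-force cross-correlation peak search used for ITD are
illustrative assumptions; the abstract does not specify how the report
computes these quantities, and none of this is the report's classifier.

```python
import math

# Hypothetical helpers illustrating concepts named in the abstract.
# Energies are indexed as [channel][frame]; signals are plain lists of samples.


def ideal_binary_mask(target_energy, interference_energy):
    """Return 1 for T-F units where the target is stronger than the interference."""
    return [
        [1 if t > n else 0 for t, n in zip(t_row, n_row)]
        for t_row, n_row in zip(target_energy, interference_energy)
    ]


def estimate_itd(left, right, max_lag):
    """Lag (in samples) that maximizes the cross-correlation of one T-F unit.

    Assumed convention: a positive value means the right-ear signal lags the
    left-ear signal. Both inputs are assumed to have the same length.
    """
    best_lag, best_val = 0, -math.inf
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        # Sum left[i] * right[i + lag] over indices where both samples exist.
        val = sum(left[i] * right[i + lag] for i in range(n) if 0 <= i + lag < n)
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def estimate_iid(left, right):
    """Interaural intensity difference in dB; positive means louder at the left ear."""
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    return 10.0 * math.log10(energy_left / energy_right)


if __name__ == "__main__":
    target = [[4.0, 1.0], [0.0, 2.0]]
    interference = [[1.0, 1.0], [3.0, 1.0]]
    print(ideal_binary_mask(target, interference))  # [[1, 0], [0, 1]]

    left = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0]
    right = [0, 0, 0] + left[:-3]  # right-ear copy delayed by 3 samples
    print(estimate_itd(left, right, max_lag=5))  # 3
    print(round(estimate_iid([2, 2], [1, 1]), 2))  # 6.02
```

Note that `ideal_binary_mask` needs the target and interference
separately, which is only possible when the mixture is constructed in
simulation; a practical system therefore has to estimate the mask from
features of the mixture, such as the ITD and IID values sketched here.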
**************************************
The manuscript is available for download at:
ftp://ftp.cis.ohio-state.edu/pub/tech-report/2002/TR16.pdf
Related sound demos can be found at:
http://www.cis.ohio-state.edu/~niki/soundemo.html
A preliminary version of this work appears in the Proceedings of
ICASSP 2002.