
Tech report on location-based segregation

To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Tech report on location-based segregation
From: Nicole Roman <niki@xxxxxxxxxxxxxxxxxx>
Date: Wed, 17 Jul 2002 14:43:28 -0400
Delivery-date: Wed Jul 17 14:57:14 2002
Reply-to: Nicole Roman <niki@xxxxxxxxxxxxxxxxxx>
Sender: AUDITORY Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>

Dear Colleagues,

It is my pleasure to announce the availability of the following
technical report.

Thanks for your attention,

Nicoleta Roman

************************************
"Speech segregation based on sound localization", Technical Report #16,
June 2002.

Department of Computer and Information Science
The Ohio State University

Nicoleta Roman, The Ohio State University
DeLiang Wang, The Ohio State University
Guy J. Brown, University of Sheffield
*************************************

Abstract
---------
At a cocktail party, we can selectively attend to a single voice and
filter out all the other acoustical interferences. How to simulate this
perceptual ability remains a great challenge. This paper describes a
novel machine learning approach to speech segregation, in which a target
speech signal is separated from interfering sounds using spatial
location cues: interaural time differences (ITD) and interaural
intensity differences (IID). The auditory masking effect motivates the
notion of an “ideal” time-frequency binary mask, which selects the
target if it is stronger than the interference in a local time-frequency
(T-F) unit. We observe that within a narrow frequency band,
modifications to the relative strength of the target source with respect
to the interference trigger systematic deviations for ITD and IID. For a
given spatial configuration, this interaction produces characteristic
clustering in the binaural feature space. Consequently, we perform
pattern classification in order to estimate ideal binary masks. A
systematic evaluation shows that the resulting system produces masks
very close to ideal binary ones, and gives a significant improvement in
performance over an existing approach, as quantified by changes in
signal-to-noise ratio before and after segregation.
**************************************
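
For readers unfamiliar with the "ideal" binary mask mentioned in the
abstract, the following is a minimal sketch of the idea, not the code
from the report. It assumes the target and interference energies are
already known per T-F unit (rows are frequency channels, columns are
time frames), and it uses a simplified energy-domain SNR rather than the
evaluation procedure in the report. The function names and the toy
numbers are hypothetical.

```python
import math


def ideal_binary_mask(target_energy, interference_energy):
    """Return 1 for each T-F unit where the target is stronger, else 0."""
    return [
        [1 if t > i else 0 for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(target_energy, interference_energy)
    ]


def snr_db(signal_energy, noise_energy):
    """Signal-to-noise ratio in decibels from two total energies."""
    if noise_energy == 0:
        return math.inf
    return 10.0 * math.log10(signal_energy / noise_energy)


def masked_total(mask, energy):
    """Sum the energy of the T-F units that the mask retains."""
    return sum(
        m * e
        for m_row, e_row in zip(mask, energy)
        for m, e in zip(m_row, e_row)
    )


# Toy energies (assumed values): 2 frequency channels x 3 time frames.
target = [[4.0, 1.0, 9.0], [0.5, 6.0, 2.0]]
interference = [[1.0, 3.0, 2.0], [2.0, 1.0, 2.0]]

mask = ideal_binary_mask(target, interference)

all_ones = [[1] * len(row) for row in target]
snr_before = snr_db(masked_total(all_ones, target),
                    masked_total(all_ones, interference))
snr_after = snr_db(masked_total(mask, target),
                   masked_total(mask, interference))

print(mask)
print(snr_before, snr_after)
```

In this toy case the mask discards the units dominated by interference,
so the energy-domain SNR is higher after masking than before. The report
itself estimates the mask from ITD and IID features, since the separate
target and interference energies are not available in practice.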

The manuscript is available for download at:

    ftp://ftp.cis.ohio-state.edu/pub/tech-report/2002/TR16.pdf

Related sound demos can be found at:

    http://www.cis.ohio-state.edu/~niki/soundemo.html

A preliminary version of this work is included in the Proceedings of
ICASSP 2002.
