Postdoc position at IRISA, Rennes, France
Dear list,
We are seeking to recruit a postdoctoral researcher on the statistical
modeling of multichannel audio, applied to speaker segmentation and
separation (full subject below). The successful candidate will work
under the supervision of Drs. Guillaume Gravier and Emmanuel Vincent, in
the METISS group at IRISA, which has a newly equipped room
dedicated to the exploration of future meeting environments.
Prospective candidates should have a background in multichannel signal
processing or in speech processing, and should either have held a PhD
for less than one year or be about to obtain one. Informal enquiries may
be made to Emmanuel Vincent (emmanuel.vincent@xxxxxxxx) or Guillaume
Gravier (guillaume.gravier@xxxxxxxx).
This appointment is for 2 years, starting in summer or fall 2007. Salary
will be 28000 euros per annum. Applications must be submitted online
before March 31st at
http://www.inria.fr/travailler/opportunites/postdoc/postdoc.en.html
Joint statistical modeling of spectral, temporal and spatial audio
features, applied to speaker segmentation and separation
Most audio signals represent complex sound scenes consisting of several
overlapping sources (speakers, natural sounds, musical instruments).
These sources are usually located at different spatial positions and
exhibit different spectro-temporal characteristics. The processing of
such documents involves several challenging tasks, such as the
separation, the segmentation and more generally the description of each
source.
Existing description algorithms are mostly designed for one-microphone
recordings and rely on statistical modeling of spectral features. Yet,
in many application environments, multiple microphones are available,
thus providing valuable spatial information. Beamforming algorithms are
then typically employed to determine at each instant the number of
sources and their locations based on spatial features. These algorithms
can improve the detection of overlapping sources. However, their
robustness decreases for small microphone arrays or with moving sources.
The goal of this project is to define a unified statistical modeling
framework for the joint exploitation of spectral, temporal and spatial
information in multichannel audio signals. Dynamic state-based models
offer a promising approach for the description of some extracted
spectral and spatial features as a function of some hidden states
associated with different sources and positions. A first stage of the
project could consist of extending the state-of-the-art one-microphone
segmentation model developed in our lab (based on GMMs) by incorporating
spatial features obtained from classical source localization and
separation techniques (e.g. ICA, DUET, beamforming).
The proposed framework will be primarily applied to speaker segmentation
and separation, that is, the task of determining the structure of a
speech recording according to the question "who spoke when and where"
and extracting the signal of each speaker. The results will be evaluated
on meeting data recorded by small microphone arrays. Data from the NIST
meeting evaluation will be used along with data recorded at our lab in a
room dedicated to the exploration of future meeting environments.
--
Emmanuel Vincent
METISS Project
IRISA-INRIA
Campus de Beaulieu, 35042 Rennes cedex, France
Phone: +332 9984 7227 - Fax: +332 9984 7171
Web: http://www.irisa.fr/metiss/members/evincent/