
Postdoctoral position: deep neural networks for source separation and noise-robust ASR



(Apologies for any cross-posting. Please forward to anyone who may be interested.)

POSTDOCTORAL POSITION

*SUBJECT*: Deep neural networks for source separation and noise-robust ASR
*LAB*: PAROLE team, Inria Nancy, France
*SUPERVISORS*: Antoine Liutkus (antoine.liutkus@xxxxxxxx) and Emmanuel Vincent (emmanuel.vincent@xxxxxxxx)
*START*: between November 2014 and January 2015
*DURATION*: 12 to 16 months
*TO APPLY*: apply online before June 10 at http://www.inria.fr/en/institute/recruitment/offers/post-doctoral-research-fellowships/post-doctoral-research-fellowships/campaign-2014/%28view%29/details.html?nPostingTargetID=13790 (earlier application is preferred)

Inria is the largest European public research institute dedicated to computer science. The PAROLE team at Inria Nancy, France, gathers 20+ speech scientists with a growing focus on speech enhancement and noise-robust speech recognition, exemplified by its organization of the CHiME Challenge [1] and of ISCA's Robust Speech Processing SIG [2].

The boom of speech interfaces for handheld devices requires automatic speech recognition (ASR) systems to deal with a wide variety of acoustic conditions. Recent research has shown that Deep Neural Networks (DNNs) are very promising for this purpose, but most approaches so far focus on clean, single-source conditions [3]. Despite a few attempts to employ DNNs for source separation [4,5,6], conventional source separation techniques such as [7] still outperform DNNs in real-world conditions involving multiple noise sources [8]. The proposed postdoctoral position aims to close this gap by incorporating the benefits of conventional source separation techniques into DNNs, including, for instance, the ability to exploit multichannel data and to use different characteristics for separation and for ASR. Performance will be assessed on readily available real-world noisy speech corpora such as CHiME [1].

Prospective candidates should have defended a PhD in 2013, or be due to defend one in 2014, in the area of speech processing, machine learning, signal processing, or applied statistics. Proficiency in Matlab, Python, or C++ programming is required. Experience with DNN/ASR software such as Theano or Kaldi would be an asset.

[1] http://spandh.dcs.shef.ac.uk/chime_challenge/

[2] https://wiki.inria.fr/rosp/

[3] G. Hinton, L. Deng, D. Yu, G. Dahl, A.-R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition", IEEE Signal Processing Magazine, 2012.

[4] S.J. Rennie, P. Fousek, and P.L. Dognin, "Factorial Hidden Restricted Boltzmann Machines for noise robust speech recognition", in Proc. ICASSP, 2012.

[5] A.L. Maas, T.M. O’Neil, A.Y. Hannun, and A.Y. Ng, "Recurrent neural network feature enhancement: The 2nd CHiME Challenge", in Proc. CHiME, 2013.

[6] Y. Wang and D. Wang, "Towards scaling up classification-based speech separation", IEEE Transactions on Audio, Speech and Language Processing, 2013.

[7] A. Ozerov, E. Vincent, and F. Bimbot, "A general flexible framework for the handling of prior information in audio source separation", IEEE Transactions on Audio, Speech and Language Processing, 2012.

[8] J. Barker, E. Vincent, N. Ma, H. Christensen, and P. Green, "The PASCAL CHiME Speech Separation and Recognition Challenge", Computer Speech and Language, 2013.