Apologies for cross-posting:

PhD position in Multiple sound source tracking with deep learning, M/F
Ref: 2022-10750 | 31 Mar 2022
4 rue du clos Courtel 35510 CESSON SEVIGNE - France

About the role

Your role is to develop neural network-based methods for accurate, causal, and low-latency tracking, capable of simultaneously estimating the trajectories
of multiple speech sources.

Owing to progress in deep learning, speech recognition has gained great momentum in recent years. Nevertheless, the accuracy of speech recognition engines
degrades in adverse acoustic conditions, due to noise, reverberation, and interfering speech sources, which underlines the need for a speech enhancement pre-processing stage. This is usually achieved by the combination of a microphone array and beamforming,
which aims at preserving the sound coming from the target source direction, while attenuating the rest. Therefore, knowing the Direction-of-Arrival (DoA) of the useful speech signal with respect to the microphone array is a prerequisite. In practice,
the DoA of a target source is estimated by a source localization algorithm, which can provide only instantaneous, noisy, and unlabeled DoA estimates. Such observations are inadequate for beamforming, which requires source trajectories, corresponding to the
positions over time of each target source. The goal of the tracking algorithm is to assemble these trajectories from the raw DoA observations.

Tracking multiple speech sources in real-life environments is a notoriously challenging problem, for several reasons. First, due to acoustics,
particularly noise and reflections off the surfaces of the environment (walls, floor, furniture, …). The latter can bias the DoA estimates, and even produce false observations. Second, the observations are usually intermittent, whether due to the intermittent
nature of the source itself (speech, for instance), or because, during certain periods, a source may be masked by a stronger one. Recovering such a trajectory is equivalent to "re-identification" of a speech source. Finally, due to application requirements:
since the speech recognition system often runs in real-time, the tracker needs to be causal, i.e., it should exploit only present and past information.

The goal of this PhD thesis is to devise a multisource tracking system based entirely, or in part, on deep learning methods. The tracker would integrate the
speaker counting and DoA estimates, obtained from highly efficient detection and localization modules, already developed in a recently defended PhD thesis. Furthermore, these features will be augmented by neural speaker embeddings, with the goal of improving
the tracking of intermittent sources. Finally, the complete system could be unified into a single neural network architecture, hence enabling end-to-end training.

About you

Skills and qualities required:
· Research Master's or engineering school degree
· Background in signal processing and/or machine learning applied to audio and acoustics
· Appetite for audio processing
· In-depth knowledge of Python and Bash
· Hands-on experience with deep learning toolkits, such as PyTorch, TensorFlow, Kaldi
· Scientific rigor and creativity

Additional information

The practical outcome of the thesis would be the full processing chain, comprising counting, localization and tracking modules, running in real-time on a standard
PC (or an embedded device). The subject of this thesis, at the boundary of microphone array processing and speech recognition, combined with the rapidly advancing deep learning approach, ensures that the work conducted will be recognized by both
the academic and industrial communities. To achieve this task, the candidate will have access to various equipment to create databases and test their developments: a room equipped with 30 loudspeakers
and ICARE software to simulate moving sources immersed in realistic and complex acoustic environments. Concerning the implementation task, the PhD student may interact with the team in charge of developing prototypes and integrating localization algorithms
in VST plugins for real-time visualization.

Department

Orange Innovation brings together the research and innovation activities and expertise of the Group's entities and countries. We work every day to ensure that
Orange is recognized as an innovative operator by its customers and we create value for the Group and the Brand in each of our projects. With 720 researchers, thousands of marketers, developers, designers and data analysts, it is the expertise of our 6,000
employees that fuels this ambition every day. Orange Innovation anticipates technological breakthroughs and supports the Group's countries and entities in making the best technological choices to meet
the needs of our consumer and business customers.

The CVA (Content Audio Video) team has around 20 members, mostly permanent researchers and PhD students, focused on signal processing and machine learning
technologies for audio and video. The focus of the audio group is on microphone array processing, 3D-audio rendering and multichannel compression in connection with the definition and implementation of international standards in the domain (MPEG, 3GPP).