Multi-functional neural speech and audio coding
Speech and audio coding is one of the critical technologies in real-time communication. Traditional coding methods, i.e., signal processing (SP)-based ones, mostly rely on physical models of sound perception and production as well as basic digital signal processing principles. Recently, speech synthesis and audio compression methods based on deep learning and artificial intelligence (AI) have been developed. In comparison with SP-based methods, AI-based approaches open up more possibilities for audio compression and are able to achieve better performance with higher compression efficiency. However, AI-based methods (e.g., neural speech and audio coding) still suffer from certain problems, including but not limited to limited robustness and high computational complexity, which have attracted attention from many academic and industrial organizations and researchers.
This session proposal aims to collect new ideas and developments in neural coding techniques, including low-bitrate and low-latency neural coding. We are also looking for new solutions that enable neural codecs to support additional functions such as packet loss concealment, noise reduction, voice conversion, text-to-speech (TTS), audio bandwidth extension, and AIGC-related applications.
Therefore, we propose to organize a special session at INTERSPEECH 2024. Please feel free to contact us if you are interested in contributing to this special session.
Organizers
Wei XIAO (denniswxiao@xxxxxxxxxxx),
Tencent Ethereal Audio Lab
Prof. Jing WANG (wangjing@xxxxxxxxxx),
Beijing Institute of Technology
Prof. Jingdong CHEN, IEEE Fellow (jingdongchen@xxxxxxxx),
Center of Intelligent Acoustics and Immersive Communications,
Northwestern Polytechnical University
Xuan ZHU (xuan.zhu@xxxxxxxxxxx),
Samsung Research China - Beijing