
[AUDITORY] Announcing the first COG-MHEAR Audio-visual Speech Enhancement Challenge (AVSEC) - as part of IEEE SLT 2022



Dear all (please help share this with colleagues),

We are pleased to announce the launch of the first COG-MHEAR Audio-visual Speech Enhancement Challenge (AVSEC): http://challenge.cogmhear.org

Participants will work on a large dataset derived from TED talks to enhance speech in extremely challenging noisy environments and with competing speakers. Performance will be evaluated using human listening tests as well as objective measures. We hope that the Challenge will create a benchmark for AVSEC research that will be useful for years to come. The challenge data and development tools are now available; for details, see the challenge website: https://challenge.cogmhear.org/#/ and our GitHub repository: https://github.com/cogmhear/avse_challenge

AVSEC has been accepted as an official challenge at the IEEE Spoken Language Technology (SLT) Workshop (https://slt2022.org/), to be held in Doha, Qatar, 9-12 January 2023, where a special session will be run.

Important Dates

1st May 2022: Challenge website launch
31st May 2022: Release of the full toolset, training/development data and baseline system
1st June 2022: Registration for challenge entrants opens
25th July 2022: Evaluation data released
1st Sept 2022: Submission deadline for evaluation (by objective and subjective measures)
9th Jan 2023: Results announced at IEEE SLT 2022

Background:

Human performance in everyday noisy situations is known to depend on both aural and visual senses, which are contextually combined by the brain's multi-level integration strategies. The multimodal nature of speech is well established: listeners are known to unconsciously lip-read to improve the intelligibility of speech in noisy environments. It has also been shown that the visual aspect of speech can strongly affect the ability of humans to focus their auditory attention on a particular stimulus.

The aim of the first AVSEC is to bring together the wider computer vision, hearing and speech research communities to explore novel approaches to multimodal speech-in-noise processing. Both raw and pre-processed AV datasets, derived from TED talk videos, will be made available to participants for training and development of audio-visual models to perform speech enhancement and speaker separation at SNR levels significantly more challenging than those typically used in audio-only scenarios. Baseline neural network models and a training recipe will be provided.

In addition to participating at IEEE SLT, Challenge entrants will be invited to contribute to a Journal Special Issue on the topic of Audio-Visual Speech Enhancement, to be announced early next year.

Further information:

If you are interested in participating and wish to receive further information, please sign up here: https://challenge.cogmhear.org/#/getting-started/register

If you have questions, contact us directly at: cogmhear@xxxxxxxxxxxx

Organising Team

Amir Hussain, Edinburgh Napier University, UK (co-Chair)
Peter Bell, University of Edinburgh, UK (co-Chair)
Mandar Gogate, Edinburgh Napier University, UK
Cassia Valentini Botinhao, University of Edinburgh, UK
Kia Dashtipour, Edinburgh Napier University, UK
Lorena Aldana, University of Edinburgh, UK

Evaluation Panel Chair: John Hansen, University of Texas at Dallas, USA
Scientific Committee Chair: Michael Akeroyd, University of Nottingham, UK
Industry co-ordinator: Peter Derleth, Sonova AG

Funded by the UK Engineering and Physical Sciences Research Council (EPSRC) programme grant COG-MHEAR (http://cogmhear.org)

Supported by RNID (formerly Action on Hearing Loss), Deaf Scotland, Sonova AG

--
Professor Amir Hussain
School of Computing, 
Edinburgh Napier University, Scotland, UK
E-mail: A.Hussain@xxxxxxxxxxxx 

