[AUDITORY] Deadline extended: 4th COG-MHEAR Audio-Visual Speech Enhancement Challenge (AVSEC-4)

Dear all (with apologies for any cross-postings),

We are running the fourth edition of the COG-MHEAR International Audio-Visual Speech Enhancement Challenge (AVSEC-4) as a Satellite Workshop of Interspeech 2025 in Rotterdam, on 16th August 2025 (http://challenge.cogmhear.org).

The Audio-Visual Speech Enhancement Challenge (AVSEC) established the first benchmark in the field, providing a common framework for the evaluation of audio-visual speech enhancement and separation systems.

Building upon three successful editions of the Challenge (SLT 2022, ASRU 2023 and Interspeech 2024), AVSEC-4 aims to further advance system performance, create opportunities to reflect on the scope and limitations of current audio-visual speech technologies, and help transform the future of multimodal assistive hearing and speech communication systems.

As in previous editions of the challenge, systems will be ranked based on the results of listening tests with human participants.

In addition to a carefully curated audio-visual dataset, we provide facial landmarks for the train/dev datasets.

A new baseline model for AVSEC-4 has been released along with scripts for objective evaluation. Baseline models of previous AVSEC editions are also available.

This year's evaluation dataset includes an additional 'out-of-domain' corpus involving a small-group, free-flowing conversation with a hearing-aid user in the loop.

To register for the challenge and access the AVSEC-4 dataset please follow the guidelines on the website: https://challenge.cogmhear.org

AVSEC scripts are available here: https://github.com/cogmhear/avse_challenge

Results - including prizes, generously sponsored by Sonova, for both winners and runners-up of the three evaluation tracks (regular, low-latency and out-of-domain) - will be announced at the AVSEC-4 Satellite Workshop during Interspeech 2025.

Important dates:

21st March 2025: Release of training and development data.
2nd April 2025: Release of low-latency baseline system.
6th June 2025: Evaluation data release.
9th June 2025: Leaderboard open for submissions.
12th June 2025: Paper submission opens.
24th June 2025: Additional "out-of-domain" evaluation corpus released.
(Extended) 7th July 2025: Deadline for Challenge submissions and one-page system description submission.
(Extended) 11th July 2025: Workshop paper submission closes.
14th July 2025: Early acceptance notification.
23rd July 2025: Early release of evaluation results.
1st August 2025: Camera-ready paper.

AVSEC-4 Workshop proceedings:

We invite prospective authors to submit, for peer review, either 2-page extended abstracts or 4-6 page full papers, following the Interspeech 2025 paper template.

As a follow-on to the IEEE Journal of Selected Topics in Signal Processing (JSTSP) special issue organised as part of the AVSEC-3 Workshop (which is currently in press), we plan to invite extended AVSEC-4 Workshop papers for submission to a new special issue (details to be confirmed).

We welcome Workshop submissions from participants of AVSEC-4 as well as previous editions (AVSEC-2 and AVSEC-3). Papers are also welcome from researchers not participating in the Challenge but interested in related Workshop topics, including (but not limited to):

Low-latency approaches to audio-visual speech enhancement and separation.
Human auditory-inspired models of multi-modal speech perception and enhancement.
Energy-efficient audio-visual speech enhancement and separation methods.
Machine learning for diverse target listeners and diverse listening scenarios.
Audio quality & intelligibility assessment of audio-visual speech enhancement systems.
Objective metrics to predict quality & intelligibility from audio-visual stimuli.
Understanding human speech perception in competing speaker scenarios in real world and virtual environments.
Clinical applications of audio-visual speech enhancement and separation (e.g. multi-modal hearing assistive technologies for hearing-impaired listeners).
Accessibility and human-centric factors in the design and evaluation of innovative multimodal technologies, including multimodal corpus development, public perceptions, ethics considerations, standards, societal, economic and political impacts.

The call for papers is available here: https://challenge.cogmhear.org/#/getting-started/call-for-papers

Workshop registration:

Workshop registration costs:

Regular/Retiree (ISCA Member and Non-member) registration: €40
Student (ISCA Member and Non-member) registration: €25

Further information about the workshop registration process is available on the Challenge website and also via Interspeech: https://www.interspeech2025.org/registration

We look forward to seeing you in Rotterdam.

AVSEC organising team
