
[AUDITORY] Call for Participation in the ICME 2025 Audio Encoder Challenge



The IEEE International Conference on Multimedia & Expo (ICME) 2025 Audio Encoder Capability Challenge
Overview
The ICME 2025 Audio Encoder Capability Challenge, hosted by Xiaomi, University of Surrey, and Dataocean AI, aims to rigorously evaluate audio encoders in real-world downstream tasks.
This challenge imposes NO restrictions on model size or the scale of training data, and building on existing pre-trained models is allowed.
Participants are invited to submit pre-trained encoders that convert raw audio waveforms into continuous embeddings. These encoders will undergo comprehensive testing across diverse tasks spanning speech, environmental sounds, and music. The evaluation will emphasize real-world usability and leverage an open-source evaluation system.
Participants are welcome to test and optimize their models independently. However, the final rankings will be determined by evaluations conducted by the organizers.
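
For illustration, a minimal PyTorch sketch of such a waveform-to-embeddings interface is shown below. The class name, embedding dimension, and toy frontend are placeholders of ours, not the challenge's official template; the authoritative interface is defined in the GitHub repository mentioned under Submission.

    # Hypothetical sketch of the expected interface: raw waveform in,
    # continuous embeddings out. All names here are illustrative only.
    import torch
    import torch.nn as nn

    class MyAudioEncoder(nn.Module):
        """Maps batches of raw waveforms to sequences of embeddings."""
        def __init__(self, embed_dim: int = 768):
            super().__init__()
            # Placeholder backbone; a real submission would load a
            # pre-trained model here instead.
            self.frontend = nn.Conv1d(1, embed_dim, kernel_size=400, stride=320)

        def forward(self, waveform: torch.Tensor) -> torch.Tensor:
            # waveform: (batch, samples), mono audio
            x = waveform.unsqueeze(1)   # (batch, 1, samples)
            x = self.frontend(x)        # (batch, embed_dim, frames)
            return x.transpose(1, 2)    # (batch, frames, embed_dim)

    encoder = MyAudioEncoder()
    audio = torch.randn(2, 16000)       # two 1-second clips at 16 kHz
    print(encoder(audio).shape)         # torch.Size([2, 49, 768])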

Registration
To participate, registration is required. Please complete the registration form before April 1, 2025. Note that this does not mean the challenge starts on April 1, 2025; the challenge begins on February 7, 2025.
For any other questions about registration, please send an email to: 2025icme-aecc@xxxxxxxxxxxxxxx
Submission
  1. Clone the audio encoder template from the GitHub repository.
  2. Implement your own audio encoder following the instructions in README.md within the cloned repository. The implementation must pass all checks in audio_encoder_checker.py provided in the repository.
  3. Before the submission deadline, April 30, 2025, email the following files to the organizers at 2025icme-aecc@xxxxxxxxxxxxxxx:
  • a ZIP file containing the complete repository
  • a technical report (PDF format, not exceeding 6 pages) describing your implementation
The pre-trained model weights can either be included in the ZIP file or downloaded automatically from external sources (e.g., Hugging Face) at runtime. If you choose the latter approach, implement the automatic download mechanism within your encoder implementation.
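
If you opt for runtime downloading, a sketch along the following lines, using the huggingface_hub library, may be helpful; the repo_id and filename below are placeholders, not real challenge artifacts.

    # Hedged sketch of runtime weight downloading via huggingface_hub.
    from huggingface_hub import hf_hub_download
    import torch

    def load_pretrained_weights(model: torch.nn.Module) -> torch.nn.Module:
        # Downloads (and caches) the checkpoint on first run, keeping
        # the submitted ZIP file small.
        ckpt_path = hf_hub_download(
            repo_id="your-org/your-audio-encoder",  # placeholder
            filename="encoder_weights.pt",          # placeholder
        )
        model.load_state_dict(torch.load(ckpt_path, map_location="cpu"))
        return model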
While there are no strict limits on model size, submitted models must run successfully in a Google Colab T4 environment, where the runtime provides a 16 GB NVIDIA Tesla T4 GPU and 12 GB of system RAM.
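
As a quick self-check before submitting, assuming a PyTorch model, you can run one forward pass on a Colab T4 and confirm that peak GPU memory stays within the 16 GB budget:

    import torch

    def check_t4_budget(encoder: torch.nn.Module, seconds: int = 10, sr: int = 16000):
        # Measures peak GPU memory for a single forward pass on one clip.
        device = torch.device("cuda")
        encoder = encoder.to(device).eval()
        torch.cuda.reset_peak_memory_stats(device)
        with torch.no_grad():
            encoder(torch.randn(1, seconds * sr, device=device))
        peak_gb = torch.cuda.max_memory_allocated(device) / 1024**3
        print(f"Peak GPU memory: {peak_gb:.2f} GB (T4 limit: 16 GB)")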
More details can be found on the following webpage:

Thanks for your attention. Sorry for cross-posting. 

Best wishes,
 
Wenwu
 
 
--
Wenwu Wang

Professor of Signal Processing and Machine Learning,
Centre for Vision Speech and Signal Processing (CVSSP)

Associate Head of External Engagement, 
School of Computer Science and Electronic Engineering

AI Fellow,
Surrey Institute for People Centred AI

University of Surrey
Guildford, GU2 7XH
United Kingdom
Phone: +44 (0) 1483 686039
Fax: +44 (0) 1483 686031
Email: w.wang@xxxxxxxxxxxx