[AUDITORY] Announcing SOUNDATA: A Python library for reproducible use of audio datasets

Subject: [AUDITORY] Announcing SOUNDATA: A Python library for reproducible use of audio datasets

From: Justin Salamon <000000b4a42fd03d-dmarc-request@xxxxxxxxxxxxxxx>

Date: Wed, 3 Nov 2021 00:26:38 +0000

Comments: cc: Magdalena Fuentes <mgfuenteslujambio@xxxxxxxxx>

Reply-to: Justin Salamon <salamon@xxxxxxxxx>

Sender: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>

*** apologies for any cross-postings ***

Dear colleagues,

We’re excited to announce the release of soundata, a Python library for reproducible use of audio datasets.

Soundata can be installed via: pip install soundata

The source code lives here: https://github.com/soundata/soundata

We’re launching with 14 popular environmental sound datasets, and we plan to keep adding datasets from a range of audio domains, including speech and bioacoustics. For music datasets, see mirdata, which was the inspiration for soundata.

Soundata makes it easy to:

- Download datasets to a common location and format
- Validate that a downloaded dataset is complete and perfectly matches a canonical version
- Load audio and annotation files into a common format
- Parse clip-level metadata for detailed evaluations
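
As a rough illustration of the workflow listed above, here is a minimal sketch. It assumes soundata is installed and that the dataset identifier "urbansound8k" is one of the supported loaders; the method names (initialize, download, validate, choice_clip) are written from memory of the project README, so please check the docs page for the current API. The helper name quickstart is hypothetical and not part of soundata.

```python
def quickstart(dataset_name="urbansound8k", data_home=None):
    """Download, validate, and sample one clip from a soundata dataset.

    Hypothetical helper for illustration only; it is not part of soundata.
    """
    # Imported lazily so that defining this function does not require
    # soundata to be installed (pip install soundata).
    import soundata

    # Create a dataset object; data_home=None uses the library default folder.
    dataset = soundata.initialize(dataset_name, data_home=data_home)

    # Fetch the dataset files. This can be a large download.
    dataset.download()

    # Compare the local files against the dataset index. Assumed to return
    # two collections: missing files and files with mismatching checksums.
    missing_files, invalid_checksums = dataset.validate()

    # Pick a random clip; its audio and annotations are loaded on access.
    clip = dataset.choice_clip()

    return dataset, clip, missing_files, invalid_checksums
```

Calling quickstart() downloads the full dataset, so it is best tried on a machine with enough disk space and bandwidth.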

We hope soundata will help the community to:

- Ensure results are reproducible by working against exactly the same data
- Save time by avoiding manual downloads and having to write custom dataset parsers
- Automate large-scale download, training, and evaluation pipelines
- Increase the visibility of new datasets by adding them to soundata
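
To make "working against exactly the same data" concrete, one common approach is to compare each local file's checksum with a canonical index. The sketch below uses only the Python standard library and is not soundata's implementation; the function names, the choice of SHA-256, and the index layout (relative path mapped to hex digest) are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_against_index(data_home, index):
    """Compare local files with a {relative_path: checksum} index.

    Returns (missing, invalid): paths absent on disk, and paths whose
    checksum differs from the canonical one.
    """
    missing, invalid = [], []
    for rel_path, expected in index.items():
        path = Path(data_home) / rel_path
        if not path.is_file():
            missing.append(rel_path)
        elif sha256_of(path) != expected:
            invalid.append(rel_path)
    return missing, invalid


def demo():
    """Build a tiny fake dataset, corrupt one file, and validate it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.wav").write_bytes(b"clip a")
        (root / "b.wav").write_bytes(b"clip b")
        # Canonical index computed from the pristine files.
        index = {name: sha256_of(root / name) for name in ("a.wav", "b.wav")}
        # A file listed in the index that was never downloaded.
        index["c.wav"] = "0" * 64
        # Simulate corruption of an existing file.
        (root / "b.wav").write_bytes(b"corrupted")
        return validate_against_index(root, index)
```

In this toy setup, demo() reports c.wav as missing and b.wav as invalid, while a.wav passes. Two users whose copies both validate cleanly against the same index can be confident they are evaluating on identical files.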

Soundata is a cross-organizational collaboration spanning researchers from MARL@NYU, Adobe Research, MTG@UPF, and GPA@UdelaR.

You can learn more about the library on our docs page: https://soundata.readthedocs.io/

A bit more about the motivation for soundata can be found in our (work in progress) paper:

"Soundata: A Python library for reproducible use of audio datasets"

Magdalena Fuentes, Justin Salamon, Pablo Zinemanas, Martín Rocamora, Genís Plaja, Irán R. Román, Marius Miron, Xavier Serra, Juan Pablo Bello

[arXiv]

We *welcome and encourage* contributions from the community, especially data loaders for datasets not yet included in soundata.

Cheers,

Justin & Magdalena on behalf of the soundata team

Justin Salamon | Adobe Research | www.justinsalamon.com