[AUDITORY] Releasing FSD50K: an open dataset of human-labeled sound events with over 100h of audio

From: Eduardo Fonseca <eduardo.fonseca@xxxxxxx>

Date: Fri, 2 Oct 2020 20:06:34 +0200

Reply-to: Eduardo Fonseca <eduardo.fonseca@xxxxxxx>

Sender: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>

=== Apologies for cross-posting ===

Dear list,

We’re glad to announce the release of FSD50K, a new open dataset of human-labeled sound events. FSD50K contains over 51k Freesound audio clips, totalling over 100h of audio, manually labeled using 200 classes drawn from the AudioSet Ontology. To our knowledge, this is the largest fully open dataset of human-labeled sound events, and the second largest overall after AudioSet.

FSD50K's most important characteristics:

- FSD50K contains 51,197 audio clips from Freesound, totalling 108.3 hours of multi-labeled audio
- The dataset encompasses 200 sound classes organized hierarchically according to a subset of the AudioSet Ontology, allowing development and evaluation of large-vocabulary machine listening methods
- The audio content is composed mainly of sound events produced by physical sound sources, including human sounds, sounds of things, animals, natural sounds, musical instruments and more
- The acoustic material has been manually labeled using the Freesound Annotator platform
- Clips are of variable length (0.3 to 30s), and ground truth labels are provided at the clip level (i.e., weak labels)
- All clips are provided as uncompressed PCM 16-bit 44.1 kHz mono audio files
- The dataset is split into a development set (41k clips / 80h, in turn split into train and validation) and an evaluation set (10k clips / 28h)
- Beyond the audio clips and ground truth, further metadata is made available (including raw annotations, sound predominance ratings, Freesound metadata, and more), allowing a variety of sound event research tasks
- All these resources are licensed under Creative Commons licenses, which allow sharing and reuse
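
As a quick sanity check after downloading, the audio format listed above can be verified per clip. The following is only a minimal sketch using Python's standard `wave` module. It assumes the clips are stored as WAV files (the announcement only says "uncompressed PCM"), and the function name `describe_clip` is hypothetical, not part of any FSD50K tooling.

```python
import wave

# Format stated in the announcement: 16-bit PCM, 44.1 kHz, mono.
EXPECTED_SAMPLE_WIDTH_BYTES = 2   # 16 bits per sample
EXPECTED_SAMPLE_RATE_HZ = 44100
EXPECTED_CHANNELS = 1


def describe_clip(path):
    """Return (duration_seconds, format_ok) for one WAV file.

    Assumption: the clip is a WAV container readable by the stdlib
    `wave` module; this helper is illustrative only.
    """
    with wave.open(str(path), "rb") as wav:
        # Duration is the frame count divided by the sample rate.
        duration = wav.getnframes() / wav.getframerate()
        format_ok = (
            wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getframerate() == EXPECTED_SAMPLE_RATE_HZ
            and wav.getnchannels() == EXPECTED_CHANNELS
        )
    return duration, format_ok
```

Calling `describe_clip` on each downloaded file and comparing the returned durations against the announced 0.3 to 30s range is one way to spot truncated or mis-converted files.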

FSD50K dataset: http://doi.org/10.5281/zenodo.4060432

Paper documenting dataset creation, characterization and experiments: Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra. "FSD50K: an Open Dataset of Human-Labeled Sound Events", arXiv:2010.00475, 2020

Companion site (where you can explore the audio content of the dataset): https://annotator.freesound.org/fsd/release/FSD50K/

Code for baseline experiments (to be released soon): https://github.com/edufonseca/FSD50K_baseline

We will also publish a blog post soon. To stay up to date on FSD50K, subscribe to the freesound-annotator Google Group.

We hope these resources are useful to the community!

FSD50K was created at the Music Technology Group of Universitat Pompeu Fabra, Barcelona. This effort was kindly sponsored by two Google Faculty Research Awards (2017 and 2018).

Cheers,

Eduardo on behalf of the Freesound Datasets team

Eduardo Fonseca
Music Technology Group
Universitat Pompeu Fabra