*** apologies for cross-posting ***
Dear list,
We're pleased to announce the release of FSDnoisy18k, an open dataset to foster the investigation of label noise in sound event classification. It contains 42.5 hours of audio across 20 sound classes, including a small amount of manually labeled data and a larger quantity of real-world noisy data.
The dataset is released as part of our publication:
Learning Sound Event Classifiers from Web Audio with Noisy Labels
E. Fonseca, M. Plakal, D. P. W. Ellis, F. Font, X. Favory, and X. Serra.
arXiv preprint arXiv:1901.01189, 2019
where we present the dataset and a CNN baseline system. We show that training with large amounts of noisy data can outperform training with smaller amounts of carefully labeled data. We also show that noise-robust loss functions can be effective in improving performance in the presence of corrupted labels.
FSDnoisy18k dataset:
http://www.eduardofonseca.net/FSDnoisy18k/
Source code is available:
https://github.com/edufonseca/icassp19
We hope you find these resources useful!
Thanks!
Eduardo