
[AUDITORY] Release of FSDnoisy18k: a dataset to investigate label noise in sound event classification



*** apologies for cross-posting ***

Dear list,

We're pleased to announce the release of FSDnoisy18k, an open dataset to foster the investigation of label noise in sound event classification. It contains 42.5 hours of audio across 20 sound classes, including a small amount of manually labeled data and a larger quantity of real-world noisy data.

The dataset is released as part of our publication:

Learning Sound Event Classifiers from Web Audio with Noisy Labels
E. Fonseca, M. Plakal, D. P. W. Ellis, F. Font, X. Favory, and X. Serra.
arXiv preprint arXiv:1901.01189, 2019

in which we present the dataset and a CNN baseline system. We show that training with large amounts of noisy data can outperform training with smaller amounts of carefully labeled data. We also show that noise-robust loss functions can be effective in improving performance in the presence of corrupted labels.
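For readers new to noise-robust losses, here is a minimal illustrative sketch of one well-known example from the literature, the L_q (generalized cross-entropy) loss of Zhang and Sabuncu (2018). It is not necessarily the exact loss or configuration used in the paper; the function names and the default q=0.7 are assumptions for illustration. The idea is that L_q = (1 - p_y^q) / q is bounded above by 1/q, so a clip whose (possibly wrong) label the model strongly disagrees with cannot dominate training the way it can under cross-entropy, which grows without bound as p_y approaches 0.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    # Standard categorical cross-entropy: unbounded as p_y -> 0.
    return -math.log(softmax(logits)[label])


def lq_loss(logits: Sequence[float], label: int, q: float = 0.7) -> float:
    # Hypothetical helper implementing L_q = (1 - p_y**q) / q.
    # q -> 0 recovers cross-entropy; q = 1 gives 1 - p_y (MAE-like).
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    p_y = softmax(logits)[label]
    return (1.0 - p_y ** q) / q


if __name__ == "__main__":
    # A confident prediction for class 0 on a clip labeled as class 1,
    # e.g. because the label is noisy.
    logits = [4.0, 0.0, 0.0]
    print("cross-entropy:", cross_entropy(logits, 1))
    print("L_q (q=0.7):  ", lq_loss(logits, 1, q=0.7))
```

With these logits, cross-entropy is roughly 4.04 while L_q stays below its 1/q bound, so the suspect label contributes a smaller, capped penalty. The choice of q trades off robustness (larger q) against the faster convergence of cross-entropy (q near 0).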

FSDnoisy18k dataset: http://www.eduardofonseca.net/FSDnoisy18k/
Source code: https://github.com/edufonseca/icassp19

We hope you find these resources useful!

Thanks!

Eduardo

--
Eduardo Fonseca
Music Technology Group
Universitat Pompeu Fabra
