[AUDITORY] Release of FSDnoisy18k: a dataset to investigate label noise in sound event classification

Subject: [AUDITORY] Release of FSDnoisy18k: a dataset to investigate label noise in sound event classification

From: Eduardo Fonseca <eduardo.fonseca@xxxxxxx>

Date: Thu, 10 Jan 2019 20:29:39 +0100

Reply-to: Eduardo Fonseca <eduardo.fonseca@xxxxxxx>

Sender: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>

*** apologies for cross-posting ***

Dear list,

We're pleased to announce the release of FSDnoisy18k, an open dataset to foster the investigation of label noise in sound event classification. It contains 42.5 hours of audio across 20 sound classes, including a small amount of manually labeled data and a larger quantity of real-world noisy data.

The dataset is released as part of our publication:

Learning Sound Event Classifiers from Web Audio with Noisy Labels
E. Fonseca, M. Plakal, D. P. W. Ellis, F. Font, X. Favory, and X. Serra.
arXiv preprint arXiv:1901.01189, 2019

where we present the dataset and a CNN baseline system. We show that training with large amounts of noisy data can outperform training with smaller amounts of carefully labeled data. We also show that noise-robust loss functions can be effective in improving performance in the presence of corrupted labels.
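
For readers unfamiliar with the idea, here is a minimal sketch of one widely used noise-robust loss, the generalized cross-entropy (often written L_q). It is an illustration only and is not taken from the paper or its repository; the function names and the default q = 0.7 are assumptions made for this example. The point it shows is that, unlike standard cross-entropy, the loss is bounded by 1/q, so a confidently "wrong" prediction on a mislabeled clip cannot dominate training.

```python
import math


def cross_entropy(probs, target):
    """Standard cross-entropy for one example: -log p_target (unbounded)."""
    return -math.log(probs[target])


def lq_loss(probs, target, q=0.7):
    """Generalized cross-entropy (L_q) for one example: (1 - p_target**q) / q.

    probs  : predicted class probabilities (assumed to sum to 1)
    target : index of the (possibly noisy) label
    q      : hypothetical default; must lie in (0, 1].
             q -> 0 approaches cross-entropy, q = 1 gives 1 - p_target.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    return (1.0 - probs[target] ** q) / q


if __name__ == "__main__":
    # The model is confident in class 1, but the (noisy) label says class 0.
    probs = [0.01, 0.99]
    print("cross-entropy:", cross_entropy(probs, 0))
    print("L_q (q=0.7):  ", lq_loss(probs, 0))
```

Smaller values of q behave more like cross-entropy (faster learning, less robustness), while values near 1 trade convergence speed for tolerance to corrupted labels.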

FSDnoisy18k dataset: http://www.eduardofonseca.net/FSDnoisy18k/
Source code: https://github.com/edufonseca/icassp19

We hope you find these resources useful!

Thanks!

Eduardo

Eduardo Fonseca
Music Technology Group
Universitat Pompeu Fabra