Subject: [AUDITORY] Release of FSDnoisy18k: a dataset to investigate label noise in sound event classification
From: Eduardo Fonseca <eduardo.fonseca@xxxxxxxx>
Date: Thu, 10 Jan 2019 20:29:39 +0100
List-Archive: <http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

*** apologies for cross-posting ***

Dear list,

We're pleased to announce the release of FSDnoisy18k <http://www.eduardofonseca.net/FSDnoisy18k/>, an open dataset to foster the investigation of label noise in sound event classification. It contains 42.5 hours of audio across 20 sound classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data.

The dataset is released as part of our publication:

Learning Sound Event Classifiers from Web Audio with Noisy Labels <https://arxiv.org/abs/1901.01189>
E. Fonseca, M. Plakal, D. P. W. Ellis, F. Font, X. Favory, and X. Serra.
arXiv preprint arXiv:1901.01189, 2019

where we present the dataset and a CNN baseline system. We show that training with large amounts of noisy data can outperform training with smaller amounts of carefully-labeled data. We also show that noise-robust loss functions can be effective in improving performance in the presence of corrupted labels.

FSDnoisy18k dataset: http://www.eduardofonseca.net/FSDnoisy18k/
Source code is available: https://github.com/edufonseca/icassp19

We hope you find these resources useful!

Thanks!

Eduardo

--
Eduardo Fonseca
Music Technology Group
Universitat Pompeu Fabra