Subject: [AUDITORY] Audio captioning dataset
From: Kostas Drosos <000000e20a593251-dmarc-request@xxxxxxxx>
Date: Sun, 17 Nov 2019 13:44:57 +0100
List-Archive: <http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Dear list,

=== Apologies for cross-posting ===

We are happy to announce the release of Clotho, a novel and freely available dataset for audio captioning. It consists of 4,981 audio samples (focusing on general sounds), each 15 to 30 seconds long, and 24,905 captions, each eight to 20 words long.

Clotho is built with a focus on audio content and caption diversity. All sounds come from the Freesound platform, and the captions were crowdsourced on Amazon Mechanical Turk from annotators in English-speaking countries. Unique words, named entities, and speech transcriptions were removed in post-processing.

You can find Clotho online at Zenodo: https://zenodo.org/record/3490684

The paper that presents Clotho is on arXiv: https://arxiv.org/abs/1910.09387

We have also released code for handling the dataset: https://github.com/dr-costas/clotho-baseline-dataset

--- For those not familiar with audio captioning ---

Audio captioning is the novel task of describing general audio content using free text. It is an intermodal translation task (not speech-to-text): a system accepts an audio signal as input and outputs a textual description (i.e., the caption) of that signal.

Enjoy!
Konstantinos Drossos,
Postdoc researcher, Audio Research Group
Tampere University, Finland
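Since 24,905 / 4,981 = 5, each clip comes with five captions. The Python snippet below is a minimal sketch of a sanity check one might run on a caption file after downloading: it counts clips and captions and flags any caption outside the eight-to-20-word range. The CSV layout (a `file_name` column followed by `caption_1` to `caption_5`) is an assumption here, not something stated in this announcement; please check the actual headers of the files in the Zenodo record, or use the handling code in the GitHub repository.

```python
import csv
import io

# ASSUMPTION: one row per audio clip, with a "file_name" column followed by
# five caption columns named caption_1 ... caption_5. Check the real headers
# in the downloaded CSV files before relying on this.
CAPTION_COLUMNS = [f"caption_{i}" for i in range(1, 6)]
MIN_WORDS, MAX_WORDS = 8, 20  # caption length range stated in the announcement


def check_captions(csv_text: str) -> dict:
    """Count clips and captions, and list captions outside MIN_WORDS..MAX_WORDS."""
    reader = csv.DictReader(io.StringIO(csv_text))
    n_clips = 0
    n_captions = 0
    out_of_range = []
    for row in reader:
        n_clips += 1
        for col in CAPTION_COLUMNS:
            n_captions += 1
            n_words = len(row[col].split())  # simple whitespace word count
            if not MIN_WORDS <= n_words <= MAX_WORDS:
                out_of_range.append((row["file_name"], col, n_words))
    return {"clips": n_clips, "captions": n_captions, "out_of_range": out_of_range}


if __name__ == "__main__":
    # Toy data made up for illustration, not real Clotho captions.
    header = "file_name," + ",".join(CAPTION_COLUMNS)
    good = "a heavy rain falls steadily on a metal roof outside"  # 10 words
    short = "a dog barks"  # 3 words, below the minimum
    rows = [
        "rain.wav," + ",".join([good] * 5),
        "dog.wav," + ",".join([good] * 4 + [short]),
    ]
    report = check_captions("\n".join([header] + rows))
    print(report["clips"], report["captions"], len(report["out_of_range"]))  # 2 10 1

    # Five captions per clip: 4,981 clips * 5 = 24,905 captions.
    print(4981 * len(CAPTION_COLUMNS))  # 24905
```

Run as a script, it prints `2 10 1` for the two toy rows (the last caption of the second row is only three words long) and `24905` for the expected caption total. Word counting by whitespace is a simplification; punctuation handling may differ from whatever tokenisation was used when the dataset was built.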