[AUDITORY] Audio captioning dataset (Kostas Drosos)

Subject: [AUDITORY] Audio captioning dataset
From: Kostas Drosos <000000e20a593251-dmarc-request@xxxxxxxx>
Date: Sun, 17 Nov 2019 13:44:57 +0100
List-Archive: <http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Dear list,

=== Apologies for cross posting ===

We are happy to announce the release of Clotho, a novel and freely available dataset for audio captioning, which consists of 4981 audio samples (focusing on general sounds), each 15 to 30 seconds long, and 24 905 captions, each eight to 20 words long.

Clotho is built with a focus on audio content and caption diversity. All sounds are from the Freesound platform, and captions are crowdsourced using Amazon Mechanical Turk and annotators from English-speaking countries. Unique words, named entities, and speech transcription are removed with post-processing.

You can find Clotho online at Zenodo: https://zenodo.org/record/3490684

The paper that presents Clotho is on arXiv: https://arxiv.org/abs/1910.09387

We have also released code for handling the dataset: https://github.com/dr-costas/clotho-baseline-dataset

-- For those who are not familiar with audio captioning --

Audio captioning is the novel task of general audio content description using free text. It is an intermodal translation task (not speech-to-text), where a system accepts an audio signal as input and outputs the textual description (i.e. the caption) of that signal.

Enjoy!

Konstantinos Drossos,
Postdoc researcher, Audio Research Group
Tampere University, Finland

This message came from the mail archive
src/postings/2019/
maintained by:

DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University