
[AUDITORY] Audio captioning dataset

To: AUDITORY@xxxxxxxxxxxxxxx
Subject: [AUDITORY] Audio captioning dataset
From: Kostas Drosos <000000e20a593251-dmarc-request@xxxxxxxxxxxxxxx>
Date: Sun, 17 Nov 2019 13:44:57 +0100
Reply-to: Kostas Drosos <kostas.drosos@xxxxxxxxxx>
Sender: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>

Dear list,

=== Apologies for cross-posting ===

We are happy to announce the release of Clotho, a novel and freely available dataset for audio captioning. It consists of 4981 audio samples of general sounds, each 15 to 30 seconds long, and 24 905 captions (five per sample), each eight to 20 words long.
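For readers who want to sanity-check their own copy of the data against these figures, here is a minimal sketch. It is not part of the released code: the `Item` class, the function name, and the whitespace-based word count are all assumptions made for illustration, and only the numeric bounds come from this announcement.

```python
from dataclasses import dataclass
from typing import List

# Bounds quoted in the announcement; all names here are hypothetical.
MIN_SECONDS, MAX_SECONDS = 15.0, 30.0
MIN_WORDS, MAX_WORDS = 8, 20

# The quoted totals work out to exactly five captions per audio sample.
assert 24_905 == 5 * 4981


@dataclass
class Item:
    duration_s: float    # length of the audio sample in seconds
    captions: List[str]  # free-text captions for that sample


def within_stated_bounds(item: Item) -> bool:
    """Return True if the duration and every caption length are in range."""
    if not (MIN_SECONDS <= item.duration_s <= MAX_SECONDS):
        return False
    # Assumption: words are counted by splitting on whitespace.
    return all(MIN_WORDS <= len(c.split()) <= MAX_WORDS for c in item.captions)


example = Item(
    duration_s=22.5,
    captions=["A person is walking slowly on gravel while birds sing nearby"],
)
print(within_stated_bounds(example))
```

The dataset itself may count words differently (for example around punctuation), so treat a failed check as a prompt to look closer rather than as an error in the data.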

Clotho is built with a focus on diversity of audio content and captions. All sounds are from the Freesound platform, and the captions are crowdsourced through Amazon Mechanical Turk from annotators in English-speaking countries. Unique words, named entities, and speech transcription are removed in post-processing.

You can find Clotho online at Zenodo: https://zenodo.org/record/3490684

The paper that presents Clotho is on arXiv: https://arxiv.org/abs/1910.09387

We have also released code for handling the dataset: https://github.com/dr-costas/clotho-baseline-dataset

=== For those not familiar with audio captioning ===
Audio captioning is the novel task of describing general audio content in free text. It is an intermodal translation task (not speech-to-text): a system accepts an audio signal as input and outputs a textual description (i.e. the caption) of that signal.

Enjoy!

Konstantinos Drossos,
Postdoc researcher, Audio Research Group
Tampere University, Finland
