Subject: MULTIPLE SIMULTANEOUS SPEAKER CORPUS - a proposal
From:    Martin Cooke  <M.Cooke(at)DCS.SHEFFIELD.AC.UK>
Date:    Mon, 5 Jul 1993 16:11:47 GMT

The Speech, Hearing and Language Group in Sheffield is about to embark upon the collection of a multiple simultaneous speaker corpus. The purpose of this mailing is to solicit suggestions and expressions of interest in the project.

Possible uses for this corpus include studies in:

 * sound localisation
 * speaker identification
 * source segregation
 * dialogue/discourse analysis
 * intention processing
 * prosodic analysis
 * spoken language understanding
 * audio-visual analyses
 * automatic speech recognition
 * ... <** suggestions **>

Background:

Our requirement is a corpus of 'lightly-annotated' sampled data with which to evaluate our own work in sound source separation (the hearing people) and intention analysis (the NLP people). To this end, we are interested in environments in which several acoustic sources are present at the same time. One such scenario arises from the task of minute-taking.

Proposed scenario:

Five people will engage in a task-oriented discussion along with an artificial secretary (actually, just a mannequin equipped for binaural recording). The task will be chosen not only to make heavily overlapped acoustic material likely, but also to provide a realistic topic of discourse for researchers in NLP. We welcome suggestions for task domains.

Outline requirements <** comments please **>

A. Collection

 1. The data will be collected in acoustically controlled conditions, with reverberation etc. commensurate with an average-sized meeting room.
 2. Eight channels of data will be recorded: one for each of the five speakers, two binaural channels from the mannequin, and one from a centrally placed omnidirectional microphone (see the first sketch after this message).
 3. A video recording of the session will be made.
 4. A small amount of speech data will be collected from each speaker individually.

B. Analysis

 1. Individual microphone signals subjected to semi-automatic endpoint detection (see the second sketch after this message).
 2. Orthographic transcription of each individual signal (and of any extraneous material picked up at the other receptors).
 3. Each 'unit of discourse' may be tagged with speaker intention and other high-level descriptors (see the third sketch after this message).
 4. <** suggestions **>

C. Mastering/Distribution

To make available (at cost) to the speech, hearing and natural language community the following on CD-ROMs: the eight channels of sampled data, the compressed video signal, and whatever transcriptions exist.

Our ideas are rather further developed than this deliberately loose description suggests; however, we feel that the corpus may be of sufficient interest to warrant suggestions at this early stage of design. We have funding to satisfy our own requirements; ideally, however, we'd like to broaden the project so that it finds wider use in the community. If you can offer advice, assistance or analysis, or would like to be involved in this project, we'd love to hear from you!

Rough timetable:

 * requirements analysis (now until end September)
 * data collection (to complete by end '93)
 * analysis (basic stuff to be complete by end Q2 '94)

Martin Cooke
pp Guy Brown, Malcolm Crawford, Phil Green, Paul Mc Kevitt and Yorick Wilks.

email: m.cooke(at)dcs.shef.ac.uk
fax:   +44 742 780972

(circulated to auditory, ear-mail, salt, elsnet, ectl; please forward this to any other relevant lists)
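
A note on A.2: the proposal does not fix a distribution format for the eight channels. Purely for illustration, assuming the sampled data were distributed as raw interleaved 16-bit little-endian PCM (an assumption, not a stated format), separating the channels might look like the following minimal Python sketch:

    import struct

    # Hypothetical channel order: five close mics (0-4), mannequin left
    # and right ears (5-6), central omnidirectional mic (7). Both the
    # ordering and the raw format are assumptions for illustration.
    N_CHANNELS = 8

    def deinterleave(path, n_channels=N_CHANNELS):
        """Split raw interleaved 16-bit PCM into one sample list per channel."""
        channels = [[] for _ in range(n_channels)]
        frame_size = n_channels * 2          # 2 bytes per 16-bit sample
        with open(path, "rb") as f:
            while True:
                frame = f.read(frame_size)
                if len(frame) < frame_size:  # stop at end of file
                    break
                for c, s in enumerate(struct.unpack("<%dh" % n_channels, frame)):
                    channels[c].append(s)
        return channels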
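
On B.1: the endpoint detection method is left open in the proposal. As one crude point of reference, a fixed energy-threshold endpointer over a single close-mic channel could be sketched as below; the frame length, threshold, and hangover values are illustrative guesses, not the group's actual settings, and the "semi-automatic" part would be a human correcting the resulting boundaries:

    def endpoints(samples, rate, frame_ms=20, threshold_db=-35.0, hangover=10):
        """Return (start, end) sample ranges whose short-term energy exceeds
        a threshold set relative to the loudest frame."""
        flen = int(rate * frame_ms / 1000)
        energies = [sum(s * s for s in samples[i:i + flen]) / flen
                    for i in range(0, len(samples) - flen + 1, flen)]
        peak = max(energies) if energies else 0.0
        thresh = peak * 10 ** (threshold_db / 10.0)
        regions, start, quiet = [], None, 0
        for k, e in enumerate(energies):
            if peak > 0 and e >= thresh:
                if start is None:            # a new speech region begins
                    start = k
                quiet = 0
            elif start is not None:
                quiet += 1
                if quiet > hangover:         # tolerate brief pauses in a turn
                    regions.append((start * flen, (k - quiet + 1) * flen))
                    start, quiet = None, 0
        if start is not None:                # region runs to end of signal
            regions.append((start * flen, len(samples)))
        return regions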
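
On B.2/B.3: one possible shape for a lightly annotated 'unit of discourse', with every field name hypothetical and offered only as a starting point for discussion:

    # Hypothetical annotation record; field names are not part of the proposal.
    unit = {
        "speaker":     "S3",         # which of the five participants
        "channel":     2,            # close-mic channel carrying the unit
        "start":       123456,       # sample offset from session start
        "end":         145678,
        "orthography": "shall we move on to the next item",
        "intention":   "suggest",    # high-level descriptor, as in B.3
        "overlaps":    ["S1"],       # other speakers active at the same time
    }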

