ANNOUNCEMENT: ShATR - A Corpus for Auditory Scene Analysis (Brian Karlsen )


Subject: ANNOUNCEMENT: ShATR - A Corpus for Auditory Scene Analysis
From:    Brian Karlsen  <B.Karlsen(at)DCS.SHEF.AC.UK>
Date:    Mon, 11 Dec 1995 17:27:11 GMT

ShATR - A Corpus for Auditory Scene Analysis
--------------------------------------------

Welcome to the first full release of the ShATR CDROM!

WHAT IS ShATR?
--------------

ShATR is, to our knowledge, the first multi-simultaneous-speaker corpus,
primarily intended to provide acoustic material for studies in auditory
scene analysis.

WHAT'S ON THE CDROM?
--------------------

The CDROM contains around 500 MB of compressed sound files, together
with transcriptions down to the phone level, and some utilities to allow
access to the data.

- ACOUSTIC MATERIAL

  The recordings are organised into two parts. First, there are the
  enrollment recordings - material collected *individually* from each of
  the 5 participants. Second, there are the recordings made during the
  crossword solving session itself.

  We expect the enrollment material to be useful for training/adapting
  speech/speaker recognition systems (although there is insufficient
  material to train a full-blown speech recognition system from
  scratch).

  The crossword task provides material for auditory scene analysis
  systems. During this 37 minute session, participants worked in pairs
  to solve two different crosswords. A fifth participant, who had the
  crossword solutions, acted as a hint-giver. Eight microphones were
  used to collect acoustic material in this task: 5 individual
  microphones, an omnidirectional sensor (pressure zone microphone) and
  a binaurally-wired manikin. All recordings were made directly on to
  digital audio tape at a sampling rate of 48 kHz.

- TRANSCRIPTIONS

  Complete transcriptions of the crossword session have been made. They
  are organised into four levels:

  * structural (what was going on in the recording)
  * orthographic (both sentences and word-endpointed)
  * phone level
  * non-speech events and processes (e.g. coughs, chairs scraping etc.)

  A limited amount of transcription of the enrollment material has been
  done.

- SOFTWARE

  Two utilities are included that allow selective access to the ShATR
  corpus. The first is a search engine which provides a convenient
  interface to the transcriptions. The second is a sound extraction
  facility which allows specified parts of the corpus to be extracted
  from the CD.

  findsegs

    A multiple source environment is a busy place! Users will wish to
    find portions of the crossword task with interesting overlapping
    speech events. For example, users may wish to locate all
    simultaneous vowel segments. The 'findsegs' utility is designed to
    simplify such complex multi-channel searches. The source code (in C)
    is provided on a public-domain basis.

  sndextrc

    All sound files on ShATR are losslessly compressed using 'shorten' -
    Tony Robinson's (Cambridge University Engineering Department)
    public-domain compression program. To speed up decompression, the
    individual sensor recordings are split into chunks of 1 minute
    duration. The purpose of the 'sndextrc' utility is to make it easy
    to extract specified segments (for example, the results of a run of
    findsegs) from such compressed sound files. It is recommended that
    *all* access to the ShATR data is made via sndextrc - any other
    mechanism would have to take account of the compression and
    splitting of session files and would end up looking like sndextrc.
    The sndextrc facility can perform multiple extractions and is pretty
    fast. The source code (in C) is provided on a public-domain basis.

HOW DO I ORDER?
---------------

The price of this CDROM will be GBP 30. This will just cover the costs
of materials used in producing the CDROM. This is a very good offer if
you compare the price to what you have to pay for other large speech
corpora.

You can order the CDROM from:

    Speech and Hearing Research Group
    Department of Computer Science
    University of Sheffield
    Regent Court
    211 Portobello Street
    Sheffield S1 4DP
    U.K.

We accept foreign bank drafts in GBP as well as cheques drawn on a UK
bank branch. The bank draft or cheque should be made payable to the
University of Sheffield, code no. 560009 account no. 00021326 (National
Westminster Bank, Sheffield City Branch, 42 High Street, Sheffield
S1 1QG, U.K., University of Sheffield account no. 1).

Delivery time will in most cases be less than two weeks, depending on
stock and shipment.

WHY ShATR?
----------

ShATR is an acronym for Sh(effield University)/A(dvanced)
T(elecommunications) R(esearch Institute International). The reason for
this is that the corpus was planned and transcribed by us here at
Sheffield, but it was recorded at ATR in Kyoto, Japan with the use of
some of their equipment and the help of several people at ATR.

FURTHER INFORMATION
-------------------

For more information about ShATR have a look at ShATR's homepage:

    http://www.dcs.shef.ac.uk/groups/research/spandh/ShATR/ShATR.html

On this page you can also find links to the papers about ShATR, which
describe the corpus in more detail.

On behalf of the Speech and Hearing Research Group, I apologise if you
receive this message more than once.

Thank you for your time.

Brian Karlsen

------------------------------------------------------------------------------
Brian Lykkegaard Karlsen             | Speech and Hearing Research Group
Research Associate                   | Department of Computer Science
                                     | University of Sheffield
E-mail: B.Karlsen(at)dcs.shef.ac.uk  | Regent Court
http://www.dcs.shef.ac.uk/~brian     | 211 Portobello Street
Phone: (+44) (0)114 282 5562         | Sheffield S1 4DP
Fax:   (+44) (0)114 278 0972         | U.K.
------------------------------------------------------------------------------


This message came from the mail archive
http://www.auditory.org/postings/1995/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University