
ANNOUNCEMENT: ShATR - A Corpus for Auditory Scene Analysis

         ShATR - A Corpus for Auditory Scene Analysis

Welcome to the first full release of the ShATR CDROM!


ShATR is, to our knowledge, the first multi-simultaneous-speaker corpus,
primarily intended to provide acoustic material for studies in auditory scene
analysis.


The CDROM contains around 500 Mb of compressed sound files, together with
transcriptions down to the phone level, and some utilities to allow access to
the data.


The recordings are organised into two parts. First, there are the enrollment
recordings - material collected *individually* from each of the 5
participants. Second, there are the recordings made during the crossword
solving session itself. We expect the enrollment material to be useful for
training/adapting speech/speaker recognition systems (although there is
insufficient material to train a full-blown speech recognition system from
scratch).

The crossword task provides material for auditory scene analysis systems.
During this 37 minute session, participants worked in pairs to solve two
different crosswords. A fifth participant, who had the crossword solutions,
acted as a hint-giver. Eight microphones were used to collect acoustic material
in this task: 5 individual microphones, an omnidirectional sensor (pressure
zone microphone) and a binaurally-wired manikin. All recordings were made
directly on to digital audio tape at a sampling rate of 48 kHz.


Complete transcriptions of the crossword session have been made. They are
organised into four levels:

* structural (what was going on in the recording)
* orthographic (both sentences and word-endpointed)
* phone level
* non-speech events and processes (e.g. coughs, chairs scraping etc.)

A limited amount of transcription of the enrollment material has been done.


Two utilities are included that allow selective access to the ShATR corpus.
The first is a search engine which represents a convenient interface to the
transcriptions. The second is a sound extraction facility which allows
specified parts of the corpus to be extracted from the CD.


A multiple source environment is a busy place! Users will wish to find portions
of the crossword task with interesting overlapping speech events. For example,
users may wish to locate all simultaneous vowel segments. The 'findsegs'
utility is designed to simplify such complex multi-channel searches. The
source code (in C) is provided on a public-domain basis.
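
The kind of multi-channel search described above can be illustrated with a
small sketch in C. This is not the findsegs source: the Segment type and the
find_overlaps function are hypothetical names introduced here, and the sketch
assumes that the segments of interest (say, vowels) have already been pulled
from the transcriptions of two channels as time-sorted start/end pairs in
seconds. It then reports every stretch where the two channels overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one labelled interval on one channel.
   Times are in seconds from the start of the session (an assumption). */
typedef struct {
    double start;
    double end;
} Segment;

/* Two-pointer sweep over two time-sorted lists of segments, where the
   segments within each list do not overlap one another.
   Writes at most max_out overlapping intervals to out and returns
   the number written. */
size_t find_overlaps(const Segment *a, size_t na,
                     const Segment *b, size_t nb,
                     Segment *out, size_t max_out)
{
    size_t i = 0, j = 0, n = 0;

    assert(out != NULL || max_out == 0);

    while (i < na && j < nb && n < max_out) {
        /* The candidate overlap is [latest start, earliest end). */
        double lo = a[i].start > b[j].start ? a[i].start : b[j].start;
        double hi = a[i].end < b[j].end ? a[i].end : b[j].end;

        if (lo < hi) {
            out[n].start = lo;
            out[n].end = hi;
            n++;
        }

        /* Advance whichever segment finishes first. */
        if (a[i].end < b[j].end)
            i++;
        else
            j++;
    }
    return n;
}
```

Searching more than two channels could be done by feeding the output of one
call back in as an input list together with the next channel's segments.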


All sound files on ShATR are losslessly compressed using 'shorten' - Tony
Robinson's (Cambridge University Engineering Department) public-domain
compression program. To speed up decompression, the individual sensor
recordings are split into chunks of 1 minute duration. The purpose of the
'sndextrc' utility is to make it easy to extract specified segments (for
example, the results of a run of findsegs) from such compressed sound files.
It is recommended that *all* access to the ShATR data is made via sndextrc -
any other mechanism would have to take account of the compression and splitting
of session files and would end up looking like sndextrc. The sndextrc facility
can perform multiple extractions and is pretty fast. The source code (in C) is
provided on a public-domain basis.
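
To show why splitting the session into 1-minute chunks complicates direct
access, here is a minimal sketch in C of the bookkeeping any extraction tool
has to do. It is not the sndextrc source. The names ChunkSpan and
plan_extraction are hypothetical, chunks are assumed to be numbered from zero,
and the files on the CD are assumed to keep the 48 kHz recording rate. Given a
range of sample positions within a session, the sketch works out which chunks
must be decompressed and which samples to take from each.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions: 48 kHz sampling and 60-second chunks. */
#define SAMPLE_RATE_HZ 48000L
#define CHUNK_SECONDS  60L
#define CHUNK_SAMPLES  (SAMPLE_RATE_HZ * CHUNK_SECONDS)

/* Hypothetical description of the part of one chunk that is needed. */
typedef struct {
    long chunk;   /* zero-based index of the 1-minute chunk       */
    long offset;  /* first sample to read within that chunk       */
    long count;   /* number of samples to read from that chunk    */
} ChunkSpan;

/* Splits the session-wide sample range [first, last_excl) into
   per-chunk reads. Writes at most max_out spans to out and returns
   the number written. */
size_t plan_extraction(long first, long last_excl,
                       ChunkSpan *out, size_t max_out)
{
    size_t n = 0;
    long pos = first;

    assert(first >= 0 && last_excl >= first);

    while (pos < last_excl && n < max_out) {
        long offset = pos % CHUNK_SAMPLES;
        long avail = CHUNK_SAMPLES - offset;   /* samples left in chunk */
        long want = last_excl - pos;           /* samples still needed  */
        long count = want < avail ? want : avail;

        out[n].chunk = pos / CHUNK_SAMPLES;
        out[n].offset = offset;
        out[n].count = count;
        n++;

        pos += count;
    }
    return n;
}
```

For example, a segment running from 59.5 s to 61.0 s of the session straddles a
chunk boundary, so the plan contains two reads: the last half second of the
first chunk and the first second of the next.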


The price of this CDROM will be GBP 30. This will just cover the costs of
materials used in producing the CDROM. This is a very good offer if you compare
the price to what you have to pay for other large speech corpora. You can order
the CDROM from:

Speech and Hearing Research Group
Department of Computer Science
University of Sheffield
Regent Court
211 Portobello Street
Sheffield S1 4DP

We accept foreign bank drafts in GBP as well as cheques drawn on a UK bank
branch. The bank draft or cheque should be made payable to the University of
Sheffield, code no. 560009 account no. 00021326 (National Westminster Bank,
Sheffield City Branch, 42 High Street, Sheffield S1 1QG, U.K., University of
Sheffield account no. 1).
Delivery time will in most cases be less than two weeks, depending on stock and


ShATR is an acronym for Sh(effield University)/A(dvanced) T(elecommunications)
R(esearch Institute International). The reason for this is that the corpus was
planned and transcribed by us here at Sheffield, but it was recorded at ATR in
Kyoto, Japan with the use of some of their equipment and the help of several
people at ATR.


For more information about ShATR have a look at ShATR's homepage:


On this page you can also find links to the papers about ShATR, which describe
the corpus in more detail.

On behalf of the Speech and Hearing Research Group, I apologise if you
receive this message more than once.
Thank you for your time.
  Brian Karlsen
Brian Lykkegaard Karlsen           | Speech and Hearing Research Group
Research Associate                 | Department of Computer Science
                                   | University of Sheffield
E-mail: B.Karlsen@dcs.shef.ac.uk   | Regent Court
http://www.dcs.shef.ac.uk/~brian   | 211 Portobello Street
Phone: (+44) (0)114 282 5562       | Sheffield S1 4DP
Fax:   (+44) (0)114 278 0972       | U.K.