4aSC18. Design and data collection for a spoken dialog database in the Real World Computing (RWC) program.

Session: Thursday Morning, December 5

Author: Kazuyo Tanaka
Location: Electrotechnical Lab., 1-1-4 Umezono, Tsukubashi, Ibaraki, 305 Japan
Author: Satoru Hayamizu
Location: Electrotechnical Lab., 1-1-4 Umezono, Tsukubashi, Ibaraki, 305 Japan
Author: Yoichi Yamashita
Location: Osaka Univ.
Author: Kiyohiro Shikano
Location: AIST-Nara
Author: Shuichi Itahashi
Location: Univ. of Tsukuba
Author: Ryuichi Oka
Location: RWCP

Abstract:

The RWC program is constructing substantial databases for advancing and evaluating research and development conducted under the program and in related domains. This presentation describes the motivation for this effort, the basic design of the spoken dialog databases, and the current status of the data collection work. In the first stage, some fundamental data collection was carried out to determine several environmental conditions and data-filing specifications. Two topics were selected for the dialogs: one was dialogs between car dealers and customers, and the other was dialogs between travel agents and customers. Professional dealers and agents were employed to make the conversations realistic. To date, 60 dialogs have been recorded, and 48 of them have been filed onto CD-ROMs, which contain about 10 h of speech waveforms with transcriptions and labeling-related information. The speech data are almost completely spontaneous but are of good quality in the acoustic-phonetic sense. The CD-ROMs are ready for distribution as a database to be used in research. [Work supported by the RWC program, MITI, Japan.]


ASA 132nd meeting - Hawaii, December 1996