Abstract:
The RWC program is constructing substantial databases for advancing and evaluating the research and development conducted under the program and in related domains. This presentation describes the motivation for this effort, a basic design for spoken dialog databases, and the current status of the data collection work. In the first stage, some fundamental data collection was carried out to determine several environmental conditions and data-filing specifications. Two topics were selected for the dialogs: dialogs between car dealers and customers, and dialogs between travel agents and customers. Professional dealers and agents were employed to make the conversations realistic. To date, 60 dialog samples have been recorded, and 48 of them have been filed on CD-ROMs, which include about 10 h of speech waveforms with transcriptions and labeling-related information. The speech data are almost completely spontaneous but are of good quality in the acoustic-phonetic sense. The CD-ROMs are ready for distribution as a database for use in research. [Work supported by the RWC program, MITI, Japan.]