Abstract:
Automatic speech understanding systems are beginning to attain a level of sophistication where commercial applications are within reach. However, if humans and machines are ever going to communicate in a natural way, it is of vital importance that language modeling go beyond the sentence level. A profound understanding of discourse structure is required, and to this end, knowledge of how prosody interacts with other linguistic phenomena is needed. Not only will better prosodic modeling of discourse lead to better speech recognition and understanding, it will also yield more natural-sounding speech synthesis. This paper reports on a dialogue/prosody project at Telia Research, Sweden. A Wizard-of-Oz simulation of a computerized reservation system was used to collect realistic speech data. Fifty subjects were each given three tasks that entailed reserving flights, trains, car hire, and hotels. To avoid linguistic influence on the subjects' utterances, the tasks were presented as maps and icons. A ToBI-style analysis was applied, adapted to meet language-specific requirements. The dialogues were analyzed with regard to phrase boundaries, tones, disfluencies, syntax (functions/categories), new versus given information, and pitch range. This paper describes our observations concerning the interaction between prosodic, syntactic, and higher-level linguistic phenomena, such as discourse structure.