Abstract:
Recent progress in fundamental technologies, including speech signal processing, statistical modeling, and language processing, has yielded high performance in speech understanding. However, a speech interface that can communicate naturally with humans has not yet been constructed. To realize such a system, it is necessary to analyze how humans behave when they communicate with each other by speech and to implement the structure of spoken dialogue in the speech conversational system. This paper describes the results obtained by analyzing several different types of spoken dialogue. A non-task-oriented dialogue model is introduced on the basis of dialogue corpora for several tasks and is integrated into the speech conversational system. The dynamic structure of dialogue, such as the function of each utterance, turn-taking, and interruptions, can be modeled by a statistical method. The same method is applicable when a visual information channel exists alongside the speech channel. In future multimodal dialogue systems, visual information will be useful, and the auditory and visual channels should be modeled simultaneously.