4pSC24. Multisensor-based speech processing for robust speech recognition.

Session: Thursday Afternoon, December 5

Time: 4:30


Author: Y. Zhao
Location: Beckman Inst. and Dept. of ECE, Univ. of Illinois, 405 N. Mathews Ave., Urbana, IL 61801
Author: K. Yen
Location: Beckman Inst. and Dept. of ECE, Univ. of Illinois, 405 N. Mathews Ave., Urbana, IL 61801
Author: X. Zhuang
Location: Univ. of Missouri, Columbia, MO 65211

Abstract:

A multichannel signal-processing technique is integrated with automatic speech recognition to deal with time-varying interference signals. Two microphones are employed for signal acquisition, each picking up a convolutive mixture of two source signals. Decorrelation-based adaptive filtering [Weinstein et al., IEEE Trans. SAP, 405--413 (1993)] is first performed to restore the source signals; cross-spectral analysis is then performed on the restored signals to determine the presence regions of each source signal [K. Yen and Y. Zhao, Proc. ICSLP (1996)]. The restored speech signals within the presence regions are input to an HMM-based speaker-independent continuous speech recognition system [Y. Zhao, IEEE Trans. SAP, 345--361 (1993)]. Five test sets were constructed from the TIMIT database under SNR conditions of 20, 10, 0, -10, and -20 dB, each consisting of 78 sentence pairs. The acoustic coupling channels were simulated by FIR filters. The recognition vocabulary contained 853 words, and the task perplexity was 105. The multichannel integrated recognition system significantly improved recognition performance; at SNRs above -10 dB, word accuracies were close to the 91% word accuracy of the interference-free condition. [Work supported by NSF IRI-95-02074.]
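
As a rough illustration of the separation stage, the sketch below implements a simplified two-channel decorrelation network in the spirit of Weinstein et al. (1993): strictly causal cross-coupled FIR filters are adapted so that the two outputs become mutually uncorrelated. The function name, tap count, and step size are illustrative assumptions, not details from the paper.

    import numpy as np

    def decorrelation_separate(x1, x2, num_taps=32, mu=1e-4):
        """Simplified two-channel decorrelation network (a sketch, not the
        authors' implementation).  Each output subtracts a filtered version
        of the other output:
            v1[t] = x1[t] - sum_k a[k] * v2[t-1-k]
            v2[t] = x2[t] - sum_k b[k] * v1[t-1-k]
        The cross filters a, b are adapted stochastically so that the
        output cross-correlation E[v1[t] v2[t-k]] is driven toward zero.
        """
        n = len(x1)
        a = np.zeros(num_taps)      # cancels source-2 leakage in channel 1
        b = np.zeros(num_taps)      # cancels source-1 leakage in channel 2
        v1, v2 = np.zeros(n), np.zeros(n)
        for t in range(n):
            # most recent past outputs, newest first, zero-padded at the start
            p2 = np.pad(v2[max(0, t - num_taps):t][::-1], (0, max(0, num_taps - t)))
            p1 = np.pad(v1[max(0, t - num_taps):t][::-1], (0, max(0, num_taps - t)))
            v1[t] = x1[t] - a @ p2
            v2[t] = x2[t] - b @ p1
            # decorrelation update: shrink correlation between v1[t] and past v2
            a += mu * v1[t] * p2
            b += mu * v2[t] * p1
        return v1, v2

The update rule exploits the fact that, once the cross-talk is cancelled, the two outputs carry independent sources and their cross-correlation vanishes; no reference signal or training data is required.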
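The test conditions described above (FIR coupling filters; SNRs from 20 dB down to -20 dB) can be reproduced with a small mixing helper. The coupling taps and the convention of measuring SNR at microphone 1 are assumptions for illustration; the abstract does not specify them.

    import numpy as np

    def mix_at_snr(speech, interference, h12, h21, snr_db):
        """Two-microphone convolutive mixture, with the interference scaled
        so the speech-to-interference ratio at microphone 1 equals snr_db."""
        leak = np.convolve(interference, h12)[:len(speech)]
        gain = np.sqrt(np.sum(speech ** 2)
                       / (np.sum(leak ** 2) * 10 ** (snr_db / 10)))
        s2 = gain * interference
        x1 = speech + np.convolve(s2, h12)[:len(speech)]   # mic 1: speech + coupled interference
        x2 = s2 + np.convolve(speech, h21)[:len(speech)]   # mic 2: interference + coupled speech
        return x1, x2

    # demo: separate a 0-dB mixture of two synthetic sources
    rng = np.random.default_rng(0)
    s, d = rng.standard_normal(20000), rng.standard_normal(20000)
    h12 = 0.1 * rng.standard_normal(8)   # illustrative coupling filters
    h21 = 0.1 * rng.standard_normal(8)
    x1, x2 = mix_at_snr(s, d, h12, h21, snr_db=0.0)
    v1, v2 = decorrelation_separate(x1, x2)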


ASA 132nd meeting - Hawaii, December 1996