5pSC11. Recent progress in the INRS speech recognition system.

Session: Friday Afternoon, June 20

Author: Douglas D. O'Shaughnessy
Location: INRS-Telecommunications, 16 Pl. du Commerce, Verdun, PQ H3E 1H6, Canada
Author: Clark Z. Lee
Location: INRS-Telecommunications, 16 Pl. du Commerce, Verdun, PQ H3E 1H6, Canada
Author: Christophe Savariaux
Location: INRS-Telecommunications, 16 Pl. du Commerce, Verdun, PQ H3E 1H6, Canada
Author: Azarshid Farhat
Location: INRS-Telecommunications, 16 Pl. du Commerce, Verdun, PQ H3E 1H6, Canada
Author: Rachida El Meliani
Location: INRS-Telecommunications, 16 Pl. du Commerce, Verdun, PQ H3E 1H6, Canada
Author: Rivarol Vergin
Location: INRS-Telecommunications, 16 Pl. du Commerce, Verdun, PQ H3E 1H6, Canada
Author: Michel Heon
Location: INRS-Telecommunications, 16 Pl. du Commerce, Verdun, PQ H3E 1H6, Canada

Abstract:

The INRS large-vocabulary continuous-speech recognition system employs a two-pass search. First, inexpensive models prune the search space; then a powerful language model and detailed acoustic-phonetic models scrutinize the data. A new fast match with two-phone lookahead and pruning speeds up the search. In language modeling, excluding low-count statistics reduces memory (50% fewer bigrams and 92% fewer trigrams); with Wall Street Journal texts, excluding bigrams that occur only once and trigrams with counts less than five yields little performance decrease. In acoustic modeling, separate male and female right-context VQ models and a bigram language model are used in the first pass, and right-context continuous models and a trigram language model are used in the second pass. A shared-distribution clustering uses a distortion measure based only on the weights of the Gaussian mixtures in the HMMs. When the system is tested with a 5000-word vocabulary, the word inclusion rate (i.e., the rate at which the correct word is retained in the first pass) is about 99%, and word recognition accuracy is about 92.5%. Keyword spotting with new types of fillers retains accuracy with 1.2 false alarms/hour/keyword. [Work supported by NSERC-Canada and FCAR-Quebec.]
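
The count-cutoff idea in the language-modeling step can be illustrated with a minimal sketch. It is not the INRS implementation: the function names (count_ngrams, prune_by_count), the toy sentence, and the reading of the thresholds (bigrams kept at count 2 or more, trigrams at count 5 or more) are assumptions made for illustration only.

```python
from collections import Counter

def count_ngrams(tokens, n):
    """Count every n-gram of order n in a token list (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def prune_by_count(counts, min_count):
    """Keep only the n-grams observed at least min_count times."""
    return {gram: c for gram, c in counts.items() if c >= min_count}

# Toy text standing in for a real training corpus (assumption).
tokens = ("the market rose today and the market fell today "
          "and the market rose again").split()

bigrams = count_ngrams(tokens, 2)
trigrams = count_ngrams(tokens, 3)

# Assumed cutoffs: drop single-occurrence bigrams, drop trigrams seen < 5 times.
kept_bigrams = prune_by_count(bigrams, min_count=2)
kept_trigrams = prune_by_count(trigrams, min_count=5)

print(len(bigrams), "bigrams before,", len(kept_bigrams), "after")
print(len(trigrams), "trigrams before,", len(kept_trigrams), "after")
```

In a real system the probability mass of the discarded n-grams would be recovered through a back-off or smoothing scheme, which this sketch omits.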

ASA 133rd meeting - Penn State, June 1997