Abstract:
Large-vocabulary isolated word recognition requires an amount of training data proportional to the vocabulary size in order to characterize each individual word model. A subword-unit-based approach is a more viable alternative to the word-based approach for reducing the training data requirement, because different words can share common segments in their representations. This paper deals with two isolated word recognition systems that both employ the subword-unit-based approach but use completely different methods of segmentation. In the first system, a hidden Markov model decomposes a word into subword units (segments), and the frequency spectra of those subword units are fed to a recurrent neural network to yield a subword code sequence for the word. This sequence is then recognized as a word, ideally the original one, by a set of hidden Markov models for isolated words. In the second system, subword boundaries within a word are detected by finding peaks of the delta cepstrum of the word, and the resulting sequence of subwords is decoded into the original word by means of concatenated hidden Markov models of isolated words. The two systems attain average recognition accuracies ranging from 92% to 96%.
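The boundary detection used in the second system can be illustrated with a minimal sketch. The abstract does not specify how the delta cepstrum is computed or how peaks are selected, so everything below is an assumption made for illustration: the linear-regression delta over a window of plus or minus two frames, the Euclidean norm as the per-frame change measure, the fixed threshold, and the function names `delta_magnitude` and `find_boundaries`.

```python
import math


def delta_magnitude(cepstra, half_window=2):
    """Norm of a regression-based delta cepstrum for each frame.

    cepstra is a list of frames, each a list of cepstral coefficients.
    The window size and the regression formula are assumptions.
    """
    n = len(cepstra)
    denom = sum(k * k for k in range(-half_window, half_window + 1))
    mags = []
    for t in range(n):
        dim = len(cepstra[t])
        delta = [0.0] * dim
        for k in range(-half_window, half_window + 1):
            # Clamp indices at the edges so every frame gets a delta value.
            idx = min(max(t + k, 0), n - 1)
            for d in range(dim):
                delta[d] += k * cepstra[idx][d]
        mags.append(math.sqrt(sum((x / denom) ** 2 for x in delta)))
    return mags


def find_boundaries(mags, threshold):
    """Frame indices where the delta magnitude is a local peak above threshold."""
    return [
        t
        for t in range(1, len(mags) - 1)
        if mags[t] > threshold
        and mags[t] > mags[t - 1]
        and mags[t] >= mags[t + 1]
    ]


# Toy input: two steady segments with an abrupt spectral change after frame 4.
toy_cepstra = [[0.0, 0.0]] * 5 + [[1.0, 1.0]] * 5
toy_boundaries = find_boundaries(delta_magnitude(toy_cepstra), threshold=0.1)
print(toy_boundaries)  # [4]
```

On the toy input the delta magnitude is zero inside each steady segment and peaks at the transition, so a single boundary is reported. With real speech, the threshold and window would need tuning, and a minimum spacing between boundaries would likely be required.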