Abstract:
A system is presented that retrieves names from a directory given their pronunciation and their spelling, for telephone-quality speech. Most previous approaches to letter recognition for spelled names are knowledge-based because of the high phonetic confusability between letters [C. S. Myers and L. R. Rabiner, J. Acoust. Soc. Am. 71(3), 716--727 (1982)] [Cole et al., Eurospeech'91, 479--482]. To avoid the knowledge-based approach, HMM modeling of speech units (SU) is used, with word models for letters and phone models for name pronunciation. The directory is built automatically from a list of names, using a grapheme-to-phoneme converter for the pronounced name and rules for the spelling. Each entry in the directory consists of several variants of both the pronounced and the spelled name. Given the output of the acoustic recognition, the corresponding entry in the directory is found using dynamic time warping (DTW) or a tree-search algorithm. Both methods allow insertions and deletions, and the cost of a substitution can either be fixed or determined from the confusion matrix of the training corpus. On a test database of 5000 names uttered by 3000 speakers (~ 50 000 SU), 70% of the units are correctly recognized. Preliminary experiments on spelled names with a DTW lexical search showed that substitution costs defined using the confusion matrix significantly improved the results.
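
As an illustration of the lexical search described above, the following is a minimal sketch of a DTW-style alignment between a recognized letter string and the spelled variants of each directory entry, with insertions, deletions, and substitutions. It is not the system's implementation. The function names, the unit insertion and deletion costs, the zero cost for a match, and the mapping from confusion counts to costs (one minus the relative confusion frequency) are all assumptions made for the example; the abstract does not specify them.

```python
from collections import defaultdict


def substitution_costs(confusions):
    """Turn confusion counts into substitution costs in [0, 1].

    confusions maps (spoken, recognized) -> count from a training corpus.
    Assumed mapping: cost = 1 - count / total count for the spoken unit.
    """
    totals = defaultdict(int)
    for (spoken, _), count in confusions.items():
        totals[spoken] += count
    return {pair: 1.0 - count / totals[pair[0]]
            for pair, count in confusions.items()}


def align_cost(recognized, reference, sub_cost=None,
               ins_cost=1.0, del_cost=1.0, default_sub=1.0):
    """Minimum alignment cost between a recognized and a reference sequence."""
    sub_cost = sub_cost or {}
    rows, cols = len(reference) + 1, len(recognized) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):          # reference units missing from the input
        d[i][0] = i * del_cost
    for j in range(1, cols):          # extra units present in the input
        d[0][j] = j * ins_cost
    for i in range(1, rows):
        for j in range(1, cols):
            ref, rec = reference[i - 1], recognized[j - 1]
            # Assumption: an exact match is free; otherwise use the
            # confusion-derived cost, or a fixed cost if the pair is unseen.
            step = 0.0 if ref == rec else sub_cost.get((ref, rec), default_sub)
            d[i][j] = min(d[i - 1][j - 1] + step,   # match or substitution
                          d[i - 1][j] + del_cost,   # deletion
                          d[i][j - 1] + ins_cost)   # insertion
    return d[-1][-1]


def best_entry(recognized, directory, **costs):
    """Return the directory entry whose best variant aligns most cheaply.

    directory maps an entry name -> list of spelled variants.
    """
    return min(directory,
               key=lambda name: min(align_cost(recognized, variant, **costs)
                                    for variant in directory[name]))


# Hypothetical data: "B" is often recognized as "D", "T" only rarely.
confusions = {("B", "B"): 80, ("B", "D"): 20, ("T", "T"): 95, ("T", "D"): 5}
directory = {"TAKER": ["TAKER"], "BAKER": ["BAKER"]}
learned = substitution_costs(confusions)
fixed_choice = best_entry("DAKER", directory)
learned_choice = best_entry("DAKER", directory, sub_cost=learned)
```

With fixed substitution costs the two hypothetical entries tie for the input "DAKER", whereas the confusion-derived costs favor the entry whose first letter is more often confused with "D". This is the kind of disambiguation that substitution costs taken from a confusion matrix are meant to provide.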