4aSC12. Name retrieval from pronunciation and spelling over the telephone, using HMM modeling and robust directory access.

Session: Thursday Morning, December 4


Author: G. Gravier
Location: ENST, 46 rue Barrault, 75634 Paris Cedex 13, France
Author: F. Yvon
Location: ENST, 46 rue Barrault, 75634 Paris Cedex 13, France
Author: G. Etorre
Location: ENST, 46 rue Barrault, 75634 Paris Cedex 13, France
Author: G. Chollet
Location: ENST, 46 rue Barrault, 75634 Paris Cedex 13, France

Abstract:

A system is presented that retrieves names from a directory given their pronunciation and spelling, for telephone-quality speech. Most previous approaches to letter recognition for spelled names are knowledge-based because of the high phonetic confusability between letters [C. S. Myers and L. R. Rabiner, J. Acoust. Soc. Am. 71(3), 716--727 (1982)] [Cole et al., Eurospeech'91, 479--482]. To avoid the knowledge-based approach, HMM modeling of speech units (SU) is used, with word models for letters and phone models for name pronunciation. The directory is built automatically from a list of names, using a grapheme-to-phoneme converter for the pronounced name and rules for the spelling; each entry in the directory consists of several variants of both the pronounced and the spelled name. Given the output of acoustic recognition, the corresponding directory entry is found using DTW or a tree-search algorithm. Both methods allow insertions and deletions, and the cost of a substitution can either be fixed or determined from the confusion matrix of the training corpus. On a test database of 5000 names uttered by 3000 speakers (~50 000 SU), 70% of the units are correctly recognized. Preliminary experiments on spelled names with a DTW lexical search showed that substitution costs defined from the confusion matrix significantly improved the results.
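
The lexical search described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the recognizer outputs a string of letters, that the DTW alignment over discrete symbols reduces to a weighted edit distance, and that substitution costs are taken as the negative log of confusion probabilities estimated from counts. The function names, the insertion and deletion costs, the fallback substitution cost, and the toy confusion counts are all hypothetical.

```python
import math


def substitution_costs(confusion, floor=1e-6):
    """Map confusion counts {spoken: {recognized: count}} to costs.

    Cost of (spoken, recognized) is -log P(recognized | spoken); this
    particular mapping is an assumption, not taken from the abstract.
    """
    costs = {}
    for spoken, row in confusion.items():
        total = sum(row.values())
        for recognized, count in row.items():
            prob = max(count / total, floor)
            costs[(spoken, recognized)] = -math.log(prob)
    return costs


def alignment_cost(recognized, entry, sub_costs,
                   ins_cost=2.0, del_cost=2.0, default_sub=4.0):
    """Weighted edit distance between a recognized string and an entry.

    Pairs missing from sub_costs cost 0 when the symbols match and
    default_sub otherwise, which gives the fixed-cost variant when
    sub_costs is empty.
    """
    n, m = len(recognized), len(entry)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + ins_cost   # extra recognized symbol
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + del_cost   # entry symbol not recognized
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            rec, ref = recognized[i - 1], entry[j - 1]
            fallback = 0.0 if rec == ref else default_sub
            sub = sub_costs.get((ref, rec), fallback)
            d[i][j] = min(d[i - 1][j - 1] + sub,
                          d[i - 1][j] + ins_cost,
                          d[i][j - 1] + del_cost)
    return d[n][m]


def best_entry(recognized, directory, sub_costs):
    """Return the directory name whose best variant aligns most cheaply."""
    def score(name):
        return min(alignment_cost(recognized, variant, sub_costs)
                   for variant in directory[name])
    return min(directory, key=score)


# Toy data (hypothetical): spoken "B" is often recognized as "D",
# while spoken "D" is rarely recognized as "B".
toy_confusion = {
    "B": {"B": 70, "D": 30},
    "D": {"D": 99, "B": 1},
    "O": {"O": 100},
}
toy_directory = {"BOB": ["BOB"], "DOD": ["DOD"], "TOM": ["TOM"]}
toy_costs = substitution_costs(toy_confusion)
print(best_entry("DOB", toy_directory, toy_costs))
```

With fixed costs, the recognized string "DOB" is equally far from "BOB" and "DOD"; with costs derived from the toy confusion counts, the asymmetry between the two confusions separates them, which is the kind of effect the preliminary experiments report.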


ASA 134th Meeting - San Diego CA, December 1997