Abstract:
Efficient parameter sharing is essential in designing phoneme-context-dependent HMMs that are robust to phonetic contexts unseen in the training data. Maximum likelihood successive state splitting (ML-SSS) [H. Singer and M. Ostendorf, Proc. ICASSP, 601--604 (1996)], an iterative algorithm for designing HMM topology, is well suited to speaker-independent databases because it reduces the influence of speaker variation on the resulting topology. However, to reduce the cost of its iterative likelihood calculations, the algorithm requires accurate phoneme boundaries. This paper proposes a method for estimating phoneme boundaries in a speaker-independent database. First, for each speaker in the target database, speaker-adapted HMMs are trained by embedded re-estimation on the same data whose boundaries are to be estimated. Then, the phoneme boundaries in each speaker's speech data are estimated by the Viterbi algorithm using that speaker's adapted HMMs. When speaker-independent models were built from boundaries obtained with this method, the phoneme recognition error was reduced by 8.3% compared with conventional speaker-independent models. This result demonstrates the effectiveness of automatic topology derivation and shows that the phoneme boundaries estimated by the proposed method are accurate enough for model training.
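The boundary-estimation step amounts to Viterbi forced alignment: given the known phoneme sequence of an utterance, the best path through the concatenated left-to-right HMM states is found, and a boundary is placed wherever the path crosses from one phoneme's states into the next. The sketch below illustrates only that generic step under stated assumptions. The function name `forced_align` is hypothetical, the per-frame state log-likelihoods are assumed to have already been computed with a speaker's adapted HMMs (speaker adaptation and embedded re-estimation are not shown), and transition probabilities are treated as uniform. It is not the authors' implementation.

```python
import numpy as np


def forced_align(loglik, phone_of_state):
    """Generic Viterbi forced alignment through a left-to-right state chain.

    loglik[t, s]: log-likelihood of frame t under state s (assumed to be
        precomputed from a speaker's adapted HMMs; hypothetical input).
    phone_of_state[s]: index of the phoneme that state s belongs to.
    Transitions are restricted to "stay" or "advance by one state" and are
    treated as uniform (an assumption made for simplicity).
    Returns the start frame of each phoneme, i.e. the estimated boundaries.
    """
    T, S = loglik.shape
    if T < S:
        raise ValueError("need at least one frame per state")
    delta = np.full((T, S), -np.inf)    # best partial-path score
    back = np.zeros((T, S), dtype=int)  # predecessor state on that path
    delta[0, 0] = loglik[0, 0]          # the path must start in the first state
    for t in range(1, T):
        for s in range(S):
            stay = delta[t - 1, s]
            advance = delta[t - 1, s - 1] if s > 0 else -np.inf
            if advance > stay:
                delta[t, s], back[t, s] = advance + loglik[t, s], s - 1
            else:
                delta[t, s], back[t, s] = stay + loglik[t, s], s
    # Backtrace from the final state at the final frame.
    path = np.empty(T, dtype=int)
    path[-1] = S - 1
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    # A boundary lies wherever the aligned phoneme index changes.
    phones = np.asarray(phone_of_state)[path]
    return [0] + [t for t in range(1, T) if phones[t] != phones[t - 1]]
```

For example, with two single-state phonemes and six frames whose scores favour the first phoneme for frames 0 to 2 and the second for frames 3 to 5, `forced_align` places the boundary at frame 3. Because the path may only stay in or advance by one state, every state is visited at least once, so each phoneme receives exactly one start frame.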