Abstract:
A method was proposed for the posterior use of prosodic features to ensure correct recognition and to detect recognition errors. Fundamental frequency (F0) contours are generated for recognition hypotheses using prosodic rules developed for speech synthesis and are compared with the observed contour. Partial analysis-by-synthesis absorbs unexpected variations in the observed contour. The method can detect recognition errors accompanied by accent type changes and/or syntactic boundary shifts. Although syntactic boundaries are useful for speech recognition, detecting them through prior use of F0 contours is sometimes difficult because they are less clearly marked in the contours. The method was therefore evaluated to determine how well it detects syntactic boundaries using pitch information. Preliminary results reported by K. Hirose and A. Sakurai [Proc. ICASSP-96, 809--812 (1996)] were further validated on the ATR continuous speech conference registration database, which includes 37 major syntactic boundaries that are not preceded by long pauses but are accompanied by F0 rises reflecting phrase components. The method identifies these boundaries with 92% accuracy within a 2-mora position error and with 86% accuracy within a 1-mora position error. The discussion will extend to augmenting the method with statistical techniques such as HMMs.