5pSC5. Integrated acoustic and language modeling of speech disfluencies.

Session: Friday Afternoon, December 6

Time: 3:05


Author: Elizabeth E. Shriberg
Location: Speech Technol. and Res. Lab., SRI International, 333 Ravenswood Ave., Menlo Park, CA 94025
Author: Rebecca A. Bates
Location: Speech Technol. and Res. Lab., SRI International, 333 Ravenswood Ave., Menlo Park, CA 94025
Author: Andreas Stolcke
Location: Speech Technol. and Res. Lab., SRI International, 333 Ravenswood Ave., Menlo Park, CA 94025

Abstract:

This work investigates the use of prosodic features in modeling disfluencies (filled pauses, repeated words, and self-repairs) in spontaneous speech. The main goal is to automatically detect and correct disfluencies, so that a "fluent" version of a disfluent utterance can be used as input for speech understanding and other applications. A second goal is to develop explicit acoustic and language models for disfluencies to improve speech recognition performance for spontaneous speech. The prosodic features examined include duration, fundamental frequency, amplitude, and features correlated with voice quality. Decision trees serve as the acoustic models that relate these prosodic features to disfluency events. To integrate the disfluency model into speech recognition, decision tree probabilities are combined with standard acoustic model scores and probabilities from a "Cleanup" language model to rescore N-best hypotheses. The Cleanup language model represents disfluencies as hidden events, and predicts words following a disfluency from the corresponding fluent word sequence. A linguistically hand-annotated version of the Switchboard corpus is used for model training and evaluation. [Work supported by the National Science Foundation under Grants No. IRI-9314967 and No. IRI-8905249.]
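
The score combination described in the abstract can be illustrated with a minimal sketch. It assumes a weighted sum of log scores, which is a common way to rescore N-best lists; the abstract does not state the exact combination rule. The class `Hypothesis`, the function `rescore_nbest`, the weight parameters, and all numbers in the usage note are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One entry of an N-best list (hypothetical structure)."""
    words: List[str]
    acoustic_logprob: float  # score from the standard acoustic model
    lm_logprob: float        # score from the language model
    prosody_logprob: float   # log probability from the prosodic decision tree


def rescore_nbest(nbest: List[Hypothesis],
                  lm_weight: float = 1.0,
                  prosody_weight: float = 1.0) -> List[Hypothesis]:
    """Return the hypotheses sorted best-first by a weighted sum of log scores.

    Assumption: the three knowledge sources are combined log-linearly.
    """
    def combined(h: Hypothesis) -> float:
        return (h.acoustic_logprob
                + lm_weight * h.lm_logprob
                + prosody_weight * h.prosody_logprob)

    return sorted(nbest, key=combined, reverse=True)
```

As a usage example, given two competing hypotheses with made-up scores, `rescore_nbest([h1, h2])` returns them ordered by combined score. Setting `prosody_weight=0.0` removes the prosodic model's contribution, which gives a simple way to check whether it changes the ranking.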


ASA 132nd meeting - Hawaii, December 1996