4pSC1. Dynamic auditory representations and statistical speech recognition.

Session: Thursday Afternoon, December 5

Time: 5:15


Author: Brian Strope
Location: Dept. of Elec. Eng., UCLA, 66-147E Eng. IV, 405 Hilgard Ave., Los Angeles, CA 90095
Author: Abeer Alwan
Location: Dept. of Elec. Eng., UCLA, 66-147E Eng. IV, 405 Hilgard Ave., Los Angeles, CA 90095

Abstract:

The most common spectral estimation algorithm used for automatic speech recognition incorporates rough approximations of basic aspects of auditory modeling: frequency selectivity and magnitude compression. Attempting to improve the robustness and overall performance of ASR, researchers have proposed more sophisticated auditory models as the spectral estimation front end, with generally modest success at best. One common concern throughout these efforts is that the representation derived from an auditory model may not be a good match for typical statistical recognition algorithms. Recently, a dynamic auditory model that emphasizes changing local spectral peaks was derived and implemented; it improves recognition robustness compared to other common front ends. The present work uses specific case examples to show how the perceptual representation leads to a softening of the resulting statistical models. The work also proposes a simple mechanism to adapt dynamic spectral features into a form more suitable for segmentally static statistical characterization. The mechanism is based on approximating the temporal derivative of the frequency position of local spectral peaks. The impact of our auditory model with this processing mechanism on robust recognition performance is discussed. [Work supported by NIH Grant No. 1 R29 DC 02033-01A1, and by NSF.]
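The proposed mechanism, approximating the temporal derivative of the frequency position of local spectral peaks, can be illustrated with a minimal sketch. The abstract does not specify the implementation; the peak-picking rule, function names, and frame parameters below are assumptions chosen for illustration: each frame's dominant local spectral maximum is located, and a first difference across frames approximates the derivative of its frequency position.

```python
import numpy as np

def peak_frequency_track(spectra, freqs):
    """For each spectral frame, return the frequency of its strongest
    local peak (a bin larger than both neighbors).  Hypothetical
    peak-picking rule; the original model's rule is not specified."""
    peak_freqs = []
    for frame in spectra:
        interior = np.arange(1, len(frame) - 1)
        is_peak = (frame[interior] > frame[interior - 1]) & \
                  (frame[interior] > frame[interior + 1])
        candidates = interior[is_peak]
        if candidates.size == 0:
            # no interior local maximum; fall back to the global maximum
            candidates = np.array([np.argmax(frame)])
        best = candidates[np.argmax(frame[candidates])]
        peak_freqs.append(freqs[best])
    return np.array(peak_freqs)

def peak_frequency_derivative(peak_freqs, frame_period_s):
    """First difference of peak frequency across frames, in Hz/s:
    a discrete approximation of the temporal derivative of the
    frequency position of the tracked local spectral peak."""
    return np.diff(peak_freqs) / frame_period_s
```

For example, a peak that steps up one bin (62.5 Hz) every 10 ms frame yields a derivative of 6250 Hz/s; a stationary peak yields zero, which is the sense in which the derivative feature renders a moving spectral peak segmentally static.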


ASA 132nd meeting - Hawaii, December 1996