4pSC22. Compensation for speech recognition in degraded acoustical environments.

Session: Thursday Afternoon, December 5

Time: 4:00


Author: Richard M. Stern
Location: Dept. of Elec. and Comput. Eng. and School of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA 15213
Author: Pedro J. Moreno
Location: Dept. of Elec. and Comput. Eng. and School of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA 15213
Author: Bhiksha Raj
Location: Dept. of Elec. and Comput. Eng. and School of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA 15213

Abstract:

The accuracy of speech recognition systems degrades when they are operated in adverse acoustical environments. This paper discusses two ways in which more detailed mathematical descriptions of the effects of environmental degradation can improve speech recognition accuracy, using both "data-driven" and "model-based" compensation strategies. Data-driven methods learn environmental characteristics through direct comparisons of speech recorded in the noisy environment with the same speech recorded under optimal conditions. Model-based methods posit a mathematical model of the environment and use samples of the degraded speech to estimate the model's parameters. Two approaches to data-driven compensation, RATZ and STAR, are described, as well as a new approach to model-based compensation, referred to as the vector Taylor series (VTS) algorithm. The compensation algorithms are evaluated in a series of experiments measuring recognition accuracy for speech from the ARPA Wall Street Journal database corrupted by artificially added noise at various signal-to-noise ratios (SNRs). For any particular SNR, the greatest recognition accuracy attainable with a practical compensation algorithm is that of a system trained on noisy data at the same SNR. The RATZ, VTS, and STAR algorithms achieve this bound at global SNRs as low as 15, 10, and 5 dB, respectively. [Work supported by ARPA.]
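The core of the VTS idea can be illustrated with a minimal sketch: in the log-spectral domain, additive noise corrupts clean speech x via the nonlinearity y = x + log(1 + exp(n - x)), and a Taylor expansion of this function around the clean-speech and noise means yields adapted Gaussian parameters. The code below is an illustrative sketch of the zeroth-order mean shift and first-order variance combination under these assumptions (channel effects ignored); the function names are ours, not from the paper.

```python
import numpy as np

def vts_adapt_mean(mu_x, mu_n):
    """Zeroth-order VTS mean adaptation for additive noise.

    In the log-spectral domain, y = x + g(x, n) with the mismatch
    function g(x, n) = log(1 + exp(n - x)).  Keeping only the
    zeroth-order Taylor term shifts the clean mean by g(mu_x, mu_n).
    """
    g = np.log1p(np.exp(mu_n - mu_x))  # mismatch function at the expansion point
    return mu_x + g

def vts_adapt_cov(sigma_x, sigma_n, mu_x, mu_n):
    """First-order VTS variance adaptation (diagonal covariances).

    The Jacobian dy/dx = 1 / (1 + exp(n - x)), evaluated at the
    expansion point, weights the clean-speech variance; the noise
    variance gets the complementary weight (1 - dy/dx).
    """
    J = 1.0 / (1.0 + np.exp(mu_n - mu_x))
    return J**2 * sigma_x + (1.0 - J)**2 * sigma_n
```

The two limiting cases behave as expected: when the noise mean is far below the speech mean the adapted mean stays near the clean mean, and when noise dominates it approaches the noise mean.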


ASA 132nd meeting - Hawaii, December 1996