Abstract:
In noisy environments, the performance of speech recognition systems trained in quiet environments degrades. One reason is the acoustic-phonetic modification caused by the Lombard effect; another is the contamination of the speech signal by noise. This paper presents a new method for isolated word recognition in noisy environments. The method combines two techniques: variability models of the acoustic-phonetic modification in Lombard speech, and frame-by-frame estimation of the additive noise spectrum. The acoustic-phonetic variability models represent the spectral difference between normal speech and Lombard speech. Each model consists of a nonlinear warping function in the spectral domain and two spectral filters. The warping function represents formant shifts, and the two filters represent changes in formant bandwidth and in spectral tilt. The noise spectrum is estimated for each frame of the noisy input using noise models together with speech models trained on clean speech data. These techniques were applied to speaker-dependent word recognition based on continuous-density subphoneme HMMs. Experimental evaluations were carried out on noisy Lombard speech data of 100 isolated words, and the experiments confirmed the effectiveness of the proposed method.
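The abstract describes each variability model as a frequency warping plus two spectral filters. The following is a minimal sketch of how such a model could be applied to a normal-speech spectrum. It assumes log-magnitude (dB) spectra on a uniform frequency grid, a hypothetical piecewise-linear warp (nonlinear overall), and filters supplied as per-bin dB gains. The abstract gives none of these shapes or parameter values, and all function names and parameters here are illustrative, not the authors'.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence


def interp(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Linearly interpolate ys at x over increasing xs, clamping at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def piecewise_linear_inverse_warp(
    alpha: float, f_break: float, f_max: float
) -> Callable[[float], float]:
    """Hypothetical warp w(f) = alpha*f below f_break, then linear up to (f_max, f_max).

    Returns the INVERSE map: for an output (Lombard-side) frequency, the
    normal-speech frequency whose energy lands there.
    """
    if not 0 < alpha * f_break < f_max:
        raise ValueError("need 0 < alpha * f_break < f_max")
    knee = alpha * f_break  # output frequency where the scaled low band ends

    def inverse(f_out: float) -> float:
        if f_out <= knee:
            return f_out / alpha
        return f_break + (f_out - knee) * (f_max - f_break) / (f_max - knee)

    return inverse


def apply_variability_model(
    log_spec_db: Sequence[float],
    freqs: Sequence[float],
    inverse_warp: Callable[[float], float],
    bandwidth_gain_db: Sequence[float],
    tilt_gain_db: Sequence[float],
) -> List[float]:
    """out(f) = S(w^-1(f)) + B(f) + T(f), all in dB.

    Spectral filters multiply the linear spectrum, so they add in dB.
    """
    return [
        interp(inverse_warp(f), freqs, log_spec_db) + b + t
        for f, b, t in zip(freqs, bandwidth_gain_db, tilt_gain_db)
    ]


if __name__ == "__main__":
    freqs = [50.0 * i for i in range(81)]  # 0..4000 Hz grid
    normal = [-abs(f - 500.0) / 10.0 for f in freqs]  # toy spectrum peaked at 500 Hz
    warp = piecewise_linear_inverse_warp(alpha=1.2, f_break=1000.0, f_max=4000.0)
    no_bw = [0.0] * len(freqs)  # placeholder: bandwidth filter shape is not given
    tilt = [3.0 * f / 1000.0 for f in freqs]  # hypothetical +3 dB/kHz tilt change
    lombard = apply_variability_model(normal, freqs, warp, no_bw, tilt)
    peak = max(range(len(freqs)), key=lambda i: lombard[i])
    print(f"peak moved from 500 Hz to {freqs[peak]:.0f} Hz")
```

Working in dB turns each spectral filter into an additive term, and passing the warp as its inverse avoids inverting it numerically. With the toy values above, the script prints `peak moved from 500 Hz to 600 Hz`, i.e. the low-frequency peak is shifted upward by the factor `alpha`.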