Abstract:
This paper proposes a technique for adapting continuous-density HMM speech recognition systems trained on clean speech data to make them robust to additive noise. The approach is based on generating cepstral parameters from speech and noise HMMs and compensating the model parameters using the generated speech and noise parameters. For clean speech and noise HMMs that include cepstral and dynamic parameters, cepstral parameter vector sequences are generated in such a way that the probability of observing the parameter vector sequence from the given HMM is maximized using dynamic parameters. The generated cepstral vector sequences of speech and noise are then combined to yield a noisy speech cepstral vector sequence. Compensated means of the cepstral and dynamic parameters for the noisy speech HMM are obtained from statistics of the noisy speech parameter sequences. Variances for the noisy speech HMM are also estimated using the relationship between the clean and noisy speech parameter sequences. As a result, the technique compensates both the cepstral and the dynamic parameters of the noisy speech HMM. Moreover, it does not require the lognormal approximation used in cepstral parameter compensation based on the parallel model combination technique. Word recognition experiments based on phoneme HMMs show the effectiveness of the proposed technique.
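
To illustrate the combination step, the following is a minimal sketch and not the paper's implementation. It assumes that cepstra are obtained from log spectra by a square orthonormal DCT, that speech and noise add in the linear spectral domain, and that dynamic parameters are approximated by a simple frame difference. The names `dct_matrix`, `combine_cepstra`, `noisy_means`, and `noise_gain` are hypothetical. The statistics here are taken over a whole sequence, whereas an HMM-based system would collect them per state.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (its inverse is its transpose)."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    c[0, :] = 1.0 / np.sqrt(n)
    return c

def combine_cepstra(speech_cep, noise_cep, noise_gain=1.0):
    """Combine (T, D) speech and noise cepstral sequences into noisy speech cepstra.

    Assumption: additive noise in the linear spectral domain, so the noisy
    log spectrum is log(exp(speech) + noise_gain * exp(noise)).
    noise_gain must be positive.
    """
    c = dct_matrix(speech_cep.shape[-1])
    # Cepstrum -> log spectrum (row vectors, so multiply by C on the right).
    speech_log = speech_cep @ c
    noise_log = noise_cep @ c + np.log(noise_gain)
    # Numerically stable log(exp(a) + exp(b)).
    noisy_log = np.logaddexp(speech_log, noise_log)
    # Log spectrum -> cepstrum.
    return noisy_log @ c.T

def noisy_means(noisy_cep):
    """Sample means of static and (approximate) delta cepstra over a sequence."""
    delta = np.gradient(noisy_cep, axis=0)  # central differences along time
    return noisy_cep.mean(axis=0), delta.mean(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.normal(size=(5, 8))        # stand-in for generated speech cepstra
    noise = rng.normal(size=(5, 8)) - 2.0   # stand-in for generated noise cepstra
    noisy = combine_cepstra(speech, noise, noise_gain=0.5)
    mean_static, mean_delta = noisy_means(noisy)
    print(mean_static.shape, mean_delta.shape)
```

Because the statistics are computed directly from the combined sequences, this sketch needs no distributional assumption about the noisy speech spectrum at the combination step.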