Abstract:
Cepstrum mean normalization is an effective method for recognizing distorted telephone speech. It compensates for the bias difference in the cepstrum coefficients (CC) between training data and test data by subtracting the mean CC computed from a certain amount of given speech data. Because such adaptation data are generally not phonetically balanced, it is difficult to obtain an accurate mean value for the CC. This paper proposes a new approach to this problem. Before recognition, we compute not only the mean CC of the adaptation data themselves but also the mean of the Gaussian distributions of the continuous-density HMMs for the phonemes that appear in the adaptation data. During recognition, the difference between these two mean values is subtracted from the input speech data. We evaluated the proposed method on various kinds of telephone speech recorded through the public switched telephone network: speech from an ordinary analog telephone, a cordless handset telephone, and a digital cellular phone (a Japanese full-rate digital phone based on VSELP). The experimental results show the advantage of the new method.
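The difference between conventional CMN and the proposed bias estimate can be sketched in a few lines. This is a minimal illustration under simplifying assumptions that are not from the paper: a frame-level phoneme alignment of the adaptation data is assumed to be available, and each phoneme is hypothetically represented by a single Gaussian mean vector. The names `cmn`, `estimate_bias`, `compensate`, `adapt_labels`, and `phoneme_means` are illustrative only.

```python
import numpy as np


def cmn(features: np.ndarray) -> np.ndarray:
    """Conventional CMN: subtract the mean cepstral vector of the data itself."""
    return features - features.mean(axis=0)


def estimate_bias(
    adapt_features: np.ndarray,
    adapt_labels: list[str],
    phoneme_means: dict[str, np.ndarray],
) -> np.ndarray:
    """Bias = mean CC of the adaptation data minus the mean of the model
    Gaussian means for the phonemes that appear in those data.

    Assumption (hypothetical): one phoneme label per adaptation frame and one
    Gaussian mean vector per phoneme. Averaging the model means over the same
    frames makes the model-side mean share the adaptation data's phonetic
    imbalance, so the difference isolates the channel bias.
    """
    data_mean = adapt_features.mean(axis=0)
    model_mean = np.mean([np.asarray(phoneme_means[p]) for p in adapt_labels], axis=0)
    return data_mean - model_mean


def compensate(features: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Subtract the estimated bias from every frame of the input speech."""
    return features - bias
```

For example, with two phonemes whose means are [1, 0] and [0, 1], an unbalanced adaptation set of three "a" frames and one "i" frame, and a noise-free channel offset of [0.5, -0.2], `estimate_bias` recovers the offset, whereas plain `cmn` would also subtract the phonetically skewed average [0.75, 0.25].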