Abstract:
Considerable research in speech recognition over the past two decades using hidden Markov models (HMMs) has led to substantial reductions in word error rate (WER). Owing to their statistical nature, HMMs require large amounts of data to obtain good estimates of the model parameters. It has been shown that a speaker dependent (SD) model achieves about 2--3 times lower WER than a speaker independent (SI) model. However, demanding such large amounts of data from a single speaker is impractical. Speaker adaptation techniques mitigate this problem by using SI data to obtain robust parameter estimates and then using small amounts of SD data to adjust these parameters so that they are optimal for that speaker under some optimality criterion, such as maximum likelihood (ML). In this paper, the different adaptation techniques are studied. The salient methods of each type, namely maximum likelihood linear regression (MLLR) and maximum a posteriori (MAP) adaptation, are discussed in detail. A method of constrained unsupervised adaptation is proposed, which discards parts of the original hypothesis that are deemed incorrect. This decision is based on past experience obtained from training data; the decision-making module uses a neural network and a CART tree.
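
For illustration only, the following is a minimal sketch (not taken from the paper) of how an MLLR mean transform, once estimated from SD adaptation data, is applied to the Gaussian means of an SI model. The function name, array layout, and the toy transform are assumptions made for this example.

```python
import numpy as np

def apply_mllr_transform(means, W):
    """Apply a shared MLLR mean transform to a set of Gaussian means.

    means : (num_gaussians, dim) array of SI model means
    W     : (dim, dim + 1) transform [b | A], assumed already estimated
            from adaptation data
    Returns the adapted means  mu_hat = A @ mu + b  for every Gaussian.
    """
    # Extended mean vectors xi = [1, mu_1, ..., mu_d]
    ones = np.ones((means.shape[0], 1))
    xi = np.hstack([ones, means])   # shape (num_gaussians, dim + 1)
    return xi @ W.T                 # shape (num_gaussians, dim)

# Toy usage: identity rotation with a small bias shift (hypothetical values)
dim = 3
means = np.random.randn(10, dim)
W = np.hstack([0.1 * np.ones((dim, 1)), np.eye(dim)])
adapted_means = apply_mllr_transform(means, W)
```

Because a single transform is shared across many Gaussians, even small amounts of SD data suffice to estimate it robustly, which is the property the abstract refers to.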