Abstract:
An HMM labeler has been extended to detect poor correspondence between phonetic labels and the underlying acoustic data. This paper will present work extending the labeler to model the perceptual confusions of human listeners in a forced-choice word identification experiment that used dysarthric speech. The speech and perception data are from the Nemours Dysarthric Speech database [Menendez et al., Proceedings of ICSLP 96, SaP2P1.19 (1996)]. The perceptual data comprise distributions of listener identification responses over sets of four to six words (the intended word plus several phonetically similar foils). In all, 37 words were produced twice by each of 10 dysarthric talkers, providing a total data set of 740 items. Each of these items was identified at least 12 times by each of five naive listeners, for a total of at least 60 responses per item. Half of this data set will be used to adapt parameters of the HMM labeler to reproduce the distribution of human responses to the speech. The remaining half will be used to assess the ability of the labeler to select phonetic responses in a manner reflecting the patterns of human perceptual confusions among the response set items. [Work supported by the Nemours Research Programs and NIDRR.]