Abstract:
A model developed to predict the speech transmission quality of (low-bit-rate) speech codecs [Hansen and Kollmeier, ICASSP'97, paper #2056 (1997)] is presented. The model is based on a quantitative psychoacoustical preprocessing scheme [Dau et al., J. Acoust. Soc. Am. 99, 3614–3622 (1996)] and has been applied successfully to various speech codec test databases. This study presents measurements and model predictions of the detectability of band-specific modulated-noise distortions. Two sentences served as stimuli; they were telephone-band filtered and presented diotically to the subjects via headphones in a sound-attenuating booth. To mimic band-specific, codec-like distortions, two types of distortion were applied to the stimuli: modulation with wideband noise (MNRU, modulated noise reference unit) followed by critical-band bandpass filtering, and the reverse order, critical-band bandpass filtering followed by modulation with wideband noise. In an adaptive two-interval, two-alternative forced-choice (2I-2AFC) experiment, the threshold modulation depth was measured as a function of the critical-band center frequency. The measured thresholds were compared with the objective speech quality measure computed for the same signals used in the two experiments. At threshold modulation depth, the computed objective quality measure was found to be constant. This indicates a direct monotonic relation between speech transmission quality and the detectability of the distortion.
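The two distortion orderings described above can be sketched as follows. This is a minimal illustration, not the study's stimulus-generation code. It assumes the standard MNRU form y = x + 10^(-Q/20) * x * n with Gaussian white noise n. In place of whatever critical-band filters were actually used, it substitutes an idealized brick-wall FFT bandpass whose width comes from the Zwicker-Terhardt critical-bandwidth approximation. The function names and the `depth` parameter (standing in for the modulation depth that the adaptive track varies) are hypothetical.

```python
import numpy as np


def critical_bandwidth_hz(fc_hz):
    """Critical bandwidth in Hz (Zwicker-Terhardt approximation, assumed here)."""
    f_khz = fc_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


def bandpass_fft(x, fs, fc_hz):
    """Idealized brick-wall bandpass, one critical band wide, centered at fc_hz."""
    half_bw = critical_bandwidth_hz(fc_hz) / 2.0
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    mask = (freqs >= fc_hz - half_bw) & (freqs <= fc_hz + half_bw)
    return np.fft.irfft(spectrum * mask, n=len(x))


def band_mnru(x, fs, fc_hz, depth, order="mnru_then_filter", rng=None):
    """Add a band-specific MNRU-style distortion to speech signal x.

    depth plays the role of 10**(-Q/20) in the MNRU formula (hypothetical name).
    order="mnru_then_filter": modulate with wideband noise, then bandpass the distortion.
    order="filter_then_mnru": bandpass the speech first, then modulate with wideband noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(len(x))  # wideband Gaussian noise
    if order == "mnru_then_filter":
        distortion = bandpass_fft(x * noise, fs, fc_hz)
    elif order == "filter_then_mnru":
        distortion = bandpass_fft(x, fs, fc_hz) * noise
    else:
        raise ValueError(f"unknown order: {order!r}")
    return x + depth * distortion
```

In the first ordering the added distortion is confined to one critical band. In the second, only the speech energy inside that band is modulated, so the distortion spreads across the spectrum but vanishes when the band contains no speech energy.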