Peter F. Assmann
William F. Katz
Kathleen M. Jenouri
Phillip W. Hamilton
School of Human Development, Univ. of Texas at Dallas, Box 830688, Richardson, TX 75083-0688
To examine developmental patterns in the production and perception of American English vowels, recordings were made of 12 /hVd/ words from 10 men, 10 women, and 30 children (ages 3, 5, and 7). Fundamental frequency (F0) and formant center frequencies (F1-F4) were estimated, and a subset of the measurements served as input to a cascade formant synthesizer. Natural and synthesized vowels were presented to adult listeners for identification. Overall, natural tokens were identified more accurately than synthesized versions. Identification accuracy was significantly lower when time-varying changes in either F1 or F2 were replaced by constant values drawn from the vowel nucleus. A further drop in accuracy resulted when all formants (F1-F4) and F0 were "flattened," consistent with findings of Hillenbrand [J. Acoust. Soc. Am. 97, 3245(A) (1995)]. These findings highlight the perceptual importance of time-varying changes in vowel spectra. It has been suggested that time-varying changes in the formants can improve the intelligibility of vowels whose spectral envelopes are sparsely sampled by harmonics of the source spectrum, as is the case for high-F0 voices. Although the vowels produced by children were generally less well identified, there was no evidence of an increased contribution of formant frequency dynamics with decreasing age. [Work supported by Texas Advanced Research Program.]
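The "flattening" manipulation described above can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the choice of the temporal midpoint as the vowel nucleus, and the track values are all assumptions made for illustration, and the numbers are not data from the study.

```python
def flatten_track(track, nucleus_index=None):
    """Replace a time-varying track with one constant value taken at the nucleus.

    Assumption: when no nucleus frame is given, the temporal midpoint is used.
    The abstract does not say how the nucleus was located.
    """
    if not track:
        raise ValueError("track must contain at least one frame")
    if nucleus_index is None:
        nucleus_index = len(track) // 2
    return [track[nucleus_index]] * len(track)


def flatten_condition(tracks, names_to_flatten, nucleus_index=None):
    """Return a copy of the tracks with the named parameters held constant."""
    return {
        name: flatten_track(values, nucleus_index)
        if name in names_to_flatten
        else list(values)
        for name, values in tracks.items()
    }


# Hypothetical frame-by-frame measurements in Hz (illustrative only).
tracks = {
    "F0": [220.0, 215.0, 210.0, 205.0, 200.0],
    "F1": [600.0, 650.0, 700.0, 680.0, 640.0],
    "F2": [1800.0, 1750.0, 1700.0, 1650.0, 1600.0],
}

f1_flat = flatten_condition(tracks, {"F1"})         # only F1 held constant
all_flat = flatten_condition(tracks, set(tracks))   # every track held constant
```

In this sketch, each condition yields a set of parameter tracks that could be passed to a formant synthesizer, with the untouched tracks keeping their original time variation.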