Abstract:
Current systems for speech recognition and synthesis are limited by our incomplete knowledge of how the acoustic properties of speech sounds vary with the context, the speaker, and the mode of speaking. A byproduct of this variability is redundancy in the acoustic correlates of phonetic units, and this redundancy can be used to advantage when speech is degraded by noise or other distorting influences. A goal of research in the next decade is to quantify these sources of variability and to incorporate this knowledge into models of speech generation and reception. Variability in the acoustic properties of speech sounds is a consequence of articulatory variability. This relation is illustrated with data on place-of-articulation features for stop consonants in English, and with formant data for the vowels /(cursive beta)/ and /(open aye)/. The bursts and formant transitions for individual stop consonants vary systematically across speakers and contexts, and this variation can be explained in terms of the shapes of the cavities anterior and posterior to the consonant constriction. Variability in the vowel formants can be accounted for by the requirements that the adjacent segments place on tongue-body position and tongue-body dynamics. [Supported in part by NIH Grant DC02978.]