Abstract:
A quantitative target-based model of articulatory trajectory formation in speech is proposed, in which, for a given vowel sequence, equilibrium targets are assumed to remain identical regardless of speaking rate and stress; only their timing and the muscle cocontraction level are adjusted to meet prosodic requirements. This model is used to examine how relevant motor control information could be extracted from the acoustic signal to help identify vowels by providing clues about the stress or rate conditions. The sequences [iai] and [iɛi] are analyzed under three prosodic conditions: slow stressed (the ideal condition), and slow unstressed and fast stressed (the reduced conditions). Equilibrium targets are set to the positions actually reached by the tongue body under the ideal condition. The cocontraction level and the timing of the commands are inferred using a two-step inversion procedure: from the acoustic signal to tongue body trajectories, then from these trajectories to motor commands. It is shown that, at a given speaking rate, stressed sequences are produced with longer transitions, a higher cocontraction level, or both. To assess the reliability of the inferred motor commands, the model's sensitivity is evaluated around the inferred temporal and cocontraction patterns, and perceptual tests are run on synthetic stimuli.