Abstract:
Dynamic three-dimensional (3-D) vocal-tract (VT) shapes are fundamental to successfully quantifying the relation between articulatory and acoustic characteristics of the vocal tract. Unfortunately, direct measurements of the vocal tract are currently restricted to the acquisition of either static 3-D or dynamic 2-D shapes. Here, an indirect method to estimate dynamic 3-D VT shapes is described and evaluated. The procedure is as follows. First, a set of static 3-D VT shapes (derived from MRI) is used to construct a parametric model of the vocal tract. This model constrains the 3-D shape space to be morphologically plausible. Next, 2-D point trajectories (obtained midsagittally with a magnetometer) are used in conjunction with the simultaneously measured first three formant frequencies to estimate the corresponding vocal-tract parameters. Indirect evaluation is performed in two ways: (i) midsagittal 2-D points are extracted from MRI 3-D VT shapes, and the corresponding formant frequencies are calculated. This information is used to obtain estimated 3-D VT shapes, which are then compared with the original MRI shapes. (ii) Area-function trajectories are derived from the estimated 3-D VT shape trajectories and used as input to an area-driven speech synthesizer, whose output is compared analytically with the original speech.
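
The geometric part of the estimation step can be illustrated with a minimal sketch. It assumes, purely for illustration, that the parametric model is linear (a mean shape plus a weighted sum of basis shapes); the abstract does not state the form of the model. It also fits only the measured midsagittal coordinates and omits the formant-frequency term used in the actual method. The names `estimate_parameters`, `reconstruct`, `solve_linear`, and `ridge` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular; a zero pivot raises ZeroDivisionError.
    """
    n = len(b)
    # Augmented matrix, copied so the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def estimate_parameters(mean, basis, observed_idx, observed, ridge=0.0):
    """Least-squares fit of model parameters to a subset of shape coordinates.

    mean:         D floats, the mean shape (assumed linear model).
    basis:        K lists of D floats, one basis shape per parameter.
    observed_idx: indices of the coordinates that were measured
                  (standing in for the midsagittal 2-D points).
    observed:     measured values at those indices.
    ridge:        optional Tikhonov term (an assumption, not from the source).
    """
    k = len(basis)
    resid = [y - mean[i] for i, y in zip(observed_idx, observed)]
    # Normal equations: (B_o^T B_o + ridge * I) p = B_o^T r
    ata = [[sum(basis[a][i] * basis[b][i] for i in observed_idx)
            + (ridge if a == b else 0.0)
            for b in range(k)] for a in range(k)]
    atb = [sum(basis[a][i] * r for i, r in zip(observed_idx, resid))
           for a in range(k)]
    return solve_linear(ata, atb)


def reconstruct(mean, basis, params):
    """Full shape implied by the parameters: mean + sum_k p_k * basis_k."""
    return [mu + sum(p * b[i] for p, b in zip(params, basis))
            for i, mu in enumerate(mean)]
```

In this sketch the model restricts the estimate to the span of the basis shapes, which is how a few measured coordinates can determine the unmeasured ones. Adding an acoustic term, as the method described here does with the first three formants, would generally make the problem nonlinear and require an iterative optimizer instead of a single linear solve.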