Abstract:
In the absence of a method for obtaining time-varying 3-D vocal tract data, the midsagittal profile still provides the best available characterization of the tract articulators. For this reason, articulatory approaches to speech synthesis typically model the vocal tract as a series of concatenated tubes whose cross-sectional areas are related by some heuristic (the area function) to the midsagittal cross dimensions. But while areas alone are adequate for modeling the lower formants, accurate modeling of higher frequencies requires details of tract morphology. Previous work based on MRI volume parametrization from a single subject [Tiede et al., Proc. ETRW-SPM, 41--44 (1996)] demonstrated the feasibility of recovering correlated cross-sectional shapes from the midsagittal profile alone. The present study extends this approach to an analysis of four English and four Japanese subjects (five MRI-scanned vowels per subject). The results show that characteristic tract shapes for vowels can be recovered from a small and highly constrained set of parameters. Since area is easily recovered from such shapes, this approach can be used to augment standard area function techniques.
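
As an illustration only, the sketch below contrasts the two routes to a cross-sectional area mentioned above: a heuristic mapping from midsagittal cross dimension to area, and direct computation of area from a recovered cross-sectional shape. Nothing here is taken from the study itself. The power-law form and its coefficients `alpha` and `beta` are assumed placeholders for "some heuristic", the elliptical outline is a hypothetical stand-in for a recovered shape, and all function names are invented for this example.

```python
import math


def area_from_width(d_cm, alpha=1.5, beta=1.4):
    """Heuristic area estimate A = alpha * d**beta from a midsagittal width d.

    The power-law form and the default coefficients are placeholders chosen
    for illustration, not values reported in the abstract.
    """
    return alpha * d_cm ** beta


def area_from_outline(points):
    """Area enclosed by a closed outline, via the shoelace formula.

    `points` is a list of (x, y) vertices in order around the outline;
    the last vertex is joined back to the first.
    """
    total = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def ellipse_outline(a, b, n=360):
    """Hypothetical cross-sectional shape: an ellipse sampled at n vertices."""
    return [
        (a * math.cos(2 * math.pi * k / n), b * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


if __name__ == "__main__":
    width = 1.0  # midsagittal cross dimension in cm (made-up value)
    shape = ellipse_outline(a=1.0, b=width / 2)  # semi-axes in cm (made-up)
    print("heuristic area:", area_from_width(width))
    print("outline area:  ", area_from_outline(shape))
```

The point of the comparison is that the outline route keeps the shape available for anything that depends on morphology, while the heuristic route collapses each section to a single number.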