ASA 127th Meeting M.I.T. 1994 June 6-10

1pSP44. Production of speech from a physiological model of speech organs.

Hiroyuki Hirai

Jianwu Dang

Kiyoshi Honda

ATR Human Inf. Process. Res. Labs., 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-02 Japan

A physiological model of the laryngeal and supralaryngeal articulators has been designed based on morphological data obtained from MRI. The model has a hybrid construction: a finite-element model of tongue deformation and a mass-spring model in which muscles connect the rigid structures. The posture of the model is determined by computing the static equilibrium of muscle forces on all of the components, and is used to calculate the vocal tract areas and the transfer function of the model with reference to the MRI data. The cricothyroid angle determines the parameters for the vibration of the two-mass model [Bell Syst. Tech. J. 51, (6) (1972)], and the resulting source sounds are then fed to the model to generate synthesized speech. In this model, the kinematic connections among the jaw, the hyoid bone, the tongue, and the laryngeal cartilages represent the so-called tongue-larynx interaction that is observed in natural speech. The source-tract acoustic coupling also contributes to the naturalness of the output. The overall performance of the model for vowel production at various F0 levels has been tested by comparing acoustic data from the model's output with corresponding recorded speech.
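The abstract does not say how the transfer function is calculated from the vocal tract areas. As an illustration only, the following minimal sketch uses a standard lossless concatenated-tube approximation: each tube section contributes a chain matrix, the glottis end is treated as a volume-velocity source, and the lips as an ideal open end, so resonances (formants) occur where the D element of the overall chain matrix is zero. The function names, the constants RHO and C, and the uniform-tube test case are assumptions made for this sketch and are not taken from the authors' model; losses, lip radiation, and the source-tract coupling mentioned above are all ignored.

```python
import numpy as np

RHO = 1.14e-3  # air density in g/cm^3 (assumed value)
C = 3.5e4      # speed of sound in cm/s (assumed value)


def chain_d(freq_hz, areas_cm2, lengths_cm):
    """D element of the glottis-to-lips chain matrix of a lossless tube."""
    k = 2.0 * np.pi * freq_hz / C  # wavenumber
    total = np.eye(2, dtype=complex)
    for area, length in zip(areas_cm2, lengths_cm):
        z0 = RHO * C / area  # characteristic impedance of this section
        kl = k * length
        section = np.array([[np.cos(kl), 1j * z0 * np.sin(kl)],
                            [1j * np.sin(kl) / z0, np.cos(kl)]])
        total = total @ section
    # For a lossless tube D is purely real.
    return total[1, 1].real


def formants(areas_cm2, lengths_cm, f_max=4000.0, step=5.0):
    """Find frequencies where D = 0, i.e. poles of U_lips / U_glottis = 1/D."""
    freqs = np.arange(step, f_max + step, step)
    d = np.array([chain_d(f, areas_cm2, lengths_cm) for f in freqs])
    found = []
    for i in range(len(freqs) - 1):
        if d[i] == 0.0:
            found.append(float(freqs[i]))
        elif d[i] * d[i + 1] < 0.0:
            # Refine the sign change by bisection.
            lo, hi, d_lo = freqs[i], freqs[i + 1], d[i]
            for _ in range(40):
                mid = 0.5 * (lo + hi)
                d_mid = chain_d(mid, areas_cm2, lengths_cm)
                if d_lo * d_mid <= 0.0:
                    hi = mid
                else:
                    lo, d_lo = mid, d_mid
            found.append(float(0.5 * (lo + hi)))
    return found


# Hypothetical input: a uniform 17.5 cm tube split into 35 sections of 3 cm^2.
uniform_areas = [3.0] * 35
uniform_lengths = [0.5] * 35
uniform_formants = formants(uniform_areas, uniform_lengths)
```

For a uniform tube this reduces to the quarter-wavelength resonances at odd multiples of c/(4L), which gives a quick sanity check before substituting a measured area function.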