Abstract:
Using synthetic speech from an articulatory speech synthesizer, statistics are generated for the error between the actual articulatory configurations and those estimated by an acoustic-to-articulatory mapping routine. When only acoustics are considered, and aerodynamic and perceptual issues are neglected, histograms of total estimation error suggest that the inverse problem is no more ambiguous for fricatives than for vowels. By examining the error covariance, the dominant articulatory dimensions of the fricative model are identified: these have the greatest effect on the acoustic transfer function and, as a result, are better estimated by the acoustic-to-articulatory mapping routine. Weak articulatory dimensions are also found, which the mapping routine can estimate barely better than simply guessing. Suggestions are made for ways in which these error statistics, and specifically the knowledge of the unequal importance of different articulatory dimensions, can be used to motivate improved techniques for the acoustic-to-articulatory mapping of speech. Demonstrations of some of these ideas will be given on real and synthetic speech. [Work supported by the AFOSR.]
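
The error-covariance analysis described above can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the three-parameter toy data, and the noise levels are all hypothetical, and "guessing" is assumed here to mean always answering with the mean configuration, so that a ratio of error variance to prior variance near 1 marks a weak dimension and a ratio near 0 marks a dominant one.

```python
import numpy as np

def error_covariance(actual, estimated):
    """Covariance of the estimation error (rows: trials, columns: parameters)."""
    err = np.asarray(estimated, dtype=float) - np.asarray(actual, dtype=float)
    return np.cov(err, rowvar=False)

def rank_dimensions(err_cov, prior_cov):
    """Eigen-decompose the error covariance.

    Returns eigenvalues in ascending order, the matching eigenvectors
    (as columns), and the ratio of error variance to prior variance along
    each eigenvector. Assumption: a ratio near 1 means the estimate is no
    better than guessing the mean; a ratio near 0 means a well-estimated
    (dominant) direction.
    """
    eigvals, eigvecs = np.linalg.eigh(err_cov)
    # Prior variance projected onto each eigenvector: v_i^T P v_i.
    prior_var = np.sum(eigvecs * (prior_cov @ eigvecs), axis=0)
    return eigvals, eigvecs, eigvals / prior_var

def demo(n_trials=5000, seed=0):
    """Toy data standing in for synthesizer output (hypothetical values)."""
    rng = np.random.default_rng(seed)
    actual = rng.standard_normal((n_trials, 3))
    noise = rng.standard_normal((n_trials, 3))
    estimated = np.empty_like(actual)
    estimated[:, 0] = actual[:, 0] + 0.1 * noise[:, 0]  # well estimated
    estimated[:, 1] = actual[:, 1] + 0.5 * noise[:, 1]  # moderately estimated
    estimated[:, 2] = 0.05 * actual[:, 2]               # close to guessing the mean
    err_cov = error_covariance(actual, estimated)
    prior_cov = np.cov(actual, rowvar=False)
    return rank_dimensions(err_cov, prior_cov)
```

Calling `demo()` returns the directions sorted from best to worst estimated. In a real study the `actual` and `estimated` arrays would come from the synthesizer's parameters and the mapping routine's output, and the eigenvectors would generally be combinations of articulatory parameters rather than single axes.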