Abstract:
A voice mimic system has been designed to achieve an articulatory description of the speech signal based on an analytic description of the vocal tract (VT) parameters as a function of the first two formant frequencies. Articulatory-based low bit-rate speech coding using this analytic mapping technique has been demonstrated. The natural speech input is analyzed to obtain the first two resonances of the VT, and the VT shape is estimated from the analytic acoustic-to-articulatory mapping. The shape is modeled using three parameters. The articulatory parameters are then transmitted together with the frequency of the vocal cord vibration. The receiver restores the VT shape from the articulatory parameters, computes the formant frequencies corresponding to that shape, and synthesizes the output speech using a formant synthesizer. The bit-rate of the coding is determined by the parameter sampling rate and the number of bits used to represent each parameter. An intelligible output speech signal has been obtained for bit-rates as low as 624 bits/s, using parameter sampling at 48 times/s, 3 bits for each of the three VT parameters, and 4 bits for the vocal cord vibration frequency. Coding at this bit-rate has been demonstrated on voiced sentences. [Research supported by ARPA DAST 63-93-C-0064.]
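
The quoted bit-rate can be checked with a short calculation. The sketch below is illustrative only: it assumes each transmitted frame carries the three VT parameters and one vocal cord vibration frequency value with no framing or error-protection overhead, and the function and argument names are hypothetical rather than taken from the system described.

```python
def coding_bit_rate(frames_per_second, vt_params, bits_per_vt_param, pitch_bits):
    """Return the bit-rate in bits/s for the assumed frame layout.

    Assumption: one frame = all VT parameters plus one pitch value,
    with no extra header or parity bits.
    """
    bits_per_frame = vt_params * bits_per_vt_param + pitch_bits
    return frames_per_second * bits_per_frame


# Figures stated in the abstract: 48 frames/s, three VT parameters at
# 3 bits each, and 4 bits for the vocal cord vibration frequency.
rate = coding_bit_rate(frames_per_second=48, vt_params=3,
                       bits_per_vt_param=3, pitch_bits=4)
print(rate)  # 624
```

Under these assumptions each frame takes 3 × 3 + 4 = 13 bits, and 48 frames/s × 13 bits gives the stated 624 bits/s.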