5aPP13. Feature extraction of phonemes from autocorrelation functions of auditory filterbank outputs.

Session: Friday Morning, December 6

Time: 11:15


Author: Koichi Sato
Location: Dept. of Information Eng., Faculty of Eng., Hokkaido Univ., Kita-ku, Sapporo, 060 Japan
Author: Jun Toyama
Location: Dept. of Information Eng., Faculty of Eng., Hokkaido Univ., Kita-ku, Sapporo, 060 Japan
Author: Masaru Shimbo
Location: Dept. of Information Eng., Faculty of Eng., Hokkaido Univ., Kita-ku, Sapporo, 060 Japan

Abstract:

Features extracted from auditory filterbank outputs are useful as a front end for speech recognition systems. The feature extracted from an auditory filter with center frequency CF [Hz] has, in most cases, been assigned to the CF itself. When a pure tone of f [Hz] is input to the auditory filterbank, however, the feature spreads across a wide frequency range, even though the output of every filter oscillates at f [Hz]. The feature should therefore be assigned not to the CF but to a special frequency that depends on the input waveform. An autocorrelation function is useful for extracting such a special frequency. If the function has peaks every T [s], the filter output consists mostly of a component at 1/T [Hz]. Thus the special frequency and the feature can be extracted from the peaks of the autocorrelation function. As a result, the features extracted from all filters no longer spread across a wide frequency range. At low frequencies (below about 1 kHz), the features are distributed at the pitch of the voice and its harmonics. At high frequencies, where pitch harmonics are not separated, the features are distributed at the voice formants.
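
The relation between autocorrelation peaks spaced T [s] apart and a dominant component at 1/T [Hz] can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `estimate_peak_frequency`, the `threshold` parameter, the peak-picking rule, and the use of the normalized peak height as a stand-in for the feature value are all assumptions made for illustration. The input `x` is taken to be the output of a single auditory filter sampled at `fs` [Hz].

```python
import numpy as np

def estimate_peak_frequency(x, fs, threshold=0.5):
    """Estimate the dominant frequency of one filter output from the
    first strong peak of its autocorrelation function.

    Returns (frequency_hz, normalized_peak_height), or None when no
    peak is found. The threshold and the choice of peak height as the
    feature value are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove any DC offset
    n = len(x)
    # Autocorrelation at non-negative lags (lag 0 is at index n - 1).
    r = np.correlate(x, x, mode="full")[n - 1:]
    if r[0] <= 0:
        return None                       # silent input, nothing to estimate
    r = r / r[0]                          # normalize so that r[0] == 1

    # Local maxima above the threshold, excluding lag 0.
    inner = r[1:-1]
    is_peak = (inner > r[:-2]) & (inner >= r[2:]) & (inner > threshold)
    lags = np.flatnonzero(is_peak) + 1    # +1 because inner starts at lag 1
    if lags.size == 0:
        return None

    lag = lags[0]                         # first peak: period T = lag / fs
    return fs / lag, r[lag]               # frequency 1/T and peak height


if __name__ == "__main__":
    fs = 16000
    t = np.arange(4000) / fs
    tone = np.sin(2 * np.pi * 200.0 * t)  # pure tone standing in for a filter output
    print(estimate_peak_frequency(tone, fs))
```

In this sketch the estimated frequency does not depend on which filter produced the output, so applying it to every channel would place each feature at the frequency actually present in that channel rather than at the channel's CF. The frequency resolution is limited to integer lags; interpolation around the peak would be needed for finer estimates.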


ASA 132nd meeting - Hawaii, December 1996