Abstract:
Features extracted from auditory filterbank outputs are useful as a front end for speech recognition systems. When an auditory filter has a center frequency (CF) [Hz], the extracted feature has, in most cases, been assigned to the CF [Hz] itself. However, when a pure tone of f [Hz] is input to the auditory filterbank, the features spread across a wide frequency range, even though the output frequency of every filter is f [Hz]. Therefore, the feature should be assigned not to the CF [Hz], but to a special frequency that depends on the input wave. An autocorrelation function is useful for extracting such a special frequency. If the function has peaks at every T [s], the output of the filter is dominated by the 1/T [Hz] component. Thus the special frequency and the feature can be extracted from the peaks of the autocorrelation function. As a result, the features extracted from all filters no longer spread across a wide frequency range. At low frequencies (below about 1 kHz), the features are distributed at the pitch of the voice and its harmonics. At high frequencies, where the pitch harmonics are not resolved, the features are distributed at the voice formants.
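
The following is a minimal sketch of the peak-picking step described above: the autocorrelation of one filter's output is computed, the strongest peak at lag T [samples] is located, and the feature is assigned to 1/T [Hz] rather than to the filter's CF. The function and variable names (estimate_feature_frequency, filter_output) are illustrative assumptions, not identifiers from the paper.

    import numpy as np

    def estimate_feature_frequency(filter_output: np.ndarray, fs: float) -> float:
        """Estimate the special frequency 1/T [Hz] of one auditory-filter output.

        If the autocorrelation peaks at every T [s], the output is dominated by
        a 1/T [Hz] component, so the feature is placed at 1/T [Hz], not at CF.
        """
        x = filter_output - np.mean(filter_output)
        # Full autocorrelation; keep non-negative lags only and normalize.
        ac = np.correlate(x, x, mode="full")[len(x) - 1:]
        ac /= ac[0] + 1e-12
        # Locate local maxima after lag 0; their spacing approximates T [samples].
        d = np.diff(ac)
        peaks = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
        if len(peaks) == 0:
            return 0.0                      # no periodicity detected
        T = peaks[np.argmax(ac[peaks])]     # lag of the strongest peak
        return fs / T                       # special frequency 1/T [Hz]

    if __name__ == "__main__":
        fs = 16000.0
        t = np.arange(0, 0.05, 1 / fs)
        tone = np.sin(2 * np.pi * 200.0 * t)            # 200 Hz pure tone
        print(estimate_feature_frequency(tone, fs))     # ~200 Hz, regardless of CF

Run on the output of any filter in the bank, this assigns the feature of a pure-tone input to roughly f [Hz] for every filter, which is the behavior the abstract describes.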