Abstract:
The physical characteristics of vowel spectral envelopes that are significant for speaker identification were investigated through psychoacoustic experiments. Previous work [T. Kitamura and M. Akagi, J. Acoust. Soc. Jpn. 16(E), 283--289 (1995)] showed that speaker individuality in the spectral envelopes of vowels resides mainly in the higher frequency bands. In this study, the effect on speaker identification of eliminating the spectral peaks and/or dips of the spectral envelope in the higher frequency band was investigated. Additionally, the frequency band carrying speaker individuality was specified in detail. The stimuli for the experiments were vowels resynthesized from their FFT cepstral data by using the log magnitude approximation (LMA) analysis--synthesis system. Their pitch frequency and power were normalized, and specific frequency bands of their spectral envelopes were manipulated. The experimental results lead to the following conclusions: (1) the peaks in the spectral envelopes are more significant than the dips for speaker identification; and (2) speaker individuality resides mainly in the frequency band above the peak located at an ERB rate of around 20 (1740 Hz), and the voice quality can be controlled by replacing this frequency band of one speaker with that of another.
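
The correspondence between an ERB rate of 20 and 1740 Hz can be checked with a short sketch. It assumes the widely used Glasberg and Moore (1990) ERB-rate formula, ERB rate = 21.4 log10(4.37 f / 1000 + 1) with f in Hz; the abstract does not state which ERB formula was used, so this choice and the function names below are illustrative assumptions, not the authors' method.

```python
import math


def hz_to_erb_rate(freq_hz: float) -> float:
    """Convert frequency in Hz to ERB rate (assumed Glasberg & Moore 1990 formula)."""
    return 21.4 * math.log10(4.37 * freq_hz / 1000.0 + 1.0)


def erb_rate_to_hz(erb_rate: float) -> float:
    """Invert the assumed ERB-rate formula to recover frequency in Hz."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) / 4.37 * 1000.0


if __name__ == "__main__":
    # Boundary quoted in the abstract: ERB rate of about 20, given as 1740 Hz.
    print(f"ERB rate 20 -> {erb_rate_to_hz(20.0):.1f} Hz")
    print(f"1740 Hz -> ERB rate {hz_to_erb_rate(1740.0):.2f}")
```

Under this assumed formula the two quoted values agree to within about 1 Hz, which suggests, but does not confirm, that a formula of this form underlies the figure given in the abstract.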