4aSC21. Speaker recognition model using two-dimensional mel-cepstrum and self-growing LVQ.

Session: Thursday Morning, December 5

Time:

Author: Tadashi Kitamura
Location: Dept. of Intelligence and Comput. Sci., Nagoya Inst. of Technol., Gokiso, Showa, Nagoya, 466 Japan
Author: Kazue Hashimoto
Location: Dept. of Intelligence and Comput. Sci., Nagoya Inst. of Technol., Gokiso, Showa, Nagoya, 466 Japan
Author: Chiyomi Miyajima
Location: Dept. of Intelligence and Comput. Sci., Nagoya Inst. of Technol., Gokiso, Showa, Nagoya, 466 Japan

Abstract:

This paper describes a speaker recognition model using the two-dimensional mel-cepstrum (TDMC) and self-growing LVQ. The TDMC consists of averaged and dynamic spectral features of the two-dimensional mel-log spectra in the analyzed interval. A self-growing algorithm is used for VQ. At the start of this algorithm there are no centroids, and the first input feature vector becomes the first centroid. Each subsequent input vector is then classified into one of the existing centroids. When the amount of data assigned to a centroid exceeds a predetermined threshold, the centroid is divided into two so that each of the two resulting centroids holds the same number of data. Through these procedures, the number of centroids increases gradually while the amount of data in each centroid remains almost equal. In this study, text-dependent speaker identification experiments were carried out for 30 speakers. Each speaker recorded 10 digits three times during five sessions. Each speaker model was created using the speech data of two sessions. The area, the number of centroids, and the combination of averaged and dynamic features of TDMC were studied. The experimental results show that the combination of averaged and dynamic features is very effective and that the proposed model gives an average identification score of 95%.
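
The self-growing procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how an input vector is matched to a centroid, how a centroid is updated, or how a full centroid is divided. The sketch assumes nearest-centroid assignment by squared Euclidean distance, centroids kept as the mean of their members, and a split at the median of the coordinate with the largest spread. The names grow_codebook, split_cell, and max_cell_size are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def split_cell(members):
    """Divide one cell into two halves of (nearly) equal size.

    Assumption: order the members along the coordinate with the largest
    spread and cut at the median. The abstract only requires that both
    halves hold the same number of data.
    """
    center = mean(members)
    dims = len(center)
    spread = [sum((v[d] - center[d]) ** 2 for v in members) for d in range(dims)]
    axis = spread.index(max(spread))
    ordered = sorted(members, key=lambda v: v[axis])
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]


def grow_codebook(vectors, max_cell_size):
    """Grow a VQ codebook one input vector at a time.

    Starts with no centroid; the first vector becomes the first centroid.
    A cell is split as soon as it holds more than max_cell_size vectors.
    """
    centroids = []  # one centroid per cell
    cells = []      # member vectors of each cell
    for v in vectors:
        if not centroids:
            centroids.append(list(v))
            cells.append([v])
            continue
        # Assumed rule: assign to the nearest existing centroid.
        k = min(range(len(centroids)), key=lambda i: sq_dist(v, centroids[i]))
        cells[k].append(v)
        centroids[k] = mean(cells[k])
        if len(cells[k]) > max_cell_size:
            low, high = split_cell(cells[k])
            cells[k], centroids[k] = low, mean(low)
            cells.append(high)
            centroids.append(mean(high))
    return centroids, cells
```

With this rule no cell ever holds more than max_cell_size vectors, so the codebook grows only as fast as the data require. In a speaker identification setting, one such codebook would be grown per speaker from that speaker's training feature vectors.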

ASA 132nd meeting - Hawaii, December 1996