Abstract:
A speaker-based automatic language identification system has been studied to discriminate between languages and dialects. Syllabic spectral features are extracted automatically by an artificial neural network, and the system makes no use of phonetic segmentation for recognition or training. High performance has been obtained on monologue telephone recordings. The system has been applied to several different databases to study the discrimination possible among languages, dialects, and accents with such limited phonetic-acoustic information. System performance is also related to individual speaker characteristics, the size of the reference population for each language, and the length of the test samples. This study reports experimental results related to performance and data environments, along with an attempt by others to compare the recognition performance of trained or expert human listeners with the automatic recognition results.
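The abstract does not specify the feature extraction or network details, so the following Python sketch only illustrates the general idea it describes: an utterance-level language decision made from fixed-rate, syllable-scale spectral features scored by a small neural network, with no phonetic segmentation. All frame sizes, network dimensions, and function names are assumptions introduced for illustration, and the untrained random weights stand in for a trained classifier.

```python
# Minimal sketch (not the paper's actual system): syllable-scale spectral
# features pooled from fixed-rate frames, scored by a toy feed-forward
# network, with a per-utterance language decision by averaging scores.
# All sizes, names, and weights below are illustrative assumptions.

import numpy as np

RNG = np.random.default_rng(0)

N_FFT = 256           # assumed FFT size for spectral analysis
SYLLABLE_FRAMES = 20  # assumed number of short frames per syllable-scale window
N_LANGUAGES = 2       # e.g. discriminating two languages or dialects


def syllabic_spectral_features(signal, frame_len=160):
    """Average log-magnitude spectra over syllable-scale windows.

    No phonetic segmentation: windows are taken at a fixed rate.
    """
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    spectra = np.log(np.abs(np.fft.rfft(frames, n=N_FFT)) + 1e-8)
    # Pool consecutive frames into syllable-scale feature vectors.
    n_windows = n_frames // SYLLABLE_FRAMES
    pooled = spectra[: n_windows * SYLLABLE_FRAMES]
    pooled = pooled.reshape(n_windows, SYLLABLE_FRAMES, -1).mean(axis=1)
    return pooled  # shape: (n_windows, N_FFT // 2 + 1)


# A toy two-layer network standing in for the (unspecified) ANN.
W1 = RNG.normal(scale=0.1, size=(N_FFT // 2 + 1, 32))
W2 = RNG.normal(scale=0.1, size=(32, N_LANGUAGES))


def identify_language(signal):
    feats = syllabic_spectral_features(signal)
    hidden = np.tanh(feats @ W1)
    scores = hidden @ W2
    # Utterance-level decision: average window scores, pick the best language.
    return int(np.argmax(scores.mean(axis=0)))


if __name__ == "__main__":
    test_utterance = RNG.normal(size=8000)  # one second of synthetic 8 kHz audio
    print("predicted language index:", identify_language(test_utterance))
```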