4aSC16. An automatic speech recognition system using time-delay self-organizing maps with physiological parametric extraction.

Session: Thursday Morning, December 4


Author: Jose M. Ferrandez
Location: Laboratorio de Comunicacion Oral R. W. Newcomb, Facultad de Informatica, Univ. Politecnica Madrid, C. Montegancedo, Boadilla del Monte, 28660 Madrid, Spain, jmanuel@naranjo.datsi.fi.upm.es
Author: Daniel del Valle
Location: Laboratorio de Comunicacion Oral R. W. Newcomb, Facultad de Informatica, Univ. Politecnica Madrid, C. Montegancedo, Boadilla del Monte, 28660 Madrid, Spain, jmanuel@naranjo.datsi.fi.upm.es
Author: Victoria Rodellar
Location: Laboratorio de Comunicacion Oral R. W. Newcomb, Facultad de Informatica, Univ. Politecnica Madrid, C. Montegancedo, Boadilla del Monte, 28660 Madrid, Spain, jmanuel@naranjo.datsi.fi.upm.es
Author: Pedro Gomez
Location: Laboratorio de Comunicacion Oral R. W. Newcomb, Facultad de Informatica, Univ. Politecnica Madrid, C. Montegancedo, Boadilla del Monte, 28660 Madrid, Spain, jmanuel@naranjo.datsi.fi.upm.es

Abstract:

Physiological parametric extraction uses auditory models as a front end for speech recognition. Such methods assume that if speech signals are coded the way the auditory system codes them, speech can then be identified with the main properties that biological systems show: robustness and accuracy. The proposed system consists of a cochlear model implemented with gammatone filterbanks, as proposed by Patterson [J. Acoust. Soc. Am. 96, 1409-1418 (1994)]. This stage feeds a nonlinear mechanical-to-neural transduction module based on the Meddis hair-cell model [J. Acoust. Soc. Am. 79, 702-711 (1986)], which computes auditory-nerve firings. Finally, a temporal integration/component-extraction module integrates the neural patterns to identify the relevant components embedded in speech signals [characteristic frequency (CF), frequency modulation (FM), and noise burst (NB)], which human speech shares with animal communication sounds. The model adopts a spatiotemporal strategy: it uses temporal information in low-CF fibers (phase-locking mechanism) and spatial information in the higher-CF ones. The recognition module consists of a time-delay self-organizing map, which captures not only the spectral variability contained in the signal but also the temporal variability, providing better generalization properties. [Work supported by NATO CRG-960053.]
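
The time-delay self-organizing map named in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the "time delay" is modeled as concatenating each input frame with a fixed number of preceding frames, the map is a plain one-dimensional Kohonen map with a Gaussian neighbourhood, and the function names (delay_embed, best_matching_unit, train_som), the parameter values, and the toy two-channel input are all hypothetical. The auditory front end (gammatone filterbank and hair-cell model) is not modeled here.

```python
import math
import random

def delay_embed(frames, n_delays):
    """Concatenate each frame with its n_delays predecessors (assumed windowing)."""
    return [
        [x for frame in frames[t - n_delays:t + 1] for x in frame]
        for t in range(n_delays, len(frames))
    ]

def best_matching_unit(weights, vector):
    """Index of the map unit whose weights are closest (squared Euclidean) to the input."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], vector)),
    )

def train_som(vectors, n_units=8, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D Kohonen map; learning rate and neighbourhood radius decay linearly."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac
        radius = max(1.0, (n_units / 2) * frac)
        for vector in vectors:
            bmu = best_matching_unit(weights, vector)
            for i in range(n_units):
                # Gaussian neighbourhood on the 1-D map lattice.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                weights[i] = [w + lr * h * (v - w) for w, v in zip(weights[i], vector)]
    return weights

# Toy stand-in for neural-pattern frames: two channels, rising then falling activity.
frames = [[t / 9, 1 - t / 9] for t in range(10)] + [[1 - t / 9, t / 9] for t in range(10)]
windows = delay_embed(frames, n_delays=2)   # each window spans 3 consecutive frames
som = train_som(windows)
trajectory = [best_matching_unit(som, w) for w in windows]  # winning unit per window
```

Because each input vector spans several consecutive frames, two windows with the same current frame but different histories can map to different units, which is one way a map can reflect temporal as well as spectral variability. The sequence of winning units in trajectory would then serve as the input to whatever labeling or decoding step follows.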


ASA 134th Meeting - San Diego CA, December 1997