Abstract:
Physiological data suggest that a two-dimensional signal representation of amplitude-modulation frequency against center frequency is extracted in the central nucleus of the inferior colliculus [C. E. Schreiner and G. Langner, J. Neurophysiol. 60, 1823--1840 (1988)]. The representation groups signals with common harmonics, and can easily be adapted to also enhance common onsets. This makes it an attractive basis for a representational model of auditory scene analysis. The map is used as a front end to a neural-network pattern-matching stage for vowel recognition. The model is tested against human performance in a concurrent vowel recognition task [P. F. Assmann and Q. Summerfield, J. Acoust. Soc. Am. 88, 680--697 (1990)]. It predicts human performance well in an F0 grouping task, even for vowels with intensity differences of up to 12 dB. Another grouping cue, demonstrated in abstract sounds, is common onset. The double-vowel task was modified to include onset asynchrony. A model based on the change in the AM-map representation between successive frames predicts human performance well. It is possible to drive a pattern-matching stage directly with the AM-map representation, removing the need for an explicit grouping process. The model is inherently robust when driven by stimuli that violate grouping cues.
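To make the two quantities the abstract refers to concrete, the sketch below computes a crude two-dimensional AM map (modulation-frequency energy against center frequency, per analysis frame) from the envelopes of a bandpass filterbank, together with a frame-to-frame change measure of the kind a common-onset model could use. This is only a minimal illustration under stated assumptions: the Butterworth filterbank, the fixed relative bandwidth, the framing parameters, and the function names `am_map` and `onset_change` are all hypothetical choices, not the implementation described in the paper.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def am_map(x, fs, cfs, mfs, frame_len=1024, hop=512):
    """Toy AM map: modulation energy at each (center frequency, modulation
    frequency) pair, computed per frame.

    x: signal; fs: sample rate (Hz); cfs: carrier center frequencies (Hz);
    mfs: modulation frequencies (Hz). Returns array (n_frames, len(cfs), len(mfs)).
    """
    n_frames = 1 + (len(x) - frame_len) // hop
    M = np.zeros((n_frames, len(cfs), len(mfs)))
    t = np.arange(frame_len) / fs
    # DFT-style probe tones at the modulation frequencies of interest.
    probes = np.exp(-2j * np.pi * np.outer(mfs, t))
    for i, cf in enumerate(cfs):
        bw = 0.25 * cf  # fixed relative bandwidth (an assumption, not the paper's)
        sos = butter(4, [cf - bw / 2, cf + bw / 2],
                     btype="band", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))  # band envelope
        for f in range(n_frames):
            seg = env[f * hop : f * hop + frame_len]
            seg = seg - seg.mean()               # remove DC so AM energy dominates
            M[f, i, :] = np.abs(probes @ seg)    # sampled modulation spectrum
    return M

def onset_change(M):
    """Summed positive frame-to-frame change in the AM map: a crude
    common-onset cue of the sort the abstract's onset model builds on."""
    d = np.diff(M, axis=0)
    return np.clip(d, 0.0, None).sum(axis=(1, 2))

if __name__ == "__main__":
    # Demo: a 1-kHz carrier amplitude-modulated at 100 Hz should make the map
    # peak near (center frequency 1000 Hz, modulation frequency 100 Hz).
    fs = 16000
    t = np.arange(int(0.5 * fs)) / fs
    x = (1 + 0.8 * np.sin(2 * np.pi * 100 * t)) * np.sin(2 * np.pi * 1000 * t)
    M = am_map(x, fs, cfs=[500, 1000, 2000], mfs=[50, 100, 200])
    print(M.mean(axis=0))        # averaged map: rows = center freqs, cols = mod freqs
    print(onset_change(M))       # large at the stimulus onset, small afterwards
```

Because harmonics of one voice share a common F0-rate modulation, they line up along one modulation-frequency column of such a map, which is what lets a pattern matcher consume the representation directly without a separate grouping stage.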