Subject: Autocorrelation-like representations and mechanisms
From: Peter Cariani <peter(at)EPL.MEEI.HARVARD.EDU>
Date: Tue, 4 Mar 2003 16:07:32 -0500

On Saturday, February 22, 2003, at 02:01 PM, Martin Braun wrote:

> As to the pitch model of autocorrelation, also this is a model that is
> anatomically and physiologically unrealistic. Ray Meddis, one of the major
> advocates of this model has now given it up, in favor of a new model (see
> below) that is based on anatomical and physiological data that were
> described in detail by Gerald Langner and me.

Martin,

I do regret not reading my email while at ARO last week.

I don't think that Ray's recent proposed mechanism for central processing of peripheral temporal patterns is necessarily a repudiation of the notion of an early representation of pitch based on all-order interval information. In any case, most pitch phenomena do behave in a manner that is consistent with an autocorrelation-like representation (more on this below), so any putative central neural mechanism needs to behave accordingly. This is why Licklider constructed the neural temporal autocorrelation architecture that he did. We don't stop saying that the binaural system is a cross-correlator because inhibition may be involved. Likewise, the monaural system as a whole may behave in most important respects like an autocorrelator, even if the mechanism is not a Licklider style network with synaptic delays and coincidence counters that function as explicit pitch detectors.

I do very much hope that Ray's mechanism works (it would solve what I think is the central open problem in auditory neurophysiology, how the auditory CNS uses monaural peripheral timing information), but there are strong reasons to doubt a priori that any mechanism based on modulation-tuning per se will work (more below).

We need to make the distinction between neural representations of pitch and the central neural mechanisms that analyze the representations.
In order to reverse-engineer the auditory system, we need to understand 3 things:

1) what auditory functions the system performs (detections, discriminations, groupings, etc. -- psychophysics)
2) the nature of the neural representations for auditory percepts (the signals that the system uses)
3) the nature of the neural computational mechanisms by which the system operates on its internal representations to realize auditory functions

It makes a great deal of sense to work on all three problems simultaneously, since they are all interrelated. We need to find aspects of neural activity/information processing that resemble or can support the perceptual dimensions and distinctions that are observed psychophysically.

Meddis & Hewitt's simulation studies (1991, 1998) and our experimental study (Cariani & Delgutte, 1996) were directed at the nature of the early neural REPRESENTATION of pitch, and not at the central MECHANISMS by which pitch is analyzed. These studies showed that features of the global interspike interval distribution of the auditory nerve predict very well a surprisingly wide variety of pitch phenomena. Some of these are:

1) Pitch of pure tones
2) Low F0 pitch of complex tones, of both low and high harmonics
3) Invariance of pitch over a very wide range of stimulus levels
4) Invariance of pitch of complex tones with low harmonics with respect to phase spectrum and phase-dependent envelope effects for high harmonics
5) Pitch shift of inharmonic complex tones
6) Low pitch of iterated ripple noise
7) Dominance frequency region for low pitch
8) Pitch equivalence between pure tones and low pitch of complex tones
9) Low pitch of AM noise
10) Relative pitch strength (salience)

These representations are "anatomically and physiologically" realistic -- we know that the information is present in the temporal discharge patterns of neurons in the auditory nerve and cochlear nucleus, and at the very least in the inputs to the inferior colliculus, and possibly higher.
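To make the idea of a pooled all-order interval distribution concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions, not the method of any of the studies cited above: the "fibers" are memoryless Poisson-like units driven by the half-wave rectified stimulus, with no cochlear filtering, refractoriness or adaptation, and every name and parameter value (`simulate_fibers`, `all_order_interval_histogram`, `estimate_pitch`, 200 fibers, a 1000 spikes/s peak rate) is hypothetical. The stimulus contains only harmonics 3, 4 and 5 of 200 Hz, so there is no energy at the fundamental, and the sketch picks the most common all-order interval in a 125-400 Hz search range.

```python
import numpy as np

def simulate_fibers(waveform, fs, n_fibers, peak_rate, rng):
    """Toy spike generator (assumption): each fiber fires independently in
    each sample with probability proportional to the half-wave rectified
    waveform. peak_rate is the instantaneous rate at the waveform maximum."""
    drive = np.maximum(waveform, 0.0)
    p_spike = (peak_rate / fs) * drive / drive.max()
    return rng.random((n_fibers, waveform.size)) < p_spike

def all_order_interval_histogram(spikes, max_lag):
    """Pooled all-order interspike interval histogram.

    spikes is a boolean array of shape (n_fibers, n_samples). counts[k] is
    the number of spike pairs within a fiber that are exactly k samples
    apart, whether or not other spikes fall in between, summed over fibers.
    counts[0] is left at zero."""
    counts = np.zeros(max_lag + 1, dtype=np.int64)
    for lag in range(1, max_lag + 1):
        counts[lag] = np.count_nonzero(spikes[:, :-lag] & spikes[:, lag:])
    return counts

def estimate_pitch(counts, fs, f_min=125.0, f_max=400.0):
    """Return fs / (most common interval) within the search range."""
    lo = int(round(fs / f_max))
    hi = int(round(fs / f_min))
    best_lag = lo + int(np.argmax(counts[lo:hi + 1]))
    return fs / best_lag

fs = 20_000                                   # sampling rate in Hz
t = np.arange(int(0.5 * fs)) / fs             # 0.5 s of stimulus
# Harmonics 3, 4 and 5 of 200 Hz: a "missing fundamental" complex.
complex_tone = sum(np.cos(2 * np.pi * f * t) for f in (600.0, 800.0, 1000.0))

rng = np.random.default_rng(0)
spikes = simulate_fibers(complex_tone, fs, n_fibers=200, peak_rate=1000.0, rng=rng)
counts = all_order_interval_histogram(spikes, max_lag=int(0.010 * fs))
f0_estimate = estimate_pitch(counts, fs)
print(f"most common all-order interval corresponds to about {f0_estimate:.1f} Hz")
```

In this toy setting the largest interval peak should fall at or next to 5 ms, the period of the absent 200 Hz fundamental, because the pooled histogram is in effect an autocorrelation of the spike trains. The restricted search range is a design choice of the sketch; it stands in for the existence region of low pitch and is not derived from data.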
There is nothing at all "unrealistic" about patterns of spikes. These representations also can handle timbral distinctions associated with differences in spectra of stationary sounds (e.g. vowel formant structure, qualities associated with lower frequency resonances of vocal tracts and musical instruments). My poster at ARO dealt with how competition between intervals could potentially explain aspects of pitch masking and harmonic resolution.

Frankly, I am always surprised by how quickly people are willing to discard all-order interspike interval representations on the basis of very limited evidence that is interpreted in the most shallow way. There is also a long history in the auditory field of interval representations being dismissed for a variety of flimsy reasons (e.g. the old canard that auditory nerve fibers can't support periodicities above 300 Hz) and because of the inadequacies of straw-man models. I just wish, for once, that the same criticality could be applied to the central neural analysis of rate-place patterns (someone give me a plausible neurally-grounded account, at the level of the midbrain or higher, of how we discriminate the pitches of 1000 Hz from 1050 Hz pure tones when levels are roved between 60 and 90 dB SPL and/or when the two tones are presented at different locations in auditory space).

We also need to be more judicious about where the all-order interval models apply. The interesting ARO poster by Rebecca Watkinson and Chris Plack that was mentioned involved transient phase-shifts that reminded me a great deal of Kubovy's demonstrations of popping out by transiently phase-shifted harmonics (when they are perceptually separated from the rest of the harmonic complex, presumably they don't contribute to the F0 pitch of the complex). Definitely, our models for pitch need to take into account auditory grouping/fusion/object formation factors and mechanisms.
I think these mechanisms must precede analysis of pitch (which, depending on where one puts pitch analysis, may or may not mean that they are low down in the pathway). In any case, these shifts are very interesting, and it remains to be seen whether incorporating phase-shift resets in the autocorrelation model (e.g. as in Patterson's strobed temporal integration processing or my recurrent timing nets) will account for the observed pitch effects.

I do readily agree that no obvious neuronal autocorrelators have been found in any abundance in the auditory pathway, but it is still possible that something like an autocorrelation analysis is carried out by other means. (Langner et al. had an interesting ARO poster with an intriguing potential anatomical structure for comb filtering/autocorrelation, but the physiological evidence is still pretty scant -- too early to tell how real it is).

Modulation-tuned units have been found in abundance, but there are some basic problems with these when it comes to pitch:

1) they cannot explain pitch equivalence between pure and complex tones (big, big problem)
2) they are not likely to represent multiple competing pitches in a robust fashion (e.g. two musical instruments playing notes a third apart)
3) they are not likely to yield a representation that does not degrade at high SPLs
4) they are not likely to explain the pitch shifts of inharmonic complex tones
5) it's not clear if predicted pitches of low harmonics will be invariant with respect to phase spectrum (as they should be)

I'm sure Ray will test these kinds of contingencies in his model, and we'll see how well it works. In the meantime, it would probably be best to adopt a wait-and-see attitude before throwing out all-order interval representations. We should welcome all attempts to grapple with the problem of the central use of timing information (e.g. by R. Meddis, L. Carney, S. Shamma, yourself and others, more power to you all).
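Point 4 in the list above can be illustrated with a small numerical sketch. It rests on stated assumptions: a plain waveform autocorrelation stands in for the all-order interval distribution, there is no cochlear filtering or neural model of any kind, and the function names and stimulus values (`three_tone_complex`, `acf_pitch`, a 2000 Hz center, a 30 Hz shift) are hypothetical choices for illustration. Three components spaced 200 Hz apart always beat with a 5 ms envelope period, whatever their center frequency. If all three are shifted up by 30 Hz, the envelope rate therefore stays at 200 Hz, yet the largest autocorrelation peak in the low-pitch range moves to roughly ten periods of the 2030 Hz center component, i.e. close to 203 Hz.

```python
import numpy as np

def three_tone_complex(center, spacing, fs, duration=0.1):
    """Equal-amplitude, cosine-phase components at center - spacing,
    center and center + spacing (all in Hz)."""
    t = np.arange(int(duration * fs)) / fs
    freqs = (center - spacing, center, center + spacing)
    return sum(np.cos(2 * np.pi * f * t) for f in freqs)

def autocorrelation(x, max_lag):
    """Unbiased autocorrelation estimate for lags 0..max_lag (in samples)."""
    n = x.size
    return np.array([np.dot(x[:n - k], x[k:]) / (n - k)
                     for k in range(max_lag + 1)])

def acf_pitch(x, fs, f_min=125.0, f_max=400.0):
    """Pitch estimate: fs divided by the lag of the largest autocorrelation
    value between 1/f_max and 1/f_min seconds."""
    lo = int(round(fs / f_max))
    hi = int(round(fs / f_min))
    acf = autocorrelation(x, hi)
    best_lag = lo + int(np.argmax(acf[lo:hi + 1]))
    return fs / best_lag

fs = 100_000  # high sampling rate so that lags are resolved to 10 microseconds

harmonic = three_tone_complex(2000.0, 200.0, fs)  # 1800, 2000, 2200 Hz
shifted = three_tone_complex(2030.0, 200.0, fs)   # 1830, 2030, 2230 Hz

pitch_harmonic = acf_pitch(harmonic, fs)
pitch_shifted = acf_pitch(shifted, fs)
print(f"harmonic complex: {pitch_harmonic:.1f} Hz, shifted complex: {pitch_shifted:.1f} Hz")
```

A mechanism that reads out only the envelope (modulation) rate would assign both stimuli the same 200 Hz, which is why the inharmonic pitch shift is a useful test case. The sketch says nothing about how a real modulation-tuned unit behaves; it only shows what the stimulus itself offers to each kind of analysis.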
I myself think that the problem may lie in our tacit expectations of the nature of the central mechanisms and representations -- wouldn't life be so much simpler if there were nice, level-invariant single-neuron pitch detectors somewhere in the auditory brainstem, midbrain or thalamus? It feels like we are missing something big. The problem could involve our fixation with single neurons as the atomic level of signal processing. Maybe the system doesn't work that way, and we need to consider central representations that are based on patterns of firing (across-neuron intervals, synchronies, latencies, rates) rather than which specific neural elements are firing how much.

The all-order interval distributions still do follow the psychophysics better than any of our proposed central analysis mechanisms. I can also see how one might process interval information completely in the time domain with a rich set of delays and coincidence detectors ("neural timing nets"), but there is no really obvious place where such hypothetical processing could be carried out.

There is a time and place to be literal-minded and a time and place to use one's imagination. Science necessarily involves BOTH conceptual hypothesis formation and empirically-based hypothesis testing. When a problem is ill-defined and we don't understand the basic nature of the solution, then we need to use our imaginations and temporarily suspend disbelief in order to formulate and entertain new hypotheses. Using our imaginations is the only way of "getting out of the box" when we are stuck in a rut and none of our theories works very well without a host of ad-hoc assumptions (sometimes we are too clever for our own good). Once we are on the right track, then it is time to do "normal science" and "puzzle-solving" (Kuhn's terms) -- to fill in the gaps and do hard-nosed empirical testing of hypotheses.

The question is where on that continuum between ill-defined vs.
well-defined do we think the problem of the neural coding of pitch currently lies? Do we see a clear direction for the path ahead? How best should we move forward?

Peter Cariani

Kubovy M (1981) Concurrent-pitch segregation and the theory of indispensable attributes. In: Perceptual Organization (Kubovy M, Pomerantz JR, eds), pp 55-98. Hillsdale, NJ: Lawrence Erlbaum Assoc.

Kubovy M, Jordan R (1979) Tone-segregation by phase: On the phase sensitivity of the single ear. J Acoust Soc Am 66:100-106.

Peter Cariani, PhD
Eaton Peabody Laboratory of Auditory Physiology
Massachusetts Eye & Ear Infirmary
243 Charles St., Boston, MA 02114 USA

Assistant Professor
Department of Otology & Laryngology
Harvard Medical School

voice (617) 573-4243
fax (617) 720-4408
email peter(at)epl.meei.harvard.edu
web www.cariani.com

[300]
> Pitch Shifts For Unresolved Complex Tones And The Implications For
> Models Of Pitch Perception
>
> *Rebecca Kensey Watkinson, Christopher John Plack
> Department of Psychology, University of Essex, Colchester, United Kingdom
>
> [376]
> A Model of the Physiological basis of Pitch Perception.
>
> *Raymond Meddis
> Psychology, University of Essex, Colchester, United Kingdom
>
> Little is known about how pitch is processed by the auditory nervous
> system. Autocorrelation models of pitch extraction have been successful
> in simulating a large number of psychophysical results in this area but
> there is little support for the idea that the nervous system acts as an
> explicit autocorrelation device. To address this issue, this poster
> presents a design for a new model of pitch perception based upon
> known neural architecture and also presents some preliminary pitch
> analyses using the model. The model offers a physiologically plausible
> system for periodicity coding that avoids the need for long delay lines
> required by autocorrelation.
> The system incorporates a model of the human auditory periphery including
> outer/middle ear transfer characteristics, nonlinear frequency analysis and
> mechanical-electrical transduction by inner hair cells. The resulting
> 'auditory nerve' spike train is used as the input to three further stages
> of signal processing thought to be located in the cochlear nucleus, central
> nucleus and the external cortex of the inferior colliculus, respectively.
> The complete model is implemented using DSAM, a development system for
> auditory modelling. The output from the system is the activity of a single
> array of neurons each sensitive to different periodicities. The pattern of
> activity across this array is uniquely related to the fundamental frequency
> of a harmonic complex. The testing of the model is still in its early
> stages but has so far been successfully tested using a range of harmonic
> stimuli and iterated ripple noise stimuli. The poster will report on
> current progress in testing and refining the model.