Re: Autocorrelation (Peter Cariani )


Subject: Re: Autocorrelation
From:    Peter Cariani  <peter(at)epl.meei.harvard.edu>
Date:    Fri, 14 Jul 2000 15:00:39 -0400

Christian Kaernbach wrote:

> Peter Cariani wrote:
> > 1) The "autocorrelation" model that they knocked down was not neural
> > model;
>
> It would be difficult to knock down all possible neural autocorrelation
> models. Our experiment gave a hint that there might be a problem with
> higher-order regularities which are seen by autocorrelation, but not by
> perception. I did not see a contribution up to now where somebody showed
> that neural autocorrelation would produce this asymmetry between first-
> and higher-order intervals. We supplied some modelers with our stimuli,
> and they could not make their models produce this asymmetry.

I'm not aware of the results of the other simulations, and I was commenting on the model your original paper tested, which was not a summary autocorrelation or population-interval model, but examined the autocorrelation only of the upper partials, and without the LP noise.

If you have cochlear filtering and low-pass filtering that produces a decline in phase-locking at frequencies of 2 kHz and above, then the temporal structure of the ANF discharges follows the envelope of higher-frequency, psychophysically-unresolved harmonics. At about 2 kHz and fundamentals of 200 Hz, the pitches produced are sensitive to phase spectrum because they are based on envelope shape, which is sensitive to phase. I think the balance between interval representation of individual partials and envelopes depends both on absolute frequency (via declining strength of phase-locking) and harmonic number (via harmonic spacing becoming a smaller fraction of filter bandwidths). Spike precedence may also play a role. If the models in question neglect the broad tails of tuning curves and/or spike precedence/recovery effects arising from high degrees of stimulus-driven interneural synchrony (as one sees for CF's > 2 kHz), then these kinds of effects might not be produced.
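The remark above that envelope shape is sensitive to phase can be illustrated with a small numerical sketch. It is not a model of the auditory periphery: there is no cochlear filtering and no phase-locking limit, and the sampling rate, the choice of harmonics 10-20 of a 200 Hz fundamental, and the helper names are all assumptions made for illustration. It only shows that two complexes with identical magnitude spectra can have very different waveform peakiness.

```python
import numpy as np

fs = 48000                          # sampling rate in Hz (assumed)
f0 = 200.0                          # fundamental in Hz, as in the text
harmonics = np.arange(10, 21)       # partials from 2 to 4 kHz (illustrative choice)
t = np.arange(int(fs * 0.1)) / fs   # 100 ms, i.e. 20 full periods of f0

def complex_tone(phases):
    """Sum equal-amplitude cosines at harmonics of f0 with the given starting phases."""
    return sum(np.cos(2 * np.pi * h * f0 * t + p) for h, p in zip(harmonics, phases))

def crest_factor(x):
    """Peak-to-RMS ratio: a crude index of how peaky the waveform envelope is."""
    return np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2))

rng = np.random.default_rng(0)
cos_tone = complex_tone(np.zeros(len(harmonics)))                     # cosine phase
rand_tone = complex_tone(rng.uniform(0, 2 * np.pi, len(harmonics)))   # random phase

# The magnitude spectra are the same; only the phase spectrum differs.
mag_cos = np.abs(np.fft.rfft(cos_tone))
mag_rand = np.abs(np.fft.rfft(rand_tone))

print("same magnitude spectrum:", np.allclose(mag_cos, mag_rand, atol=1e-6))
print("crest factor, cosine phase:", crest_factor(cos_tone))
print("crest factor, random phase:", crest_factor(rand_tone))
```

For N equal-amplitude partials in cosine phase the crest factor is sqrt(2N), about 4.69 here. Any other phase assignment gives a flatter waveform with the same long-term spectrum, which is why a representation that follows the envelope is phase-sensitive while one based on the magnitude spectrum is not.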

> > 2) The stimuli were harmonic complexes whose harmonics (F0=100 Hz)
> > were all above 5 kHz
>
> In our new submitted paper (you should know of it) we are down to 2 kHz,
> and the asymmetry is just the same. It is a real problem to go down much
> lower, as one should exclude frequencies lower than 15 times the
> fundamental. You did not specify whether in your below-2-kHz click
> trains resolvable harmonics were excluded (and masked for distortion
> products). In our understanding, it is not the frequency region but the
> resolvability that counts.

I tried many combinations of low-pass click trains with present and missing fundamentals, although I did not introduce low-pass noise in either the low-pass or the high-pass case. The click trains that I made, both low-pass and high-pass, produced strong, definite pitches, and I must say that the masking effect when HP clicks are interspersed in the HP case is striking. I do agree with you that psychophysical resolvability and these masking effects are coextensive with each other, but I think we disagree mostly on what we believe to be the nature of the mechanisms underlying whether harmonics are psychophysically resolvable.

The question is really what determines the resolvability of harmonics, and this depends on the nature of the neural representations that one supposes the auditory system uses for frequency and periodicity. Your paper and some earlier ones by Carlyon and others prompted me to rethink some of these issues. While it is often not stated explicitly, the underlying assumption for the hypothesis that there are two neural mechanisms for processing resolved and unresolved harmonics seems to be that pitches produced by resolved harmonics are the product of a neural spectral pattern analysis, while those produced by unresolved harmonics are the product of temporal discharge patterns that are ultimately caused by the failure of filters to resolve the individual partials.
A deeper assumption is that the pitches of pure tones are represented in some sort of rate-place map (possibly with sharpening and cleaning up from synchrony, lateral inhibition, efferents, and top-down selection). So then one thinks that the two patterns of perceptual judgements, pitches of resolved and unresolved harmonics, are due to the operation of qualitatively different kinds of neural mechanisms. Is this a fair statement of your overall views?

My difficulty with the resolvability/unresolvability distinction has been mainly with the tacit physiological assumptions that seem to accompany it. When one looks at auditory nerve responses for stimuli presented at moderate (60 dB SPL) levels or higher, one sees very little resolution of individual harmonics in rate-place profiles for anything but the first few harmonics (low harmonic numbers << 15). From the point of view of rate-place profiles, almost everything looks like it should be unresolved by the system.

The alternative general hypothesis is that frequency representation is based not on spatial profiles of filter activation, but on an analysis of the temporal patterns that are produced by the filters. The different tunings are then not the vehicle for fine frequency discrimination, but the means by which multiple frequencies/periodicities/auditory objects can be simultaneously represented (which again may be why one can hear speech pretty well with 4-6 channels in quiet, but everything goes to pot if there is noise or competing sound). The filters still matter, but in a different way. When one looks at the intervals produced in the auditory nerve, one gets a much finer and more robust picture that much better resembles the psychophysics of pitch perception, as Wever, Siebert, Goldstein, Moore and many others have demonstrated over many decades. I believe that population-interval representations can handle these observations.
Models for the pitches produced by complex tones have been proposed, but these can also be applied to pure tones and the hearing out of individual partials in complex tones. This comes out of thinking about the multiplicity of pitches that are heard when harmonic complexes are presented. While our first approaches to estimating pitches from population-interval distributions involved (essentially) peak-picking, which is simplest to explain, a better approach is to examine which sets of regular interval patterns are seen in the population-interval distribution, i.e. patterns that resemble those that would normally be produced by a single partial at a particular frequency. If one looks for all different patterns and the relative numbers of intervals participating in those patterns, then one has a means of estimating which pitches will be heard, both low pitches of the complex and pitches of partials (one can compute the correlation between the population-interval distribution and all possible partial patterns). This is a temporal analog of Parncutt's frequency domain model of pitch multiplicity.

The more harmonics, the less salient are the pitches of partials relative to the low pitch. All other factors being equal, the more intervals are produced by phase-locking to individual partials (which depends on phase-locking), the better their representation in the population-interval distribution and the greater their resolvability. Filter bandwidths and harmonic numbers also come into play. In this kind of representation, mistuned harmonics stand out, while tuned harmonics tend to fuse together. There is also a way of dealing with partial loudnesses and mutual masking in terms of the relative fractions of intervals that are related to the respective patterns and how one pattern reduces the fraction of the other.
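The correlation step described above, comparing a population-interval distribution against the interval pattern expected from a single partial at each candidate frequency, can be sketched as follows. This is a toy illustration under stated assumptions, not the model referred to in the message. The "population-interval distribution" here is simply a sum of idealized per-partial patterns rather than intervals pooled from recorded or simulated auditory nerve fibers, and the peak shape (`kappa`), the 40 ms lag window, the 50 Hz candidate grid, and all function names are hypothetical choices.

```python
import numpy as np

fs = 48000                                   # lag resolution, samples per second (assumed)
n_lags = fs * 40 // 1000                     # interval lengths up to 40 ms
lags = np.arange(1, n_lags + 1) / fs

def partial_pattern(freq, kappa=4.0):
    """Idealized interval pattern of one phase-locked partial: peaks at multiples of 1/freq.

    The von Mises-like peak shape and its sharpness kappa are illustrative assumptions.
    """
    return np.exp(kappa * (np.cos(2 * np.pi * freq * lags) - 1.0))

# Toy "population-interval distribution" for harmonics 3, 4 and 5 of 200 Hz.
partials = [600.0, 800.0, 1000.0]
pid = sum(partial_pattern(f) for f in partials)

# Correlate the distribution with the pattern for every candidate frequency.
candidates = np.arange(100, 1201, 50)
scores = {int(f): float(np.corrcoef(pid, partial_pattern(f))[0, 1]) for f in candidates}

# List the five best-matching candidate pitches.
for f in sorted(scores, key=scores.get, reverse=True)[:5]:
    print(f, "Hz", round(scores[f], 3))
```

With these settings, candidates that coincide with a partial or with the 200 Hz fundamental correlate more strongly than neighbouring mistuned candidates such as 250 Hz or 650 Hz. How the salience of the low pitch compares with that of the partials depends on the assumed peak sharpness and on how many harmonics are present, so the sketch should not be read as a quantitative prediction.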
I have gone on way too long here already (I ask for your forbearance) -- the ultimate point is that resolved harmonics need not be associated with spectral pattern mechanisms, but that interval-based mechanisms can also be envisioned. I am working on such models. We should not adopt spectral pattern mechanisms by default or by custom or because we have no reasonable alternatives.

> > 3. K & D assumed that each of their clicks would give rise to a spike
> > in an auditory nerve fiber.
>
> Not precisely so. We only assumed that _most_ of the inter-spike
> intervals on the auditory nerve would correspond to inter-click
> (inter-stimulus) intervals. But this assumption is not crucial to our
> argument. Please let me cite from our General Discussion section:
>
>     It is plausible that the "final" temporal structure contributing
>     to pitch sensations (either directly or after a conversion into a
>     place code) does not occur in the auditory nerve but at a higher
>     location in the auditory system. We believe that at this stage the
>     ISIs that matter are first-order ISIs. However, the consecutive
>     spikes bounding these ISIs may originate from nonconsecutive
>     spikes at the auditory nerve level.

The problems with first-order ISIs at any level of the system (and with modulation detectors to analyze them) have to do with 1) the rate (and hence, level) dependent nature of the distributions -- high spike rates eliminate longer intervals, 2) problems of explaining the imperviousness of the system to disruption by intervening transients, and 3) pitches of resolved inharmonic complex tones (Schouten & De Boer). Since you shift all processing of resolved harmonics over to a spectral pattern mechanism, 2) and 3) don't apply, but 1) still does. (What bothers me most is that this partitioning of mechanisms is never justified explicitly.)
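Point 1) above, that first-order interval distributions depend on spike rate while all-order distributions retain the longer intervals, can be illustrated with a toy spike train. The sketch assumes one spike locked to every 10 ms stimulus period plus unrelated Poisson spikes at a low or a high rate. The rates, duration, tolerance window, and function names are hypothetical, and nothing here reproduces the Kaernbach & Demany stimuli or any physiological model.

```python
import numpy as np

def spike_train(extra_rate, period=0.010, duration=2.0, seed=0):
    """One spike per stimulus period plus unrelated Poisson spikes (toy assumption)."""
    rng = np.random.default_rng(seed)
    locked = np.arange(int(round(duration / period))) * period
    extra = rng.uniform(0.0, duration, rng.poisson(extra_rate * duration))
    return np.sort(np.concatenate([locked, extra]))

def first_order_intervals(t):
    """Intervals between consecutive spikes only."""
    return np.diff(t)

def all_order_intervals(t, max_lag=0.015):
    """Intervals between every pair of spikes, consecutive or not, up to max_lag."""
    d = t[None, :] - t[:, None]
    return d[(d > 0) & (d <= max_lag)]

def count_near(intervals, target=0.010, tol=0.00025):
    """Number of intervals within tol of the stimulus period."""
    return int(np.sum(np.abs(intervals - target) <= tol))

results = {}
for rate in (20.0, 400.0):   # extra spikes per second: low vs. high
    t = spike_train(rate)
    results[rate] = (count_near(first_order_intervals(t)),
                     count_near(all_order_intervals(t)))
    print(f"extra rate {rate:5.0f}/s: first-order near 10 ms = {results[rate][0]}, "
          f"all-order near 10 ms = {results[rate][1]}")
```

At the high rate almost every 10 ms gap contains an intervening spike, so the first-order histogram loses the stimulus period, whereas the all-order count near 10 ms still includes every pair of period-locked spikes. The all-order peak does sit on a larger background at high rates, which the sketch does not attempt to quantify.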
It seems to me that for these envelope-generated periodicities, first-order and all-order interval distributions yield similar pitch estimates. In any case, even if there were differences and the psychophysics followed the first-order ISI, this would in no way invalidate the population-interval models for (the much more important) psychophysically-resolved harmonics. It also provides no positive evidence that any kind of harmonic analysis of spectral pattern is being carried out by the auditory system.

> > 5. In short, lower frequency hearing has more autocorrelation-like
> > qualities (intervening clicks don't mask much; ...),
> > while high frequency hearing has more modulation-like qualities
> > (intervening clicks mask, ...).
>
> Again, IMHO it is resolvability that counts. When I asked at the ASA
> meeting in Berlin what could be a generally accepted boundary between
> low- and high-frequency regions I was pointed to the 4 kHz boundary
> where neural phase locking ends. With our new publication we are well
> beyond this limit (i.e. intervening clicks mask in stimuli starting at
> 2 kHz). On the other hand, it is a very simple demonstration that
> intervening clicks don't mask in the high-frequency region AS LONG AS
> THERE ARE RESOLVABLE HARMONICS.

I'm sorry I wasn't there in Berlin. What counts as a low- or high-frequency region depends on what you are interested in: coding of carriers and fine structure, or coding of envelopes. Phase-locking probably does not end abruptly at 4 kHz, and it declines rapidly above 2 kHz. In addition, because of cochlear lags, inter-CF synchrony to modulated stimuli is very high for CF's above 2 kHz (see the PST neurograms in J. Neurophysiol. 76:3:117-34.), and this I think sets up spike precedence effects. Whether harmonics will be psychophysically resolved (in my terms, whether temporal patterns related to envelopes vs. fine structure will dominate) will, I think, depend both on absolute frequency and harmonic number.
I agree with you wholeheartedly that psychophysical resolvability counts and that finding its neural correlates is a highly worthwhile project.

--Peter Cariani


This message came from the mail archive
http://www.auditory.org/postings/2000/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University