Subject: Re: "pure place code" and "pure phase-locking code"
From: Peter Cariani <peter(at)epl.meei.harvard.edu>
Date: Mon, 23 Oct 2000 18:00:37 -0400

Hi Annemarie, John, Martin

(My apologies for the length of this)

Great discussion!

The Steinschneider paper is extremely useful because they presented their data in systematic form, so one has a fairly good sense of what the neural ensembles at the sites that they recorded from are doing. I think this highlights the utility of averaged local field potentials and current source density analysis.

In interpreting their data, however, one needs to remember that their averaged potentials are reflections of the synchronized component of the population response. What they see is very reminiscent of what one sees in the auditory nerve if one takes responses from all CF's and sums them together to form a population PST. The fundamentals of click trains below about 150 Hz can be seen, but above this F0 frequency cochlear delays smear out the PST and one sees little temporal structure. (Of course, if one looks at the interspike intervals across the population, one will see time structure up to 5 kHz, so this temporal limit of the synchronized component of the ensemble response is not necessarily the temporal limit of all the information available in spike trains.) If one high-pass filters the click trains or looks only in high CF regions, the F0 limit where one can see F0-related time structure in the ensemble PST increases to 300 or 400 Hz. This is due to the smaller range of cochlear delays present for fibers with CF's above 2 kHz. We presented some of these population PST's in our analysis of the Flanagan-Gutman "click rate" (buzz) pitch; see

Cariani, P. A. and Delgutte, B. (1996). Neural correlates of the pitch of complex tones. I. Pitch and pitch salience. II. Pitch shift, pitch ambiguity, phase-invariance, pitch circularity, and the dominance region for pitch. J. Neurophysiol. 76(3), 1698-1734.
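A toy sketch of the point about population PSTs, in Python. It is not a model of real auditory-nerve data: every fiber is assumed to fire exactly once per click period, the function names are made up for illustration, and the 5 ms and 1 ms delay spreads are hypothetical values chosen only to make the effect visible. Pooling fibers whose delays span a cycle or more wipes out synchrony at F0 in the summed PST, while the intervals within each fiber stay at multiples of the period.

```python
import cmath
import math


def fiber_spikes(f0_hz, delay_s, n_cycles=20):
    """Spike times for one idealized fiber: one spike per period, shifted by its delay."""
    period = 1.0 / f0_hz
    return [delay_s + k * period for k in range(n_cycles)]


def pooled_synchrony(f0_hz, delay_spread_s, n_fibers=50):
    """Vector strength at F0 of the pooled (population) spike times."""
    period = 1.0 / f0_hz
    pooled = []
    for i in range(n_fibers):
        # Hypothetical delays, evenly spaced across the assumed spread.
        delay = delay_spread_s * i / n_fibers
        pooled.extend(fiber_spikes(f0_hz, delay))
    total = sum(cmath.exp(2j * math.pi * t / period) for t in pooled)
    return abs(total) / len(pooled)


def all_order_intervals(spikes):
    """All positive differences between pairs of spikes within one fiber."""
    return [b - a for i, a in enumerate(spikes) for b in spikes[i + 1:]]


if __name__ == "__main__":
    # (F0 in Hz, assumed delay spread in seconds)
    for f0, spread in [(100, 0.005), (400, 0.005), (400, 0.001)]:
        print(f0, spread, round(pooled_synchrony(f0, spread), 3))
```

With these assumed numbers, pooled synchrony is about 0.64 at 100 Hz with a 5 ms spread, collapses to essentially zero at 400 Hz with the same spread, and recovers to roughly 0.76 at 400 Hz when the spread is narrowed to 1 ms. The within-fiber intervals remain exact multiples of the period in every case.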
[I believe that the buzziness of high-pass filtered click trains has to do with this high degree of interneural synchrony on a population-wide scale, and that the softer tonal quality of low-pass (<1 kHz) filtered trains has to do with the lack of interneural synchrony in the lower CF regions that are primarily activated.]

Now this general situation with population PSTs in the auditory nerve is very similar to what Steinschneider et al. saw in their MUA and CSD data. What it means is that 1) one should not take the upper limit of response periodicities that are seen in averaged potentials as the upper limit of timing information that is available in a neural population, and 2) 300-400 Hz is a LOWER bound on the periodicity information available at the input layers of the awake auditory cortex.

I disagree with the interpretation that this means (in accord with Schouten's old idea of the "residue") that temporal cues are primarily for unresolved harmonics (due to the interactions of partials) that mainly occur in higher CF regions (higher harmonic numbers). Yes, there is more interneural synchrony in high CF regions, but most of the temporal information in the AN lies in intraneural interval patterns rather than interneural ones.

If we look at the situation in the low-frequency recording sites, Steinschneider et al. presented systematically shifted partials (so one sees how well individual partials can be distinguished on the basis of activation patterns). My recollection is that F0's needed to be on the order of 300 Hz for an 800 Hz BF recording site in order for there to be separation of adjacent harmonics. This is also very similar to what we saw in the auditory nerve when we looked at rate-profiles for vowels with F0's of 150 and 350 Hz (no resolution of harmonics with 150 Hz spacings, but a crude resolution of harmonics with 350 Hz spacings):

Hirahara, T. et al. (1996). Representation of low-frequency vowel formants in the auditory nerve.
In Proceedings European Speech Communication Association (ESCA) Research Workshop on The Auditory Basis of Speech Perception, Keele University, United Kingdom, July 15 - 19, 1996 (pp. 4).

The acid test for a neural representation is how well one can predict the percept from the neural data. In my opinion, the two biggest problems that we have in the auditory system are

1) to account for the precision of perceptual discriminations (on the order of fractions of a percent of frequency in the case of pitch perception, on the order of tens of microseconds in ITD in the case of binaural localization), and

2) to simultaneously account for the extremely robust nature of these precisions.

It's hard to tell whether, even for 300 Hz harmonic separations, one could predict the frequencies of the partials with anything like the requisite precisions (or even within an order of magnitude) by using only the spatial activation profiles. Of course, one can never rule out the possibility that some other local group of neurons has better information (there is always more information, be it rate or temporal, in the individual spike trains than in averaged population responses). So their findings don't give a great deal of support to the notion of a spectral pattern representation in the auditory cortex, but, as usual, nothing can thus far be ruled out entirely.

On the issue of a time-to-place transformation in the midbrain, the critical issues have to do with whether the observed MTF's can actually support a neural representation of pitch that is both sufficiently precise and robust. Unless one wants to postulate various complicated ad-hoc mechanisms that subserve pitch perception at different sound pressure levels, one wants a single, unified mechanism that does it all in seamless fashion. We want to see a neural representation that is capable of representing periodicities to within a precision of less than 1 percent over an SPL range of 40-100+ dB SPL.
Suresh Krishna's recent excellent paper bears directly on these issues (as does earlier work by Palmer, Rees, and Moller):

Krishna, B. S. and Semple, M. N. (2000). Auditory temporal processing: responses to sinusoidally amplitude-modulated tones in the inferior colliculus. J. Neurophysiol. 84(1), 255-273.

As I interpret their data, IC MTF's are relatively broad to begin with and tend to broaden further at higher intensities. This would not be so problematic if we also saw more highly tuned MTF's in higher stations, but their conspicuous absence generates a great deal of cognitive dissonance in my mind: there is a discrepancy of about 2 orders of magnitude between MTF tunings and the precisions of the percepts. The MTF's look to me to be consequences of recovery characteristics of neurons (taking into account relative balances of excitation and inhibition) rather than dedicated "pitch detectors". Spontaneous rates and best MTF's decline together as one ascends the pathway.

The other difficulty is that MTF's are not the right way of encoding pitch. The pitches produced by harmonics below about 1.5 kHz follow an autocorrelation-like pattern (e.g. de Boer's rule) rather than one based on waveform envelope or adjacent peaks in filtered waveforms. In interval terms, the low pitches produced by these complexes follow all-order intervals rather than first-order intervals (which, like MTF's, are associated with renewal processes rather than correlations). If one intersperses random clicks in between the regular clicks of an isochronous click train composed of harmonics < 1.5 kHz, one still hears the pitch of the click train. It seems that this basic observation is inconsistent with an MTF-based representation of pitch. For higher-frequency stimuli, above 2 kHz, pitches are affected by envelopes, and such masking does occur and is quite striking (see Kaernbach, C. and Demany, L. (1998). "Psychophysical evidence against the autocorrelation theory of auditory temporal processing." J. Acoust. Soc. Am., 104, 2298-2306.)

The psychophysics of pitch resembles an autocorrelation-based pattern for lower harmonics (some would argue lower harmonic numbers or both), while it resembles an envelope-based pattern analysis for higher ones. A population-interval representation of pitch is consistent with this picture if temporal discharge patterns of auditory nerve fibers reflect individual partials for low harmonics and interactions of partials for high ones. But by far the stronger pitches are those produced by lower harmonics.

In other words, to account for pitch shifts of inharmonic complex tones and the relative transparency of pitch representations (we can hear two pitches when we listen to double vowels with different F0's -- they don't obliterate each other), we need something more like an autocorrelator than an envelope or MTF-based analyzer. This would mean either comb filter rate tunings or all-order intervals. All-order intervals related to pitch are everywhere, and comb filter rate tunings are almost nowhere to be found.

The big, big advantage of the intervals is that they faithfully and precisely represent stimulus periodicities at all relevant sound intensities. Converting to a place code in the midbrain re-introduces all of the problems and complexities of traditional rate-place codes, albeit at a higher station (of course, we can always pass the gnarly processing buck up to some more central, more omniscient processors...).

I have just one point about binaural pitches. John Culling wrote:

> It depends a little on your theoretical position about what auditory
> processing gives rise to dichotic pitches, but, if you believe that they
> are produced by a mechanism that detects interaural decorrelation (or
> more precisely "incoherence"), then they are purely spectral pitches.
> They only occur below about 1500 Hz, because (in part) the process of
> analysing the correlation is dependent upon phase locking. The result
> of this analysis, however, is a channel-by-channel coherence measurement.
> So, a "purely spectral" pitch may be achieved by, for instance, replacing
> one sub-band of a diotic noise at one ear with an independently generated
> band of noise. The result is a noise which is diotic at most frequencies,
> but uncorrelated in one sub-band. The stimulus at each ear is (and sounds
> like) white noise. When both earphones are used, however, a distinct
> whistling sound is heard above this noise. The pitch of this whistling
> sound corresponds with the centre-frequency of the manipulated band.

Of course, the binaural temporal cancellation models (yours and Alain de Cheveigne's) best account for these decorrelation-based binaurally-created pitches. And the usual assumption, following Jeffress, is that the binaural correlator uses coincidence counters that integrate the rates of coincidence detections, and that the result is then read out in a rate-place profile, a central spectrum.

But the output of the coincidence detectors also has time structure that is related to the pitch. If one runs these kinds of stimuli (e.g. Huggins pitch, Bilsen multiple phase delay pitch, other interaurally decorrelated signals) through a filterbank and through a binaural coincidence net, as I did several years ago, and one looks at the summary autocorrelation of the output of the binaural temporal cross-correlator, one finds that there are dips in the interval distribution at the pitch period and its multiples. I constructed some monaural stimuli with flat autocorrelations except for one dip at tau0, and these also create noisy pitches (odd that they are) at tau0. This led me to believe that the auditory system has the means of analyzing temporal correlations both positive and negative.
(There exist arrays of spatial binocular anticorrelation units in the visual system.) A temporal anticoincidence process would yield positive peaks at the pitch period and its multiples. A central autocorrelation analysis could therefore also potentially account for these pitches if there exist anticoincidence detectors in the pathway (EI units).

As much as I like your cancellation models, they don't exhaust the realm of possibilities. I think therefore that we can't say it MUST be spectral because the only models that come immediately to mind are of the Jeffress-type time-to-place networks. The arguments for spectral pattern analysis that are based on the existence of binaural pitches are IMHO overstated; they rely more on conventional assumptions than on any kind of logical necessity. (The older arguments along these lines erected a false dichotomy between temporal models for pitch where harmonics needed to interact in the same cochlea (e.g. "residue" temporal models) vs. spectral pattern models. The existence of binaurally created pitches rules out temporal interactions between harmonics as the only mechanism, but it does not rule out in any way other kinds of temporal mechanisms that rely on summation of intervals rather than interacting harmonics.)

-- Peter Cariani

Martin Braun wrote:
>
> Annemarie wrote:
>
> "I had in mind the study by Steinschneider et al., JASA, 104 (5), 1998,
> 2935 ff. who found that at the level of the primary auditory cortex
> phase locked responses occurred only at sites with high best frequencies
> up to about 200 Hz (stimuli: alternating polarity click trains),
> ............
>
> Does that mean that the temporal code might not play a role at all in
> the low frequency channels or is it more likely that phase locking had
> been transformed into a rate-place code before the A1 (perhaps in the
> midbrain)?..........."
>
> Answers:
>
> 1) As soon as a harmonic is resolved in the cochlea, spectral coding takes
> place and then runs along the complete auditory pathway.
>
> 2) If the spectral information is poor in the cochlea, as with click
> stimuli, it is also poor anywhere else in the auditory system.
>
> 3) Current evidence indicates that f0 in the main speech and music range is
> transcoded from a temporal to a place code in the central nucleus of the
> inferior colliculus (ICC). In other words, time-locking in this f0 range
> disappears above the ICC, and the extracted f0-pitch is coded at its
> frequency place by discharge rate, as most other information that is
> transported into and around the cortex. (See references below)
>
> 4) Phase-locking to acoustic frequencies recorded in the cortex possibly is
> not related to pitch extraction at all. It may be a by-product of other
> functions of the auditory system, e.g. orientation in space.
>
> In conclusion:
>
> A) In the cortex, f0-pitch in the main speech and music range is coded
> purely spectrally. (No phase-locking in pitch coding)
>
> B) Up to the ICC, f0-pitch in the main speech and music range can be coded
> purely temporally, but for all natural, i.e. non-laboratory, complex tones
> it is coded spectrally and temporally. (Phase-locking necessary for pitch
> coding)
>
> Langner, G., 1992. Periodicity coding in the auditory system. Hear. Res. 60,
> 115-142.
>
> Schreiner, C.E., Langner, G., 1997. Laminar fine structure of frequency
> organization in auditory midbrain. Nature 388, 383-386.
>
> Langner, G., Schreiner, C.E., Biebel, U.W., 1998. Functional implications of
> frequency and periodicity coding in auditory midbrain. In: Palmer, A.R.,
> Rees, A., Summerfield, A.Q., Meddis, R. (Eds.), Psychophysical and
> Physiological Advances in Hearing. Whurr, London, pp. 277-285.
>
> Braun, M., 1999.
> Auditory midbrain laminar structure appears adapted to f0
> extraction: further evidence and implications of the double critical
> bandwidth. Hear. Res. 129, 71-82.
>
> Braun, M., 2000. Inferior colliculus as candidate for pitch extraction:
> multiple support from statistics of bilateral spontaneous otoacoustic
> emissions. Hear. Res. 145, 130-140.
>
> Martin
>
> Martin Braun
> Neuroscience of Music
> Gansbyn 14
> S-671 95 Klässbol
> Sweden
> nombraun(at)post.netlink.se