Re: CBW, phase deafness, etc. (Peter Cariani)


Subject: Re: CBW, phase deafness, etc.
From:    Peter Cariani  <peter(at)epl.meei.harvard.edu>
Date:    Thu, 13 Jul 2000 20:09:30 -0400

Dear Eckard,

I had imagined you, like Martin Luther, nailing your auditory theses to the auditorium door, but little did I anticipate your moral wrath! (I'd be interested in seeing the full set of theses.) I'm not sure I understand your moral/existential question or your parable. Perhaps it would be more direct simply to state your beliefs and values and then give whatever objections you have to particular research being done. I am fully prepared to justify my work and the work of many others on both scientific and moral grounds, if need be. The situation is far from your allegory of an evil physician (or Nazi doctors, for that matter) sacrificing children; it is more commensurate with the moral questions involved in eating meat (am I to assume that you are a vegetarian?). But perhaps one should not ask a question unless one really wants an answer.

I'm not clear on your "late understanding" of the (nature of?) the data -- were you unaware that these pictures of neural activity in the auditory nerve were compiled from experimental data? I am also unclear as to the nature of your question concerning variation of the phase of the fundamental: in our experiments, stimuli were numerically synthesized and delivered via a D/A converter that also sent a sync pulse to an event timer, so the stimuli are the same in every (measurable) respect from presentation to presentation. The data are based on many repetitions, but the discharges of the fibers are describable via the interaction of stimulus-driven deterministic processes and stochastic ones, such that the PST histogram is a fair representation of the probability of a fiber's firing at a given time relative to the onset of the stimulus. As far as we know, the individual discharges of different auditory nerve fibers innervating different hair cells (e.g. with different CFs) occur independently of one another, once one factors out the common, stimulus-driven process. 
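For concreteness, the PST-histogram estimate described above amounts to pooling spike times across repeated, stimulus-synchronized presentations and normalizing by the number of presentations. A minimal numpy sketch (the spike trains, bin width, and firing probabilities here are invented purely for illustration, not taken from the data under discussion):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for stimulus-locked spike data: on each of 200 presentations,
# a fiber tends to fire near 5 ms and 15 ms after stimulus onset (times in s),
# with some temporal jitter, and each "preferred" spike occurs with p = 0.8.
n_reps = 200
spike_times = [
    t for _ in range(n_reps)
    for t in rng.normal([0.005, 0.015], 0.0005)
    if rng.random() < 0.8
]

# PST histogram: pooled spike counts in fixed bins after stimulus onset,
# divided by the number of presentations -> expected spikes per bin,
# i.e. an estimate of firing probability vs. time re: stimulus onset.
edges = np.linspace(0.0, 0.020, 21)   # 1 ms bins out to 20 ms
counts, _ = np.histogram(spike_times, bins=edges)
pst = counts / n_reps
```

An ensemble of such PSTs, one per fiber across CFs, is what a neurogram displays.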
(As far as I know, no one has seen intrinsic inter-fiber spike correlations, e.g. cases in which, in the absence of a stimulus, a "spontaneous" discharge in fiber A increases the probability of a discharge in fiber B at some later time.) In each fiber there are short-range correlations due to recovery from the last action potential, and some very weak long-range correlations that are seen in pulse-count distributions (>>100 ms). If this independence assumption is largely correct, as it appears to be, then the ensemble of PSTs is also a fair picture of what is occurring in the auditory nerve as a function of time. It is by far the most detailed and precise account we have of the information that the brain receives when an acoustic stimulus is presented to the ear. Having such a picture of what is going on in the auditory nerve is indispensable if we are to understand how we hear, what aspects of neural activity translate into auditory percepts, and how to design prosthetic devices that improve and restore hearing function. More of these neurograms need to be constructed, displayed, and pondered.

Where do you get this assumption of phase drift, if I understand you correctly? The fibers faithfully register the phase structure of the stimulus as it is presented to them after cochlear filtering. The fine temporal structure of the stimulus is impressed on the fine structure of the neural discharges, within the limits of cochlear filtering and phase-locking. The phase information is there, and there are some situations where phase transients can alter which patterns fuse together.

Regarding autocorrelation -- which is not an "easy" or intuitive tool for many (would that it were so! It would be so much easier, believe me, to go along with the spectrographic, frequency-domain perspective, and you have pointed out some of the deficiencies in that worldview) -- there are a number of points that need to be kept in mind. 
As far as objections to population-interval representations go, the conclusions of Kaernbach and Demany should not be taken at face value, and certainly not in their entirety. It's an interesting and worthwhile paper, but one should read it critically.

1. The "autocorrelation" model that they knocked down was not a neural model; it was a simple autocorrelation of the stimulus. Their model did not compute a summary autocorrelation over all CFs, and it took into account neither cochlear tuning, the broad asymmetric tails of tuning curves, the decline of phase-locking with frequency, nor spontaneous activity. The population-interval representations that we estimated from neural data, and that Meddis and Hewitt estimated from their computer simulations, take all of these factors into account, and some of them can play a significant role when particular stimuli are considered. The model that they knocked down is not a model that anybody holds literally. It is true that population-interval distributions resemble stimulus autocorrelation functions in many respects, especially for stimuli with components below 2 kHz -- which is why I call them "autocorrelation-like" representations -- but there are still differences between the autocorrelation function and the interval-based representation (see below).

2. The stimuli were harmonic complexes whose harmonics (F0 = 100 Hz) were all above 5 kHz and were mixed with low-pass noise. The pitches produced were weak even without the intervening clicks. I made some high-pass click trains with intervening clicks but without the LP noise, and the intervening clicks do effectively mask the 100 Hz pitch of the HP click train. Without the noise, the difference is like night and day. This is a very valuable perceptual observation that K & D have made: intervening clicks mask out the pitch of the isochronous train if the clicks are high-pass. 
I also made click trains with harmonics below 2 kHz, and in this case the intervening clicks do NOT effectively mask the 100 Hz train. A simple interpretation is that for high-frequency harmonics there is a representation of the waveform envelope (based on first-order intervals and modulation analysis), while for low frequencies the representation looks more like an autocorrelation (intervening clicks don't disrupt the periodic pattern). K & D's demonstrations and conclusions, right or wrong, apply to pitches produced by high harmonics, not to low ones (which yield the strongest pitches and are by far the most important for understanding speech and music).

3. K & D assumed that each of their clicks would give rise to a spike in an auditory nerve fiber. It turns out that this may be an incorrect assumption. I observed the responses of a few high-CF auditory nerve fibers to such stimuli. The fibers show plenty of 10 msec all-order intervals when there are no intervening clicks, but do not show prominent interval peaks at 10 msec when there are intervening clicks. I believe this is because when an intervening click comes just before one in the isochronous pattern, large numbers of high-CF fibers reliably fire and are refractory for the subsequent click. As a result, the all-order interval distribution produced by such stimuli is not what K & D supposed, and in this case the all-order interval distribution seems to follow the psychophysics.

4. So, what do we have here? The population-interval models assume that some kind of analysis is performed on the population-interval distribution, which is the product of many prior processes (e.g. cochlear filtering, transduction, synaptic transmission, spike initiation, and possibly even the effects of efferents). For high-frequency harmonics, all-order interval distributions reflect the shapes of envelopes rather than the stimulus fine structure (which is what they reflect for low-frequency harmonics). 
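The refractoriness argument in point 3 can be illustrated with a deliberately crude simulation of my own construction (not a model of the actual data): a "fiber" that fires at every click unless it has fired within the last few milliseconds loses its 10-msec all-order interval peak once randomly placed intervening clicks are added. The 6-msec refractory constant is an invented toy value standing in for refractoriness plus adaptation.

```python
import numpy as np

rng = np.random.default_rng(1)

PERIOD = 10.0          # msec between isochronous clicks
REFRAC = 6.0           # toy refractory/adaptation time, msec (invented value)
N_CLICKS = 400

iso = np.arange(N_CLICKS) * PERIOD
# Intervening clicks at random phases within each period.
intervening = iso + rng.uniform(1.0, 9.0, size=N_CLICKS)

def spike_times(clicks, refrac=REFRAC):
    """Deterministic toy fiber: fire at a click iff > refrac since last spike."""
    spikes, last = [], -np.inf
    for t in np.sort(clicks):
        if t - last >= refrac:
            spikes.append(t)
            last = t
    return np.array(spikes)

def all_order_peak_fraction(spikes, target=PERIOD, window=30.0, tol=0.5):
    """Fraction of all-order intervals (<= window) within tol of target."""
    diffs = spikes[None, :] - spikes[:, None]
    ivals = diffs[(diffs > 0) & (diffs <= window)]
    return np.mean(np.abs(ivals - target) < tol)

# With only the 10-msec grid, 10-msec intervals dominate the distribution;
# with random intervening clicks, the refractory fiber's 10-msec peak collapses.
clean_frac = all_order_peak_fraction(spike_times(iso))
masked_frac = all_order_peak_fraction(spike_times(np.concatenate([iso, intervening])))
```

The comparison of clean_frac and masked_frac is the qualitative point: the all-order interval statistics delivered to the brain are not simply the click-interval statistics of the stimulus.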
This representation takes these differences into account, and thus provides a "unified" explanation for pitches produced (or masked) by both low and high harmonics. K & D presented an interesting, provocative, and useful demonstration, but (I think) their interpretation had faults. It has been somewhat of a surprise to me how quickly and easily people have taken their (in my opinion, overdrawn) conclusions at face value.

5. In short, lower-frequency hearing has more autocorrelation-like qualities (intervening clicks don't mask much; fine structure, not envelope, matters; phase is largely irrelevant for pitch and timbre), while high-frequency hearing has more modulation-like qualities (intervening clicks mask; envelope matters; phase can change envelope shape and modify pitch). High-frequency hearing looks a great deal like what we imagine the situation in the electrically stimulated nerve to be: many fibers firing at initial wavefronts and being refractory together for subsequent ones. The autocorrelation-like character of low-frequency hearing calls periodicity representations based on modulation tuning into question -- intervening clicks, phase manipulations, and inharmonic tunings would be expected to disrupt representations based on first-order intervals or modulation-tuned units (I'd appreciate counterarguments here; perhaps I am mistaken).

6. There are interesting questions concerning how pitches created by psychophysically resolved and unresolved harmonics relate to the different means of generating interspike intervals (by means of envelopes produced by interacting harmonics, or by means of phase-locking to the individual harmonics themselves). I have the impression that many psychophysicists tacitly associate resolved harmonics with spectral pattern mechanisms and unresolved harmonics with temporal ones (following Schouten, perhaps, but not Licklider). 
There is a natural way of making this distinction in interval-based theories: between intervals produced by individual harmonics and those produced by interacting harmonics. As one increases absolute frequency above 2 kHz, phase-locking declines and intervals associated with envelopes dominate. Likewise, as harmonic numbers increase, harmonic spacings become smaller relative to tunings, the harmonics' interactions prevail, and envelopes dominate. So there can be a linkage between psychophysically resolved/unresolved harmonics and different modes of generating all-order interspike intervals. Whether one wants to call these "two mechanisms", or rather the consequence of a "unified representation", depends on one's perspective.

I do believe that the auditory system has a unified, general-purpose, phylogenetically-primitive means of representing sounds, be it some kind of central spectrum or central autocorrelation or central periodicity map. Just as there is an anatomical bauplan, there may be a neurocomputational bauplan -- basic strategies for representing and processing information. It is easy to give up on looking for underlying order, harder to actually find it. And it is always tempting to proliferate special-purpose mechanisms for this or that little function, and to pass the integration and coordination buck upwards to omniscient central processors somewhere in the cortex.

-- Peter Cariani

Autocorrelation and population-interval distributions: similarities and differences

First, the population-interval representations I am discussing are "autocorrelation-like" in many respects, but they are not identical in all respects to the autocorrelation function, being the product of cochlear and neural processes. The ways in which they resemble stimulus autocorrelations lie in the positions of major and minor interval peaks. 
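The resemblance in peak positions is easy to verify for the stimulus autocorrelation itself: for a harmonic complex, the largest peak away from zero lag sits at the fundamental period, which is where the major interval peak sits. A quick numpy check on a synthetic complex (sampling rate, duration, and harmonic count are arbitrary choices of mine):

```python
import numpy as np

FS = 20000                          # sampling rate, Hz
F0 = 100.0                          # fundamental, Hz
N = 4096                            # ~0.2 s of signal

t = np.arange(N) / FS
# Harmonic complex: harmonics 1-6 of 100 Hz, cosine phase.
x = sum(np.cos(2 * np.pi * k * F0 * t) for k in range(1, 7))

# Autocorrelation for non-negative lags (zero lag at index N - 1 of 'full').
r = np.correlate(x, x, mode="full")[N - 1:]

# Locate the dominant peak between 5 and 15 ms: it falls at 1/F0 = 10 ms.
lo, hi = int(0.005 * FS), int(0.015 * FS)
peak_lag_ms = (lo + np.argmax(r[lo:hi])) / FS * 1000.0
```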
The ways in which they differ have to do with neural absolute and relative refractory periods (no intervals shorter than about 700 usec), cochlear filtering and nonlinearities, and half-wave rectification (no negative amplitudes). Many of the nonlinearities in cochlear and neural processes manifest themselves in altering the relative sizes of interval peaks (but not their positions), and in some cases in introducing additional (small) interval peaks associated with cochlear distortion products (at 1/(2f1-f2)). A representation system like this is very well suited for estimating frequency/periodicity over a wide dynamic range -- the nonlinearities do not affect the positions of the interval peaks on which those estimates are based. Thus whether and how cochlear nonlinearities matter for some perceptual function depends crucially on the nature of the neural representation involved in that function.

Eckard Blumschein wrote:
>
> Dear Peter Cariani and List,
>
> Do we live up to our responsibility? I remember the reason why a
> physician committed an incredible crime. He performed deadly experiments
> with children just because he intended to become a professor. Even more
> tragically, the girls and boys were sacrificed for nothing. The
> unscrupulous experiments were based on wrong assumptions. Nonetheless, the
> doctor falsified his identity and managed to get recognized for a while.
> Cats are quite different from humans. However, I am not sure whether or not
> I myself might sometimes be to blame for carelessness that could cost
> further lives of animals.
>
> In particular, I asked for more data concerning "block-voting".
> Fortunately, Peter Cariani outed himself. I have to apologize not just for
> not mentioning him but also for my late understanding of the data by Miller
> and Sachs, and, of course, the similar ones by Delgutte et al., too. Maybe I
> am just not aware of the awareness of others concerning some consequences of
> two peculiarities.
> My first suspicion has proven correct. The figures by
> Secker-Walker and Searle or by Shamma are somewhat misleading since they
> are based on many repetitions but possibly suggest a snapshot. My second
> suspicion is that the phase of the fundamental might have varied each time.
> In principle, it would be possible to check this by means of a synchronized
> stimulus. Referring to my initial remark, I would not consider this
> necessary.
>
> I see a lot of consequences. CBW, or more naturally speaking the width of
> neural tuning curves, depends on the variance of phase and amounts to
> approximately half a period (i.e. 1/(2 CF)) for periods below the refractory
> time, even if frequency resolution at inner hair cells (notice, I am
> avoiding the term basilar membrane) might be much higher. Deafness to phase
> also becomes understandable, etc.
>
> Finally for this time, I would like to briefly take issue with the
> application of the autocorrelation function. I know this easy tool was
> favored not just by Peter Cariani. Possibly we both can agree. I noticed him
> writing "autocorrelation-like" representations and operations. As for me,
> I go along with Kaernbach/Demany, who provided psychoacoustical evidence
> against autocorrelation theories in JASA (1999), more strictly speaking
> against perception of all-order inter-click intervals (ICI). Please forgive
> my heretical mistrust in the general suitability of any available
> mathematical tool in the case of hearing. Since Müller (1838), I see all
> efforts doomed to failure, so far. Instead, I imagine the neurons to
> preferably detect coincidence of lowest order. For instance, a first-order
> ICI dominates over any second- or higher-order ICI. A key to many keys might
> hopefully be my suggestion that tonal perception is based on zero-order
> ICIs.
>
> I uttered this idea for the first time this year in Oldenburg after I
> became aware that atonal perception across all CFs starts to become
> gradually amenable as soon as the period exceeds the refractory time. In
> that case, the normally dominating zero-order tonotopic intervals are
> presumably getting increasingly corrupted.
>
> Sincerely,
> Eckard Blumschein
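To pin down the vocabulary being debated above: for a set of spike (or click) times, first-order intervals are the separations between successive events only, while all-order intervals include every positive pairwise separation. A minimal sketch of both statistics (the event times are invented; the Schouten/Licklider labels in the comments reflect the association drawn earlier in this message):

```python
import numpy as np

def first_order_intervals(times):
    """Intervals between successive events only (the Schouten-style statistic)."""
    times = np.sort(np.asarray(times, dtype=float))
    return np.diff(times)

def all_order_intervals(times, window=50.0):
    """Every positive pairwise separation up to `window` (the statistic a
    Licklider-style autocorrelation analysis counts)."""
    times = np.sort(np.asarray(times, dtype=float))
    diffs = times[None, :] - times[:, None]
    return np.sort(diffs[(diffs > 0) & (diffs <= window)])

# An isochronous train at a 10 ms period (times in ms):
train = [0.0, 10.0, 20.0, 30.0, 40.0]
fo = first_order_intervals(train)   # successive intervals: all 10 ms
ao = all_order_intervals(train)     # 10, 20, 30, and 40 ms separations
```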


This message came from the mail archive
http://www.auditory.org/postings/2000/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University