Re: Get lost, Mr. Cochlea!! --- The Brain (Jont Allen ) (Ramdas Kumaresan )

Subject: Re: Get lost, Mr. Cochlea!! --- The Brain (Jont Allen )
From: Ramdas Kumaresan <kumar(at)ELE.URI.EDU>
Date: Wed, 7 Mar 2001 09:45:52 -0500

Hi Jont and List people,

>Jont Allen Wrote:
>You got me there. I was investigating how far one could go with zero-crossing.
>I concluded (not in that paper) that it is hard to represent intensity (loudness)
>with such models. This is mathematically well quantified by Ben Logan's theory
>of reconstruction of signals from their zero crossings. As I understand his theory,
>when you can do it, you lose the scale factor information. I would guess that the
>same is true of LSPs. Ghitza's multi-level crossing is an attempt to get around
>this problem I believe, somewhat inspired by the distribution of thresholds in
>auditory nerve fibers. We now believe, as first proposed by Fletcher, that the
>loudness is coded by the overall rate of firing. However this is unlikely to be
>a simple one to one code. Namely loudness is not just a measure of the total rate.

We have significantly extended Ben Logan's theory and made it useful.
This is what is new in our approach. Logan and Voelcker tried to use
a signal's zero-crossings to represent the signal. This is impossible
except in special cases. Contrary to this, we show that by adaptively
processing a signal, its envelope and phase (and hence implicitly the
signal) can be represented by zero-crossings. My conjecture is that this
is the adaptive processing that the cochlea performs, however outrageous
it may sound! According to our proposal, intensity is not just coded by
the rate of firing.

Ramdas Kumaresan

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Jont Allen wrote:

> Ramdas,
> I am sorry but I have not seen your response until just now. I suspect
> the date on your computer is off, and my window was set too small
> to see your message pop up.
>
> Ramdas Kumaresan wrote:
>
> > Dear Jont:
> >
> > Jont Allen wrote in
> > http://sound.media.mit.edu/dpwe-bin/mhmessage.cgi/AUDITORY/postings/2001/135
> >
> > >The ear IS similar to a floating point converter. The ear does not have
> > >an infinite dynamic range or signal to noise ratio. This limited dynamic range
> > >shows up as masking. Do you disagree?
> >
> > I don't know, but masking of a weak signal due to an intense signal in
> > its neighbourhood, is it entirely due to what happens in the periphery?
> > (We know about asymmetry, spreading and shifting of excitation to higher
> > frequencies.)
> > What if the periphery still accurately (to the extent it is
> > allowable by timing jitter etc) represents the weak and intense signal
> > combo and the higher centers ignore the weak component, say, because
> > there is much more precise phase locking to the intense signal?
> > I am not too hot on the trail in masking.
> > Is it established that the information loss (masking) is entirely due to
> > the periphery?
>
> As I understand it, so called "suppressive masking" (which I don't view as real masking,
> but that gets me in lots of hot water with several people) happens in the cochlea.
> This is the same as "two tone suppression" and the so-called "upward spread of masking."
> This is caused by compression within the cochlea, due to the outer hair cells.
>
> The other type of masking is due to neural noise. This is the component that causes true
> masking, in the sense of the floating point converter that I mentioned above.
> This does not occur in the cochlea, but in the auditory nerve. The point process
> introduces noise.
>
> > Jont Allen wrote:
> > >The auditory nerve signal is not about zero crossing. Even zero crossings
> > >are not exact, and would have jitter. But masking is NOT timing jitter.
> >
> > We thought the classical theory of a neuron firing says that if the
> > membrane potential exceeds a threshold then it fires.
> > If so, then it IS
> > some form of zero or level crossing detector. It is a question of how
> > the cochlear mechanics transforms the signal and presents it to the
> > neuron/haircell.
>
> If you make a histogram of the time of the spike relative to the signal, you
> would see that the spike does not code zero crossings, rather it codes
> a half wave rectified version of the signal. Personally I would not call
> this a zero crossing detector any more than a half-wave rectifier is a
> zero crossing detector. One important difference is that the half-wave signal
> has intensity information encoded in it.
>
> > Zero-crossings, as descriptors of a signal, have acquired
> > an undeserved bad reputation. As we have pointed out in our
> > original post, the zero-crossings of a STIMULUS SIGNAL,
> > themselves are NOT of much use.
> > But there are ways to carry reliably in zero-crossings (of other
> > related signals)
> > information about the temporal envelope and phase of a stimulus signal,
> > thereby implicitly, but completely representing a signal.
> > This is our Main point.
> > Those familiar with speech signal processing know
> > about what is called Line-Spectrum-Frequencies (LSFs)
> > originally proposed by Fumitada Itakura, which represent
> > the spectral envelope of a signal. These LSFs are used reliably and
> > successfully in speech coding, recognition etc. These are
> > indeed 'zero-crossings' that represent the
> > spectral envelope, except that these zero-crossings occur along
> > the frequency axis, instead of time axis. Thus, there is already
> > evidence, albeit in the other (frequency) domain, that these
> > zero-crossings are reliable.
> >
> > On a lighter note, I asked Yadong Wang (my grad student), two years
> > ago, to take a look at zero-crossings after reading your 1985 paper
> > in which you seemed to be saying that the auditory nerve signal IS based
> > on zero-crossings.
> > (Jont B. Allen, "Cochlear Modeling", IEEE Acoustics,
> > Speech and Signal Processing Magazine, January 1985, p.3-28.) Refer to
> > Figure 25 and Figure 26 in this paper. Quoting from captions of Figure 25:
> > "Based on the model of the haircell, we assume here that the information
> > is carried by the zero-crossings of the multitudinous narrow band
> > signals. This is because the hair cell cilia appear to act as a switch,
> > given moderate and high level signals, transforming the signals
> > to peak-clipped signal. In an infinitely peak-clipped signal
> > the information is coded by the zero-crossings..."
> > It is heart breaking to see that you would abandon zero-crossings and us
> > midstream.
>
> You got me there. I was investigating how far one could go with zero-crossing.
> I concluded (not in that paper) that it is hard to represent intensity (loudness)
> with such models. This is mathematically well quantified by Ben Logan's theory
> of reconstruction of signals from their zero crossings. As I understand his theory,
> when you can do it, you lose the scale factor information. I would guess that the
> same is true of LSPs. Ghitza's multi-level crossing is an attempt to get around
> this problem I believe, somewhat inspired by the distribution of thresholds in
> auditory nerve fibers. We now believe, as first proposed by Fletcher, that the
> loudness is coded by the overall rate of firing. However this is unlikely to be
> a simple one to one code. Namely loudness is not just a measure of the total rate.
>
> > Ramdas Kumaresan
> > Yadong Wang
>
> Jont
>
> > xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > Subject: Re: Get lost, Mr. Cochlea!! --- The Brain
> > From: Jont Allen <jba(at)RESEARCH.ATT.COM>
> > Date: Tue, 27 Feb 2001 00:01:19 -0500
> >
> > Yadong,
> >
> > This is all very cute, and I don't want to be accused of not having a
> > sense of humor,
> > (clearly you do, and it is refreshing), but there is a thing called
> > masking.
> > Information is lost in the early auditory stages, due to neural coding.
> > The auditory nerve signal is not about zero crossing. Even zero
> > crossings are not exact, and would have jitter. But masking is NOT timing jitter.
> >
> > The ear IS similar to a floating point converter. The ear does not have
> > an infinite dynamic range or signal to noise ratio. This limited dynamic
> > range shows up as masking.
> >
> > Do you disagree?
> >
> > Jont
>
> --
> Jont B. Allen
> AT&T Labs-Research, Shannon Laboratory, E161
> 180 Park Ave., Florham Park NJ, 07932-0971
> 973/360-8545voice, x7111fax, http://www.research.att.com/~jba
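
The scale-factor point raised in the thread (zero crossings discard intensity, while a half-wave rectified signal keeps it) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and does not come from the posts: the two-tone test signal, the 16 kHz sampling rate, the gain of 10, and the helper names `zero_crossing_indices` and `half_wave_rectify`. It is not a model of the hair cell or the auditory nerve.

```python
import numpy as np

def zero_crossing_indices(x):
    """Return indices i where x[i] and x[i+1] have opposite signs."""
    s = np.signbit(x)
    return np.nonzero(s[:-1] != s[1:])[0]

def half_wave_rectify(x):
    """Keep the positive half of the waveform, zero out the rest."""
    return np.maximum(x, 0.0)

# Hypothetical two-tone stimulus (frequencies and phases chosen arbitrarily).
fs = 16000
t = np.arange(0, 0.02, 1 / fs)
x = np.sin(2 * np.pi * 440 * t + 0.3) + 0.5 * np.sin(2 * np.pi * 1000 * t + 1.1)
loud = 10.0 * x  # same waveform, 20 dB more intense

zc_x = zero_crossing_indices(x)
zc_loud = zero_crossing_indices(loud)

# The crossing locations are identical: the gain of 10 cannot be recovered from them.
same_crossings = np.array_equal(zc_x, zc_loud)

# The rectified signal scales with the input, so intensity is still present in it.
hw_ratio = half_wave_rectify(loud).sum() / half_wave_rectify(x).sum()

print("identical zero crossings:", same_crossings)
print("half-wave rectified energy ratio:", hw_ratio)
```

The sketch only covers the zero crossings of the stimulus itself. It says nothing about crossings of other, adaptively derived signals, which is what Kumaresan and Wang propose.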

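The remark in the thread that Line-Spectrum-Frequencies are "zero-crossings" along the frequency axis can be sketched with the standard textbook construction: for a minimum-phase LPC polynomial A(z) of even order p, form P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z), and take the angles of their unit-circle roots. The pole positions, function names, and tolerances below are assumptions chosen for illustration, not anything specified in the posts.

```python
import numpy as np

def lsf_from_lpc(a):
    """LSFs of A(z) = 1 + a1 z^-1 + ... + ap z^-p (p even, minimum phase).

    Returns the root angles in (0, pi) of P(z) and of Q(z) separately.
    """
    a = np.asarray(a, dtype=float)
    ext = np.append(a, 0.0)   # A(z) padded to degree p+1
    rev = ext[::-1]           # coefficients of z^-(p+1) A(1/z)
    p_poly = ext + rev        # symmetric polynomial P(z)
    q_poly = ext - rev        # antisymmetric polynomial Q(z)

    def angles(poly):
        w = np.angle(np.roots(poly))
        # Drop the trivial roots at z = +1 and z = -1 and the conjugate halves.
        return np.sort(w[(w > 1e-6) & (w < np.pi - 1e-6)])

    return angles(p_poly), angles(q_poly)

def lpc_from_lsf(w_p, w_q):
    """Rebuild A(z) from the LSFs alone (even order p assumed)."""
    p_poly = np.array([1.0, 1.0])    # trivial root of P at z = -1
    for w in w_p:
        p_poly = np.convolve(p_poly, [1.0, -2.0 * np.cos(w), 1.0])
    q_poly = np.array([1.0, -1.0])   # trivial root of Q at z = +1
    for w in w_q:
        q_poly = np.convolve(q_poly, [1.0, -2.0 * np.cos(w), 1.0])
    return 0.5 * (p_poly + q_poly)[:-1]  # A = (P + Q) / 2, drop the padded zero

# Hypothetical 4th-order all-pole model with two resonances inside the unit circle.
z1 = 0.9 * np.exp(1j * 0.3 * np.pi)
z2 = 0.8 * np.exp(1j * 0.6 * np.pi)
a = np.real(np.poly([z1, np.conj(z1), z2, np.conj(z2)]))

w_p, w_q = lsf_from_lpc(a)
a_rebuilt = lpc_from_lsf(w_p, w_q)

# The frequencies from P and Q should alternate along the frequency axis.
merged = sorted([(w, "P") for w in w_p] + [(w, "Q") for w in w_q])
interlaced = all(m[1] != n[1] for m, n in zip(merged, merged[1:]))

print("LSFs from P:", w_p)
print("LSFs from Q:", w_q)
print("A(z) recovered from LSFs:", np.allclose(a_rebuilt, a))
```

In this sketch A(z) is normalized so that its leading coefficient is 1, so the LSFs describe the shape of the spectral envelope but carry no overall gain, which is in line with the guess in the thread that the scale factor is also lost with LSPs.
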
This message came from the mail archive
http://www.auditory.org/postings/2001/
maintained by:

DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University