
Re: [AUDITORY] Seeking advice on using ANF firing rate to resolve front-back confusion in sound localization model



Qin,

It sounds like you could use a good tutorial on auditory representations, which this list might not be the "best place" for.  But that won't stop me from starting.

"Place" refers to positions along the cochlear partition, typically described also by "CF" or characteristic frequency.  A rate-vs-place profile is like a spectrum, with a value at each of a number of places or CFs.  A vector function of time, short-time-averaged firing rates, every few milliseconds typically.

When you model auditory neurons with the Zilany model, you get the instantaneous rate at one place, based on the CF of the fiber that you're modeling.  You need to induce a place dimension by running many such models, with an appropriate set of CFs, all processing the same audio input in parallel.  Or use a model that inherently has a place dimension, e.g. my CARFAC model, if you want it to be more efficient.
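To make that concrete, here is a minimal sketch in Python of inducing the place dimension.  The anf_rate stand-in below is NOT the Zilany model, just a crude bandpass-rectify-compress placeholder that keeps the sketch runnable; swap in your real fiber model there.  The CF spacing follows the ERB-rate formula with Glasberg & Moore constants, as in Slaney's Auditory Toolbox:

    import numpy as np
    from scipy.signal import butter, sosfilt

    def erb_space(low_cf, high_cf, n):
        # n CFs spaced on the ERB-rate scale (Glasberg & Moore constants,
        # as in Slaney's Auditory Toolbox), returned low to high.
        ear_q, min_bw = 9.26449, 24.7
        cfs = -(ear_q * min_bw) + np.exp(
            np.arange(1, n + 1)
            * (np.log(low_cf + ear_q * min_bw)
               - np.log(high_cf + ear_q * min_bw)) / n
        ) * (high_cf + ear_q * min_bw)
        return cfs[::-1]

    def anf_rate(sound, fs, cf):
        # Crude placeholder for a single-CF fiber model (e.g. Zilany):
        # bandpass near cf, half-wave rectify, compress.
        erb = 24.7 * (4.37 * cf / 1000.0 + 1.0)
        sos = butter(2, [cf - erb / 2, cf + erb / 2],
                     btype='bandpass', fs=fs, output='sos')
        return np.maximum(sosfilt(sos, sound), 0.0) ** 0.3

    def rate_place_profiles(sound, fs, cfs, win=0.005):
        # Same audio through one model per CF, then short-time averages:
        # one rate-vs-place profile every `win` seconds.
        rates = np.stack([anf_rate(sound, fs, cf) for cf in cfs])
        hop = int(win * fs)
        n_frames = rates.shape[1] // hop
        return rates[:, :n_frames * hop].reshape(
            len(cfs), n_frames, hop).mean(axis=-1)   # (n_cf, n_frames)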

Similarly for your MSO output.  You can't work with one MSO rate output.  You need a 2D place parameterization, with cochlear place on one dimension and ITD or MSO place on the other (that is, a binaural correlogram, or binaural stabilized auditory image, as we've called it).  Without the whole pattern, it will be impossible to tell direction for a range of different signals, with different time patterns, spectra, and intensities.  Your plots suggest you are assuming a mapping from direction to rate, but that mapping is different for every stimulus and for every MSO neuron output.  You need to be looking at the "pattern", or "profile", as I called it at the level of the auditory nerve.
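Again, just a sketch: given the two ears' per-channel rate patterns (unsmoothed, at the sample rate, not the smoothed profiles above), a plain running cross-correlation per cochlear channel gives the CF-by-ITD image I mean.  A real MSO model would replace the inner correlation:

    def binaural_correlogram(rate_l, rate_r, fs, max_itd=0.8e-3):
        # rate_l, rate_r: (n_cf, n_t) rate patterns from left and right ears.
        # Returns (lags_in_seconds, image), where image[c, j] is the
        # correlation of channel c at interaural lag lags[j]: a 2D pattern
        # with cochlear place on one axis and ITD on the other.
        max_lag = int(max_itd * fs)
        lags = np.arange(-max_lag, max_lag + 1)
        n_cf, n_t = rate_l.shape
        image = np.empty((n_cf, lags.size))
        for c in range(n_cf):
            for j, d in enumerate(lags):
                if d >= 0:
                    image[c, j] = np.dot(rate_l[c, d:], rate_r[c, :n_t - d])
                else:
                    image[c, j] = np.dot(rate_l[c, :n_t + d], rate_r[c, -d:])
        return lags / fs, image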

Combining the MSO output pattern with auditory nerve rate-vs-place patterns may give you the cues you need, or nearly so.

Also note that the hardest signal to localize, with no hope of resolving front-back confusion, is a sine wave.  Why not start with simple signals that are easy to localize, such as clicks?  The auditory nerve and brainstem are well set up to focus on transients and onsets, not ongoing sounds such as sine waves, or even sums of sine waves.
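If it helps, here's how I'd make such a test signal; a sketch with arbitrary parameters, where the ITD you impose on the click should show up as a clear ridge in the correlogram above:

    def click_pair(fs, itd, dur=0.05, width_samples=2):
        # Left/right click pair with a given ITD (positive ITD delays
        # the right ear).  Transients like this give clean onset cues.
        n = int(dur * fs)
        left = np.zeros(n)
        right = np.zeros(n)
        onset = n // 4
        shift = int(round(itd * fs))
        left[onset:onset + width_samples] = 1.0
        right[onset + shift:onset + shift + width_samples] = 1.0
        return left, right

Then, e.g., left, right = click_pair(48000, itd=300e-6) gives a pair you can push through the fiber models above.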

And of course I recommend the Binaural chapter of my book, in addition to tons of other good works on binaural hearing.  See http://machinehearing.org

Dick



On Tue, Mar 4, 2025 at 12:08 AM Qin Liu <qin.liu@xxxxxxx> wrote:

Dear Dick,

Thank you for your suggestions.

I am a little confused about the concept of "rate-vs-place profiles." Could you please provide more references or explain it a bit more? I believe spectral cues will definitely help, but I haven't found a proper way to manipulate them yet.

I've attempted to use head rotation to determine whether the sound source is in front or behind based on the MSO firing rate, but that requires turning +/- 90 degrees each time. This is why I am seeking alternative methods to address this issue.

Best regards,

Qin


From: Richard F. Lyon <dicklyon@xxxxxxx>
Sent: Thursday, 27 February 2025 12:08:01
To: Qin Liu
Cc: AUDITORY@xxxxxxxxxxxxxxx
Subject: Re: Seeking advice on using ANF firing rate to resolve front-back confusion in sound localization model
 
Qin,

The rate-vs-place profiles from the two ears may have most of what you need to supplement the MSO's output that represents ITD, which is mostly a left-right cue.  The cues for elevation, including front-back, are generally thought to be more subtle spectral features, related to the individual's HRTF, and are not as robust as the ITD cues.  ILD cues are of intermediate robustness, I think, but still primarily left-right.

I hadn't thought about doing what Jan Schnupp suggested, looking at slightly different cones of confusion for different frequencies, but that sounds like another way to conceptualize the subtle HRTF-dependent spectral cues.

So you don't have to use "HRTF template matching", but you do have to use HRTFs.

If you want to do this in anything like the real world, as opposed to an anechoic environment, you'll need a strong precedence effect to pay attention to the first arrival and ignore echoes, or something along those lines.

Also, in the real world, we usually resolve front-back confusion quickly and easily by rotating our heads a little.  The effect of rotation on ITD is opposite for front vs back, so this gives a very robust front-back cue; up-down is still hard.
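A toy calculation shows the sign flip.  Using the Woodworth spherical-head approximation, ITD = (a/c)(theta + sin theta), with theta the lateral angle from the median plane and an assumed head radius of 8.75 cm, a front source and its back-mirrored twin have identical static ITDs, but a small head turn moves their ITDs in opposite directions:

    import numpy as np

    A, C = 0.0875, 343.0  # assumed head radius (m) and speed of sound (m/s)

    def woodworth_itd(az_deg):
        # arcsin(sin(az)) folds azimuth onto the lateral angle, so a back
        # source at 180 - az gets the same ITD as a front source at az:
        # the cone of confusion.
        theta = np.arcsin(np.sin(np.radians(az_deg)))
        return (A / C) * (theta + np.sin(theta))

    turn = 5.0  # small head turn to the right, in degrees; for a fixed
                # source, this reduces its azimuth relative to the head
    for az in (30.0, 150.0):  # a front source and its back mirror
        d_itd = woodworth_itd(az - turn) - woodworth_itd(az)
        print(f"azimuth {az:5.1f}: ITD change {d_itd * 1e6:+.1f} us")

The two ITD changes come out with opposite signs, which is the robust front-back cue.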

Dick


On Wed, Feb 26, 2025 at 4:21 PM Qin Liu <000003c563e12bd3-dmarc-request@xxxxxxxxxxxxxxx> wrote:

Dear auditory list,


I am currently working on a project involving sound localization using firing rates from auditory nerve fibers (ANFs) and the medial superior olive (MSO). However, I have encountered an issue: using MSO firing rates alone, I can only distinguish left from right, not front from back.

I am considering whether ANF firing rates might provide a solution, but I am uncertain how to use them effectively. For instance, I have experimented with analyzing the positive gradients of ANF firing rates but have not yet achieved meaningful results.

Could anyone suggest an auditory metric derived from binaural signals, ANF firing rates, or MSO that could classify front/back sources without relying on HRTF template matching? Any insights or alternative approaches would be invaluable to my work.

Thank you in advance. I sincerely appreciate any guidance you can offer.

Best regards,

Qin Liu
Doctoral Student
Laboratory of Wave Engineering, École Polytechnique Fédérale de Lausanne (EPFL)
Email: qin.liu@xxxxxxx