Re: [AUDITORY] Seeking advice on using ANF firing rate to resolve front-back confusion in sound localization model ("Richard F. Lyon")


Subject: Re: [AUDITORY] Seeking advice on using ANF firing rate to resolve front-back confusion in sound localization model
From:    "Richard F. Lyon"  <0000030301ff4bce-dmarc-request@xxxxxxxx>
Date:    Tue, 4 Mar 2025 09:29:06 +1100

Qin,

It sounds like you could use a good tutorial on auditory representations, which this list might not be the "best place" for. But that won't stop me from starting.

"Place" refers to positions along the cochlear partition, typically described also by "CF" or characteristic frequency. A rate-vs-place profile is like a spectrum, with a value at each of a number of places or CFs. It is a vector function of time: short-time-averaged firing rates, typically updated every few milliseconds.

When you model auditory neurons with the Zilany model, you get the instantaneous rate at one place, based on the CF of the fiber that you're modeling. You need to induce a place dimension by running many such models, with an appropriate set of CFs, all processing the same audio input in parallel. Or use a model that inherently has a place dimension, e.g. my CARFAC model, if you want it to be more efficient.
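In code, the parallel-CF idea looks roughly like this (a minimal numpy sketch; `single_cf_model` is an assumed stand-in for whatever per-fiber wrapper you have around the Zilany model, not a real API, and the CF range and frame size are arbitrary illustrative choices):

import numpy as np

def rate_vs_place_profile(audio, fs, single_cf_model, n_channels=60,
                          frame_ms=5.0):
    # Log-spaced CFs covering a typical range of cochlear places.
    cfs = np.geomspace(125.0, 8000.0, n_channels)
    # One model instance per CF, all fed the same audio in parallel;
    # each is assumed to return an instantaneous rate waveform (spikes/s).
    rates = np.stack([single_cf_model(audio, fs, cf) for cf in cfs])
    # Short-time average: one rate value per place every few milliseconds.
    hop = max(1, int(fs * frame_ms / 1000.0))
    n_frames = rates.shape[1] // hop
    profile = (rates[:, :n_frames * hop]
               .reshape(n_channels, n_frames, hop)
               .mean(axis=2))
    return cfs, profile  # profile[place, frame]

Each column of `profile` is then one rate-vs-place profile: a spectrum-like snapshot with a value at each place or CF.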
Similarly for your MSO output. You can't work with one MSO rate output. You need a 2D place parameterization, with cochlear place on one dimension and ITD or MSO place on the other (that is, a binaural correlogram, or binaural stabilized auditory image, as we've called it). Without the whole pattern, it will be impossible to tell direction for a range of different signals, with different time patterns, spectra, and intensities. Your plots suggest that you are thinking there is a mapping from direction to rate, but that mapping is different for every stimulus, and for every MSO neuron output. You need to be looking at the "pattern", or "profile" as I called it at the level of the auditory nerve.

Combining the MSO output pattern with auditory nerve rate-vs-place patterns may give you the cues you need, or nearly so.
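In sketch form, that 2D pattern might be computed like this (plain per-channel cross-correlation standing in for MSO coincidence detection; `rates_left` and `rates_right` are assumed to be (channels x samples) per-ear rate arrays like the ones in the previous sketch, before time-averaging):

import numpy as np

def binaural_correlogram(rates_left, rates_right, fs, max_itd_ms=1.0):
    # Lags spanning the plausible ITD range, in samples.
    max_lag = int(fs * max_itd_ms / 1000.0)
    lags = np.arange(-max_lag, max_lag + 1)
    n_ch, n = rates_left.shape
    ccg = np.zeros((n_ch, lags.size))
    for c in range(n_ch):
        l, r = rates_left[c], rates_right[c]
        for i, d in enumerate(lags):
            # Correlate the two ears' rates at relative delay d.
            if d >= 0:
                ccg[c, i] = np.dot(l[d:], r[:n - d])
            else:
                ccg[c, i] = np.dot(l[:n + d], r[-d:])
    return lags / fs, ccg  # ccg[place, itd]: the whole 2D pattern

It is this whole (place, ITD) pattern, not any single channel's rate, that carries the direction information across different stimuli.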
Also note that the hardest signal to localize, with no hope of resolving front-back confusion, is a sine wave. Why not start with simple signals that are easy to localize, such as clicks? The auditory nerve and brainstem are well set up to focus on transients and onsets, not ongoing sounds such as sine waves, or even sums of sine waves.
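A first test stimulus can be as simple as a pair of impulses with a known ITD (a sketch; the 300-microsecond default and the left-ear-leads sign convention are arbitrary choices for illustration, not from the original post):

import numpy as np

def binaural_click(fs=48000, itd_us=300.0, dur_ms=50.0):
    # One click per ear; positive itd_us means the left ear leads.
    n = int(fs * dur_ms / 1000.0)
    left, right = np.zeros(n), np.zeros(n)
    shift = int(round(fs * itd_us / 1e6))  # ITD in samples
    onset = n // 4
    left[onset] = 1.0
    right[onset + shift] = 1.0
    return left, right

Feeding such clicks through the models above gives a correlogram with a clean, stimulus-independent ridge at the imposed ITD, which is a much easier starting point than sine waves.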
And of course I recommend the Binaural chapter of my book, in addition to tons of other good works on binaural hearing. See http://machinehearing.org

Dick

On Tue, Mar 4, 2025 at 12:08 AM Qin Liu <qin.liu@xxxxxxxx> wrote:

> Dear Dick,
>
> Thank you for your suggestions.
>
> I am a little confused about the concept of "rate-vs-place profiles." Could you please provide more references or explain it a bit more? I believe spectral cues will definitely help, but I haven't found a proper way to manipulate them yet.
>
> I've attempted to use head rotation to determine whether the sound source is from the front or back based on the MSO firing rate, which requires turning +/- 90 degrees each time. This is why I am seeking alternative methods to address this issue.
>
> Best regards,
>
> Qin
>
> ------------------------------
> *From:* Richard F. Lyon <dicklyon@xxxxxxxx>
> *Sent:* Thursday, 27 February 2025 12:08:01
> *To:* Qin Liu
> *Cc:* AUDITORY@xxxxxxxx
> *Subject:* Re: Seeking advice on using ANF firing rate to resolve front-back confusion in sound localization model
>
> Qin,
>
> The rate-vs-place profiles from the two ears may have most of what you need to supplement the MSO's output that represents ITD, which is mostly a left-right cue. The cues for elevation, including front-back, are generally thought to be more subtle spectral features, related to the individual's HRTF, and are not as robust as the ITD cues. ILD cues are of intermediate robustness, I think, but still primarily left-right.
>
> I hadn't thought about doing what Jan Schnupp suggested, looking at slightly different cones of confusion for different frequencies, but that sounds like another way to conceptualize the subtle HRTF-dependent spectral cues.
>
> So you don't have to use "HRTF template matching", but you do have to use HRTFs.
>
> If you want to do this in anything like the real world, as opposed to an anechoic environment, you'll need a strong precedence effect to pay attention to the first arrival and ignore echoes, or something along those lines.
>
> Also, in the real world, we usually resolve front-back confusion quickly and easily by rotating our heads a little. The effect of rotation on ITD is opposite for front vs. back, so this gives a very robust front-back cue; up-down is still hard.
>
> Dick
>
> On Wed, Feb 26, 2025 at 4:21 PM Qin Liu <000003c563e12bd3-dmarc-request@xxxxxxxx> wrote:
>
>> Dear auditory list,
>>
>> I am currently working on a project involving sound localization using firing rates from auditory nerve fibers (ANFs) and the medial superior olive (MSO). However, I have encountered an issue: I am unable to distinguish between front and back sound sources using MSO firing rates alone; they only give me left-right discrimination.
>>
>> I am considering whether auditory nerve fiber (ANF) firing rates might provide a solution, but I am uncertain how to utilize them effectively. For instance, I have experimented with analyzing the positive gradients of ANF firing rates but have not yet achieved meaningful results.
>>
>> Could anyone suggest an auditory metric derived from binaural signals, ANF firing rates, or MSO that could classify front/back sources without relying on HRTF template matching? Any insights or alternative approaches would be invaluable to my work.
>>
>> Thank you in advance. I sincerely appreciate any guidance you can offer.
>>
>> Best regards,
>>
>> Qin Liu
>> Doctoral Student
>> Laboratory of Wave Engineering, École Polytechnique Fédérale de Lausanne (EPFL)
>> Email: qin.liu@xxxxxxxx


This message came from the mail archive
postings/2025/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University