Re: [AUDITORY] On 3D audio rendering for signals with the low sampling frequency (Junfeng Li)


Subject: Re: [AUDITORY] On 3D audio rendering for signals with the low sampling frequency
From:    Junfeng Li <junfeng.li.1979@xxxxxxxx>
Date:    Sat, 13 Aug 2022 07:42:02 +0800

Dear Frederick,

Thank you so much for the references you mentioned.

"[...] up–down cues are located mainly in the 6–12-kHz band, and front–back cues in the 8–16-kHz band."

According to this statement, it seems impossible to solve the problems of elevation perception and front-back confusion when the output signal is sampled at 16 kHz. Though I know it is difficult, I keep trying to find a solution.

Thanks again.

Best regards,
Junfeng

On Sat, Aug 13, 2022 at 12:50 AM Frederick Gallun <fgallun@xxxxxxxx> wrote:
> The literature on the HRTF over the past 60 years has made it very clear
> that "[...] up–down cues are located mainly in the 6–12-kHz band, and
> front–back cues in the 8–16-kHz band." (Langendijk and Bronkhorst, 2002)
>
> Here are a few places to start:
>
> Langendijk, E. H. A., & Bronkhorst, A. W. (2002). Contribution of spectral
> cues to human sound localization. *The Journal of the Acoustical Society
> of America*, *112*(4), 1583–1596. https://doi.org/10.1121/1.1501901
>
> Mehrgardt, S., & Mellert, V. (1977). Transformation characteristics of the
> external human ear. *The Journal of the Acoustical Society of America*,
> *61*(6), 1567–1576. https://doi.org/10.1121/1.381470
>
> Shaw, E. A. G., & Teranishi, R. (1968). Sound pressure generated in an
> external-ear replica and real human ears by a nearby point source. *The
> Journal of the Acoustical Society of America*, *44*(1), 240–249.
> https://doi.org/10.1121/1.1911059
>
> ---------------------------------------------
>
> Frederick (Erick) Gallun, PhD, FASA, FASHA | he/him/his
> Professor, Oregon Hearing Research Center, Oregon Health & Science
> University
> "Diversity is like being invited to a party, Inclusion is being asked to
> dance, and Belonging is dancing like no one’s watching" - Gregory Lewis
>
>
> On Thu, Aug 11, 2022 at 11:59 PM Junfeng Li <junfeng.li.1979@xxxxxxxx>
> wrote:
>
>> Dear Leslie,
>>
>> When downsampling to 8/16 kHz, we did find that localization accuracy
>> decreases, even in the horizontal plane.
>> Do you have any good ideas for solving this?
>>
>> Thanks a lot.
>>
>> Best regards,
>> Junfeng
>>
>>
>> On Thu, Aug 11, 2022 at 4:04 PM Prof Leslie Smith <
>> l.s.smith@xxxxxxxx> wrote:
>>
>>> I'd also wonder about the time resolution: 16 kHz = 1/16000 s between
>>> samples = 62.5 microseconds.
>>> That's relatively long for ITD (TDOA) estimation, which would suggest
>>> that localisation of lower-frequency signals would be impeded.
>>>
>>> (I don't have evidence for this: it's just a suggestion.)
>>>
>>> --Leslie Smith
>>>
>>> Junfeng Li wrote:
>>> > Dear all,
>>> >
>>> > We are working on 3D audio rendering for signals with a low sampling
>>> > frequency.
>>> > As you may know, HRTFs are normally measured at a high sampling
>>> > frequency, e.g., 48 kHz or 44.1 kHz. However, the sampling frequency of
>>> > the sound signals in our application is restricted to 16 kHz. Therefore,
>>> > to render this low-frequency (≤8 kHz) signal, one straightforward way is
>>> > to first downsample the HRTFs from 48 kHz/44.1 kHz to 16 kHz and then
>>> > convolve them with the sound signals.
>>> > However, the sound localization performance of the signal rendered with
>>> > this approach is greatly decreased, especially elevation perception.
>>> > To improve the sound localization performance, I am now wondering
>>> > whether there is a good method to solve or mitigate this problem in
>>> > this scenario.
>>> >
>>> > Any discussion is welcome.
>>> >
>>> > Thanks a lot again.
>>> >
>>> > Best regards,
>>> > Junfeng
>>> >
>>>
>>>
>>> --
>>> Prof Leslie Smith (Emeritus)
>>> Computing Science & Mathematics,
>>> University of Stirling, Stirling FK9 4LA
>>> Scotland, UK
>>> Tel +44 1786 467435
>>> Web: http://www.cs.stir.ac.uk/~lss
>>> Blog: http://lestheprof.com
>>>
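A few lines of Python can check the band arithmetic behind Junfeng's conclusion. This minimal sketch measures how much of each quoted cue band (6–12 kHz for up–down, 8–16 kHz for front–back) lies below the Nyquist frequency fs/2. It treats every hertz in a band as equally important, which is a simplifying assumption and not a perceptual model. The function name `retained_fraction` is hypothetical.

```python
def retained_fraction(band_lo_hz: float, band_hi_hz: float, fs_hz: float) -> float:
    """Fraction of the band [band_lo_hz, band_hi_hz] lying below Nyquist (fs/2)."""
    nyquist = fs_hz / 2.0
    # Width of the part of the band that survives; zero if the band starts above Nyquist.
    overlap = max(0.0, min(band_hi_hz, nyquist) - band_lo_hz)
    return overlap / (band_hi_hz - band_lo_hz)


# Cue bands as quoted in the thread from Langendijk and Bronkhorst (2002).
CUE_BANDS = {
    "up-down": (6_000.0, 12_000.0),
    "front-back": (8_000.0, 16_000.0),
}

for fs in (16_000.0, 44_100.0, 48_000.0):
    for name, (lo, hi) in CUE_BANDS.items():
        frac = retained_fraction(lo, hi, fs)
        print(f"fs={fs / 1000:g} kHz  {name:10s} retained: {frac:.2f}")
```

At 16 kHz, the sketch keeps one third of the up–down band and none of the front–back band, which fits the concern raised in the thread. At 44.1 or 48 kHz, both bands are fully retained.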

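Leslie's time-resolution figure can be reproduced and compared across the sampling rates mentioned in the thread. The maximum ITD of about 700 µs used here is an assumed round number for an adult head, not a value from the thread. The worst-case error of half a sample period also assumes that interaural delays are rounded to whole samples. The helper names are hypothetical.

```python
def sample_period_us(fs_hz: float) -> float:
    """Time between consecutive samples, in microseconds."""
    return 1e6 / fs_hz


def itd_in_samples(itd_us: float, fs_hz: float) -> float:
    """Express an interaural time difference as a (possibly fractional) sample count."""
    return itd_us * fs_hz / 1e6


# ASSUMPTION: ~700 us as a rough maximum ITD; real values vary between listeners.
MAX_ITD_US = 700.0

for fs in (16_000.0, 44_100.0, 48_000.0):
    period = sample_period_us(fs)
    print(
        f"fs={fs / 1000:g} kHz: sample period {period:.2f} us, "
        f"max ITD ~{itd_in_samples(MAX_ITD_US, fs):.1f} samples, "
        # Only valid if delays are rounded to whole samples (see assumption above).
        f"worst-case rounding error {period / 2:.2f} us"
    )
```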

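The "straightforward way" from the original post (downsample the HRTFs to 16 kHz, then convolve with the signal) can be sketched with NumPy. This minimal sketch assumes HRIRs at exactly 48 kHz, so an integer decimation factor of 3 applies. Going from 44.1 kHz to 16 kHz would need a rational 160/441 resampler, which is not shown. The windowed-sinc design, the 63-tap length, and the 0.9 cutoff margin are illustrative choices. The HRIR pair is a hypothetical placeholder, not measured data.

```python
import numpy as np


def lowpass_fir(num_taps: int, cutoff: float) -> np.ndarray:
    """Windowed-sinc low-pass FIR; cutoff is a fraction of the input Nyquist (0..1)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    h = cutoff * np.sinc(cutoff * n)  # truncated ideal low-pass impulse response
    h *= np.hamming(num_taps)         # taper to reduce ripple from truncation
    return h / h.sum()                # normalise to unity gain at DC


def decimate_hrir(hrir: np.ndarray, factor: int, num_taps: int = 63) -> np.ndarray:
    """Anti-alias filter, then keep every `factor`-th sample (48 kHz -> 16 kHz for factor=3)."""
    # Cutoff a little below the new Nyquist to leave room for the transition band.
    h = lowpass_fir(num_taps, 0.9 / factor)
    filtered = np.convolve(hrir, h)
    delay = (num_taps - 1) // 2       # group delay of the symmetric (linear-phase) filter
    return filtered[delay::factor]


def render_binaural(mono: np.ndarray, hrir_left: np.ndarray, hrir_right: np.ndarray) -> np.ndarray:
    """Convolve a mono signal with a left/right HRIR pair; returns shape (2, N)."""
    return np.stack([np.convolve(mono, hrir_left), np.convolve(mono, hrir_right)])


# HYPOTHETICAL placeholder HRIRs at 48 kHz: a pure delay/attenuation pair standing
# in for a measured HRIR pair from whatever database is in use.
hrir_l = np.zeros(256)
hrir_l[10] = 1.0
hrir_r = np.zeros(256)
hrir_r[40] = 0.5  # 30 samples later at 48 kHz (625 us)

hl16 = decimate_hrir(hrir_l, 3)
hr16 = decimate_hrir(hrir_r, 3)

rng = np.random.default_rng(0)
mono16 = rng.standard_normal(16_000)  # 1 s of white noise at 16 kHz
out = render_binaural(mono16, hl16, hr16)
print(out.shape)
```

This reproduces only the baseline pipeline described in the thread. The anti-aliasing filter necessarily removes everything above 8 kHz, including the spectral cues discussed above.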
This message came from the mail archive
src/postings/2022/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University