Re: [AUDITORY] On 3D audio rendering for signals with the low sampling frequency (Frederick Gallun)


Subject: Re: [AUDITORY] On 3D audio rendering for signals with the low sampling frequency
From:    Frederick Gallun  <fgallun@xxxxxxxx>
Date:    Fri, 12 Aug 2022 09:49:02 -0700

The literature on the HRTF over the past 60 years has made it very clear
that "[...] up–down cues are located mainly in the 6–12-kHz band, and
front–back cues in the 8–16-kHz band." (Langendijk and Bronkhorst, 2002)

Here are a few places to start:

Langendijk, E. H. A., & Bronkhorst, A. W. (2002). Contribution of spectral
cues to human sound localization. The Journal of the Acoustical Society of
America, 112(4), 1583–1596. https://doi.org/10.1121/1.1501901

Mehrgardt, S., & Mellert, V. (1977). Transformation characteristics of the
external human ear. The Journal of the Acoustical Society of America,
61(6), 1567–1576. https://doi.org/10.1121/1.381470

Shaw, E. A. G., & Teranishi, R. (1968). Sound Pressure Generated in an
External-Ear Replica and Real Human Ears by a Nearby Point Source. The
Journal of the Acoustical Society of America, 44(1), 240–249.
https://doi.org/10.1121/1.1911059

---------------------------------------------
Frederick (Erick) Gallun, PhD, FASA, FASHA | he/him/his
Professor, Oregon Hearing Research Center, Oregon Health & Science University
"Diversity is like being invited to a party, Inclusion is being asked to
dance, and Belonging is dancing like no one’s watching" - Gregory Lewis

On Thu, Aug 11, 2022 at 11:59 PM Junfeng Li <junfeng.li.1979@xxxxxxxx> wrote:

> Dear Leslie,
>
> When downsampling to 8/16 kHz, we did find that the localization accuracy
> decreases, even for horizon
> Do you have any good ideas to solve it?
>
> Thanks a lot.
>
> Best regards,
> Junfeng
>
>
> On Thu, Aug 11, 2022 at 4:04 PM Prof Leslie Smith <l.s.smith@xxxxxxxx>
> wrote:
>
>> I'd also wonder about the time resolution: 16 kHz = 1/16000 sec between
>> samples = about 62 microseconds.
>> That's relatively long for ITD (TDOA) estimation, which would suggest that
>> localisation of lower frequency signals would be impeded.
>>
>> (I don't have evidence for this: it's just a suggestion).
>>
>> --Leslie Smith
>>
>> Junfeng Li wrote:
>> > Dear all,
>> >
>> > We are working on 3D audio rendering for signals with a low sampling
>> > frequency.
>> > As you may know, HRTFs are normally measured at a high sampling
>> > frequency, e.g., 48 kHz or 44.1 kHz. However, the sampling frequency of
>> > the sound signals in our application is restricted to 16 kHz. Therefore,
>> > to render this low-frequency (≤8 kHz) signal, one straightforward way is
>> > to first downsample the HRTFs from 48 kHz/44.1 kHz to 16 kHz and then
>> > convolve them with the sound signals.
>> > However, the sound localization performance of the signal rendered with
>> > this approach is greatly decreased, especially elevation perception. To
>> > improve the sound localization performance, I am now wondering whether
>> > there is a good method to solve or mitigate this problem in this
>> > scenario.
>> >
>> > Any discussion is welcome.
>> >
>> > Thanks a lot again.
>> >
>> > Best regards,
>> > Junfeng
>> >
>>
>>
>> --
>> Prof Leslie Smith (Emeritus)
>> Computing Science & Mathematics,
>> University of Stirling, Stirling FK9 4LA
>> Scotland, UK
>> Tel +44 1786 467435
>> Web: http://www.cs.stir.ac.uk/~lss
>> Blog: http://lestheprof.com
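
The arithmetic behind the two points in this thread can be sketched in a few lines of Python. This is only an illustration, not anything posted by the authors: the function names are hypothetical, and treating the quoted cue bands as hard-edged intervals is a simplifying assumption. It computes the sample period mentioned for 16 kHz (about 62 microseconds, 62.5 exactly) and how much of the quoted 6–12 kHz and 8–16 kHz cue bands lies below the Nyquist frequency (half the sampling rate) at several sampling rates.

```python
def sample_period_us(fs_hz: float) -> float:
    """Time between consecutive samples, in microseconds."""
    return 1e6 / fs_hz


def band_fraction_below_nyquist(band_hz, fs_hz: float) -> float:
    """Fraction of the band (lo, hi), in Hz, that lies below fs/2.

    Assumption: the band is treated as a hard-edged interval, which is
    a simplification of how spectral cues are actually distributed.
    """
    lo, hi = band_hz
    nyquist = fs_hz / 2.0
    kept = max(0.0, min(hi, nyquist) - lo)
    return kept / (hi - lo)


# Cue bands as quoted from Langendijk and Bronkhorst (2002) in the reply.
CUE_BANDS_HZ = {
    "up-down": (6_000.0, 12_000.0),
    "front-back": (8_000.0, 16_000.0),
}

for fs in (8_000.0, 16_000.0, 44_100.0, 48_000.0):
    parts = [
        f"{name}: {100 * band_fraction_below_nyquist(band, fs):.0f}%"
        for name, band in CUE_BANDS_HZ.items()
    ]
    print(f"fs = {fs / 1000:g} kHz, period = {sample_period_us(fs):.1f} us, "
          + ", ".join(parts))
```

Under these assumptions, a 16 kHz signal retains only the lowest third of the up-down band and none of the front-back band, which is consistent with the loss of elevation perception reported in the original question. The sample period alone does not settle the ITD question raised above; it is shown only to make the quoted figure concrete.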


This message came from the mail archive
src/postings/2022/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University