Subject: Re: [AUDITORY] On 3D audio rendering for signals with the low sampling frequency
From: "Richard F. Lyon" <dicklyon@xxxxxxxx>
Date: Sat, 13 Aug 2022 21:51:56 -0700

I should have mentioned that the same figure in my book also shows
low-frequency cues to vertical angle, from torso and shoulder bounce
effects. I don't know how effective these are in overcoming the loss of
higher-frequency cues, but they're something you could work with, and
perhaps try to exaggerate.

Dick

On Sat, Aug 13, 2022 at 8:03 PM Richard F. Lyon <dicklyon@xxxxxxxx> wrote:

> Yes, good idea to find some solutions to the difficulty.
>
> Reviewing my book's Figure 22.7, there's a pretty good spectral notch cue
> to elevation in the 5.5–8 kHz region (and higher); 8 kHz might be enough
> for elevation up to about 45 degrees (find free book PDF via
> machinehearing.org -- search that blog for "free").
>
> For resolving front/back confusion, that's hard unless you add the effects
> of lateralization change with head turning. Using a head tracker or gyro
> to change the lateral angle to the sound, relative to the head, is very
> effective for letting the user disambiguate, if they have time to move a
> little. So it depends on what you're trying to do.
>
> If it was impossible to localize sounds with a 16 kHz sample rate, it
> would be equally impossible to localize sounds with no energy above 8 kHz.
> I don't think that's the case. I can't hear anything above 8 kHz (unless
> it's quite intense), and I don't sense that I have any difficulty
> localizing sounds around me. Probably if we measured though we'd find I'm
> not as accurate as a person with better hearing.
>
> And I don't think the sample interval of 1/16000 sec provides a strong
> inherent limit on ITD accuracy.
> The bandwidth of 8 kHz is about half of what's "normal", so theoretical
> TDOA resolution should be expected to be no worse than double normal,
> say 20–40 microseconds (about half of a sample interval) instead of
> 10–20 microseconds. I wouldn't be surprised if the ITD resolution
> threshold was even closer to normal (around 1/4 sample interval), since
> our ITD-computing structure is dominated by lower-frequency input.
>
> Dick
>
> On Fri, Aug 12, 2022 at 9:20 PM Junfeng Li <junfeng.li.1979@xxxxxxxx>
> wrote:
>
>> Dear Frederick,
>>
>> Thank you so much for the references that you mentioned.
>>
>> "[...] up–down cues are located mainly in the 6–12-kHz band, and
>> front–back cues in the 8–16-kHz band."
>> According to this statement, it seems impossible to solve the problems of
>> elevation perception and front-back confusion when the output signal is
>> sampled at 16kHz.
>> Though I know it is difficult, I always try to find some solutions.
>>
>> Thanks again.
>>
>> Best regards,
>> Junfeng
>>
>> On Sat, Aug 13, 2022 at 12:50 AM Frederick Gallun <fgallun@xxxxxxxx>
>> wrote:
>>
>>> The literature on the HRTF over the past 60 years has made it very clear
>>> that "[...] up–down cues are located mainly in the 6–12-kHz band, and
>>> front–back cues in the 8–16-kHz band." (Langendijk and Bronkhorst, 2002)
>>>
>>> Here are a few places to start:
>>>
>>> Langendijk, E. H. A., & Bronkhorst, A. W. (2002). Contribution of
>>> spectral cues to human sound localization. *The Journal of the
>>> Acoustical Society of America*, *112*(4), 1583–1596.
>>> https://doi.org/10.1121/1.1501901
>>>
>>> Mehrgardt, S., & Mellert, V. (1977). Transformation characteristics of
>>> the external human ear. *The Journal of the Acoustical Society of
>>> America*, *61*(6), 1567–1576. https://doi.org/10.1121/1.381470
>>>
>>> Shaw, E. A. G., & Teranishi, R. (1968).
>>> Sound Pressure Generated in an
>>> External-Ear Replica and Real Human Ears by a Nearby Point Source. *The
>>> Journal of the Acoustical Society of America*, *44*(1), 240–249.
>>> https://doi.org/10.1121/1.1911059
>>>
>>> ---------------------------------------------
>>>
>>> Frederick (Erick) Gallun, PhD, FASA, FASHA | he/him/his
>>> Professor, Oregon Hearing Research Center, Oregon Health & Science
>>> University
>>> "Diversity is like being invited to a party, Inclusion is being asked
>>> to dance, and Belonging is dancing like no one’s watching" - Gregory Lewis
>>>
>>> On Thu, Aug 11, 2022 at 11:59 PM Junfeng Li <junfeng.li.1979@xxxxxxxx>
>>> wrote:
>>>
>>>> Dear Leslie,
>>>>
>>>> When downsampling to 8/16kHz, we really found the localization accuracy
>>>> decreases, even for horizon
>>>> Do you have any good ideas to solve it?
>>>>
>>>> Thanks a lot.
>>>>
>>>> Best regards,
>>>> Junfeng
>>>>
>>>> On Thu, Aug 11, 2022 at 4:04 PM Prof Leslie Smith <l.s.smith@xxxxxxxx>
>>>> wrote:
>>>>
>>>>> I'd also wonder about the time resolution: 16 kHz = 1/16000 sec between
>>>>> samples = 62.5 microseconds.
>>>>> That's relatively long for ITD (TDOA) estimation, which would suggest
>>>>> that localisation of lower frequency signals would be impeded.
>>>>>
>>>>> (I don't have evidence for this: it's just a suggestion).
>>>>>
>>>>> --Leslie Smith
>>>>>
>>>>> Junfeng Li wrote:
>>>>> > Dear all,
>>>>> >
>>>>> > We are working on 3D audio rendering for signals with low sampling
>>>>> > frequency.
>>>>> > As you may know, the HRTFs are normally measured at the high
>>>>> > sampling frequency, e.g., 48kHz or 44.1kHz. However, the sampling
>>>>> > frequency of sound signals in our application is restricted to 16 kHz.
>>>>> > Therefore, to render this low-frequency (≤8kHz) signal, one
>>>>> > straightforward way is to first downsample the HRTFs from
>>>>> > 48kHz/44.1kHz to 16kHz and then convolve with sound signals.
>>>>> > However, the sound localization performance of the signal rendered
>>>>> > with this approach is greatly decreased, especially elevation
>>>>> > perception. To improve the sound localization performance, I am now
>>>>> > wondering whether there is a certain good method to solve or
>>>>> > mitigate this problem in this scenario.
>>>>> >
>>>>> > Any discussion is welcome.
>>>>> >
>>>>> > Thanks a lot again.
>>>>> >
>>>>> > Best regards,
>>>>> > Junfeng
>>>>> >
>>>>>
>>>>> --
>>>>> Prof Leslie Smith (Emeritus)
>>>>> Computing Science & Mathematics,
>>>>> University of Stirling, Stirling FK9 4LA
>>>>> Scotland, UK
>>>>> Tel +44 1786 467435
>>>>> Web: http://www.cs.stir.ac.uk/~lss
>>>>> Blog: http://lestheprof.com
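
On the ITD question raised in this thread (a 16 kHz sample rate gives a 62.5 microsecond sample interval), the following is a minimal numerical sketch, assuming NumPy, of why the sample interval by itself need not limit delay resolution for band-limited signals. It is only an illustration of the signal-processing argument and says nothing about human perceptual thresholds. Everything in it is an assumption chosen for illustration and does not come from the posters: the function names, the 20 microsecond test delay, the 100 Hz to 7 kHz noise band, the added noise level, and the cross-spectrum phase-slope estimator (which is valid only while the delay is smaller than one sample interval, so the phase does not wrap).

```python
import numpy as np

FS = 16_000        # sample rate in Hz; sample interval is 62.5 microseconds
TRUE_ITD = 20e-6   # illustrative delay in seconds, about 1/3 of a sample


def band_limited_noise(n, fs, f_lo, f_hi, rng):
    """White noise restricted to [f_lo, f_hi] Hz by zeroing FFT bins."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=n)


def fractional_delay(x, delay_s, fs):
    """Circularly delay x by delay_s seconds using a linear phase shift."""
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    shifted = np.fft.rfft(x) * np.exp(-2j * np.pi * freqs * delay_s)
    return np.fft.irfft(shifted, n=n)


def estimate_delay(left, right, fs):
    """Estimate the delay of `right` relative to `left` in seconds.

    Fits a straight line through the cross-spectrum phase, weighted by the
    cross-spectrum magnitude. Assumes |delay| < 1/fs (no phase wrapping).
    """
    freqs = np.fft.rfftfreq(len(left), d=1.0 / fs)
    cross = np.fft.rfft(right) * np.conj(np.fft.rfft(left))
    weights = np.abs(cross)
    phase = np.angle(cross)  # ideally equal to -2*pi*f*delay
    slope = np.sum(weights * freqs * phase) / np.sum(weights * freqs**2)
    return -slope / (2.0 * np.pi)


rng = np.random.default_rng(0)
left = band_limited_noise(FS, FS, 100.0, 7000.0, rng)  # 1 second of noise
right = fractional_delay(left, TRUE_ITD, FS)

# Small independent noise at each "ear" so the estimate is not trivially exact.
left_noisy = left + 0.03 * rng.standard_normal(FS)
right_noisy = right + 0.03 * rng.standard_normal(FS)

estimated_itd = estimate_delay(left_noisy, right_noisy, FS)
print(f"sample interval: {1e6 / FS:.1f} us")
print(f"true delay:      {TRUE_ITD * 1e6:.2f} us")
print(f"estimated delay: {estimated_itd * 1e6:.2f} us")
```

The phase-slope fit is used here instead of picking the peak of a sampled cross-correlation because the peak index alone is quantized to whole samples, whereas the phase of a band-limited signal carries the sub-sample information directly.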