Re: [AUDITORY] On 3D audio rendering for signals with the low sampling frequency (Neeraj Sharma)

Subject: Re: [AUDITORY] On 3D audio rendering for signals with the low sampling frequency
From: Neeraj Sharma <neerajww@xxxxxxxx>
Date: Thu, 11 Aug 2022 11:15:00 +0530

Dear Junfeng,

Thanks for sharing this observation here. I do not have a solution now but am curious to know more.
I can relate the loss in elevation to poor capture of the spectral notches present in the HRTF. But I did not assume that the notches beyond 8 kHz are this crucial. Are the HRTFs personalized?

Also, I am now wondering: is it always the case that elevation information is poor for 16 kHz audio signals? Is there literature on this?
Just a quick shot: I will also try downsampling (without low-pass filtering) the HRTF to 16 kHz and see if the aliased HRTF spectrum significantly corrupts the 3-D perception. I will bet - not much. But will keep fingers crossed.

Cheers,
Neeraj

On Thu, Aug 11, 2022 at 11:04 AM Junfeng Li <junfeng.li.1979@xxxxxxxx> wrote:

> Dear Dick,
>
> Thanks a lot for your information.
>
> Yeah, the main problem for us is the limitation of the 16 kHz sampling
> frequency at the output side. Therefore, even if we do bandwidth extension
> for the input signal, we have to downsample to 16 kHz after the 3D rendering
> processing. I am wondering whether there is any possible/potential method
> using some psychoacoustic principle, like that?
>
> Thanks again.
>
> Best regards
> Junfeng
>
> On Thu, Aug 11, 2022 at 12:29 PM Richard F. Lyon <dicklyon@xxxxxxxx> wrote:
>
>> You could do "bandwidth extension" on the signals you want to spatialize,
>> e.g. with some of the methods at
>> https://gfx.cs.princeton.edu/pubs/Su_2021_BEI/ICASSP2021_Su_Wang_BWE.pdf
>> and then apply the high-sample-rate HRTFs.
>> Of course, if your system has a 16 ksps limitation on the output side,
>> that will be of no use.
>>
>> Dick
>>
>>
>> On Wed, Aug 10, 2022 at 9:22 PM Junfeng Li <junfeng.li.1979@xxxxxxxx>
>> wrote:
>>
>>> Dear all,
>>>
>>> We are working on 3D audio rendering for signals with a low sampling
>>> frequency.
>>> As you may know, HRTFs are normally measured at a high sampling
>>> frequency, e.g., 48 kHz or 44.1 kHz. However, the sampling frequency of
>>> sound signals in our application is restricted to 16 kHz. Therefore, to
>>> render this low-frequency (≤8 kHz) signal, one straightforward way is to
>>> first downsample the HRTFs from 48 kHz/44.1 kHz to 16 kHz and then
>>> convolve them with the sound signals. However, the sound localization
>>> performance of the signal rendered with this approach is greatly
>>> decreased, especially elevation perception. To improve the sound
>>> localization performance, I am now wondering whether there is a good
>>> method to solve or mitigate this problem in this scenario.
>>>
>>> Any discussion is welcome.
>>>
>>> Thanks a lot again.
>>>
>>> Best regards,
>>> Junfeng
>>>
>>
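The two ways of taking a 48 kHz response down to 16 kHz mentioned in this thread (with and without low-pass filtering) can be compared with a minimal sketch like the one below. It is only an illustration under stated assumptions, not anything posted by the authors: the test signal is a synthetic Hann-windowed 10 kHz burst standing in for HRTF content above the new 8 kHz Nyquist limit, and the filter length (63 taps), the 7 kHz cutoff, and all function names are hypothetical choices. It shows only where the energy lands in the spectrum and says nothing about perceived elevation.

```python
import math

def lowpass_fir(num_taps, cutoff_hz, fs_hz):
    """Hamming-windowed sinc low-pass FIR, normalised to unity gain at DC."""
    fc = cutoff_hz / fs_hz              # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    gain = sum(taps)
    return [t / gain for t in taps]

def convolve(x, h):
    """Direct-form full linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def magnitude_at(x, freq_hz, fs_hz):
    """Magnitude of the discrete-time Fourier transform of x at one frequency."""
    w = 2 * math.pi * freq_hz / fs_hz
    re = sum(v * math.cos(w * n) for n, v in enumerate(x))
    im = sum(v * math.sin(w * n) for n, v in enumerate(x))
    return math.hypot(re, im)

FS_IN, FS_OUT = 48000, 16000
FACTOR = FS_IN // FS_OUT                # decimate by 3
N = 256

# Assumption: synthetic stand-in for an impulse response, not a measured HRIR.
# A Hann-windowed 10 kHz burst, i.e. content above the 8 kHz output Nyquist.
test_ir_48k = [
    (0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)))
    * math.cos(2 * math.pi * 10000 * n / FS_IN)
    for n in range(N)
]

# (a) Downsampling without low-pass filtering: keep every third sample.
naive_16k = test_ir_48k[::FACTOR]

# (b) Low-pass filter first, then keep every third sample.
filtered_16k = convolve(test_ir_48k, lowpass_fir(63, 7000, FS_IN))[::FACTOR]

# At a 16 kHz rate, 10 kHz folds down to 16 - 10 = 6 kHz.
alias_naive = magnitude_at(naive_16k, 6000, FS_OUT)
alias_filtered = magnitude_at(filtered_16k, 6000, FS_OUT)

print(f"6 kHz magnitude, no low-pass:   {alias_naive:.3f}")
print(f"6 kHz magnitude, with low-pass: {alias_filtered:.3f}")
```

With a measured HRIR in place of the synthetic burst, the same comparison indicates how much energy from above 8 kHz folds back into the 0 to 8 kHz band for each ear and direction. In practice a library routine such as `scipy.signal.resample_poly` would typically do the filter-and-decimate step.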

This message came from the mail archive
src/postings/2022/
maintained by:

Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University