Re: [AUDITORY] On 3D audio rendering for signals with the low sampling frequency (Adam Weisser)


Subject: Re: [AUDITORY] On 3D audio rendering for signals with the low sampling frequency
From:    Adam Weisser  <adam_weisser@xxxxxxxx>
Date:    Sun, 14 Aug 2022 19:52:59 -0300

Dear Junfeng, Alain, and all,

I think that some solutions to the undersampling / aliasing problem that you described should exist, but they likely depend on where the sampling-rate bottleneck lies: at the input, in processing, at the output stage, or in all of them. It also depends on the computational capabilities of the system and whether it has to work in real time, and if so, what the permissible delay is.

I'm aware of two general approaches to circumvent the Nyquist criterion:

1. Compressed sensing - This heavily researched signal-processing method uses signal sparsity to faithfully reconstruct undersampled signals [1].

2. Trading off aliasing and noise - This is a classical result that employs nonuniform sampling at lower rates than Nyquist, whereby the aliasing that would otherwise arise is replaced by noise [2]. It is thought that this is what happens in the retina, where the optical image is densely sampled in the fovea by the photoreceptors, but becomes gradually undersampled away from the fovea [3]. Had the photoreceptor density been uniform and regular over the retina, the resolution of central vision would greatly suffer and the image would also be severely aliased. However, this trick works only if the sampling is truly stochastic. If the "localization noise" level (maybe manifest as audio noise) can be sacrificed, then this approach may work, combined with dither.

Regardless of the specific system architecture at hand, none of these methods appears straightforward to implement.

Finally, regarding Alain's comment about auditory sampling - the neat trick that is found in spatial processing of vision may be analogous to what goes on in temporal processing of stimuli at the transduction stage of the auditory nerve.
Neural adaptation can be thought of as dense sampling of the signal around its onset / transient portion, which becomes more sparsely sampled quickly after the onset. Because of adaptation, this effect is very elusive, but I believe that it is measurable notwithstanding. I tried to demonstrate it psychoacoustically in Appendix E of [4]. While I don't know how it relates to binaural processing directly, there may be instantaneous effects that are detectable there too, given that the input to both processing types is the same.

All the best,
Adam.

[1] Candes, E. J., Romberg, J. K., & Tao, T. (2006). Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8), 1207–1223.

[2] Shapiro, H. S., & Silverman, R. A. (1960). Alias-free sampling of random noise. Journal of the Society for Industrial and Applied Mathematics, 8(2), 225–248.

[3] Yellott, J. I. (1983). Spectral consequences of photoreceptor sampling in the rhesus retina. Science, 221(4608), 382–385.

[4] Weisser, A. (2021). Treatise on Hearing: The Temporal Auditory Imaging Theory Inspired by Optics and Communication. arXiv preprint arXiv:2111.04338. https://arxiv.org/abs/2111.04338

On Sun, Aug 14, 2022, at 4:47 AM, Alain de Cheveigne wrote:
> Hi Dick, all,
>
> A couple of thoughts. I'm no expert on spatial hearing, so they may be off the mark.
>
> > And I don't think the sample interval of 1/16000 sec provides a strong inherent limit on ITD accuracy. The bandwidth of 8 kHz is about half of what's "normal", so theoretical TDOA resolution should be expected to be no worse than double normal, say 20–40 microseconds (about half of a sample interval) instead of 10–20 microseconds.
> > I wouldn't be surprised if the ITD resolution threshold was even closer to normal (around 1/4 sample interval), since our ITD-computing structure is dominated by lower-frequency input.
>
> Indeed, the sample interval does not limit ITD estimation resolution. You can get arbitrary resolution by interpolating the cross-correlation function near its peak (for example by fitting a parabola to the three samples closest to the peak). A similar argument applies to fundamental frequency estimation (--> pitch) from the autocorrelation function as in the YIN method.
>
> This assumes that the CCF or ACF is smooth enough for the interpolation to be accurate, and for that the audio signals must be smooth, i.e. band-limited. The purpose of a low-pass antialiasing filter associated with sampling or resampling is to *insure* that this is the case for typical signals, but that "insurance" is unnecessary if the signals contain no high-frequency power to start with.
>
> Thus, the choice of low-pass filter is a bit of a free parameter under the control of the engineer or experimenter. A wide filter (or none) is OK if the signals are known to contain little or no high-frequency power; a sharp filter is needed if the signals are strongly high-pass. Engineers typically err on the side of precaution by designing filters with strong attenuation beyond Nyquist, usually with the additional goal of keeping the pass-band flat. This requires a filter with a long impulse response. There's leeway in the exact choice. EEs love the topic.
>
> This brings me to my second point. Are there perceptual correlates of antialiasing filtering? There are two reasons to suspect an effect on spatial hearing. First, a long IR might widen the CCF peak and blur the "crisp" peak in the short-term CCF associated with a transient.
> Second, the frequency-domain features of the filter transfer function might interact with spectral notches characteristic of elevation or front-vs-back position of sources, particularly if those features are estimated by neural circuits also sensitive to time.
>
> Again, this is pure speculation. Unfortunately, antialiasing filters are rarely specified in detail (in systems or studies), and I'm not aware of any study aiming to characterize their perceptual effects or demonstrate that there are none. Anecdotally, I remember being annoyed, when listening to music on an early CD player, by what I attributed to high-frequency ringing of antialiasing or reconstruction filters with poles just below Nyquist. That was when I could still hear in that region...
>
> Alain
>
> > On 14 Aug 2022, at 05:03, Richard F. Lyon <DickLyon@xxxxxxxx> wrote:
> >
> > Yes, good idea to find some solutions to the difficulty.
> >
> > Reviewing my book's Figure 22.7, there's a pretty good spectral notch cue to elevation in the 5.5-8 kHz region (and higher); 8 kHz might be enough for elevation up to about 45 degrees (find free book PDF via machinehearing.org -- search that blog for "free".)
> >
> > For resolving front/back confusion, that's hard unless you add the effects of lateralization change with head turning. Using a head tracker or gyro to change the lateral angle to the sound, relative to the head, is very effective for letting the user disambiguate, if they have time to move a little. So it depends on what you're trying to do.
> >
> > If it was impossible to localize sounds with a 16 kHz sample rate, it would be equally impossible to localize sounds with no energy above 8 kHz. I don't think that's the case. I can't hear anything above 8 kHz (unless it's quite intense), and I don't sense that I have any difficulty localizing sounds around me.
> > Probably if we measured, though, we'd find I'm not as accurate as a person with better hearing.
> >
> > And I don't think the sample interval of 1/16000 sec provides a strong inherent limit on ITD accuracy. The bandwidth of 8 kHz is about half of what's "normal", so theoretical TDOA resolution should be expected to be no worse than double normal, say 20–40 microseconds (about half of a sample interval) instead of 10–20 microseconds. I wouldn't be surprised if the ITD resolution threshold was even closer to normal (around 1/4 sample interval), since our ITD-computing structure is dominated by lower-frequency input.
> >
> > Dick
> >
> > On Fri, Aug 12, 2022 at 9:20 PM Junfeng Li <junfeng.li.1979@xxxxxxxxm> wrote:
> > Dear Frederick,
> >
> > Thank you so much for the references that you mentioned.
> >
> > "[...] up–down cues are located mainly in the 6–12-kHz band, and front–back cues in the 8–16-kHz band."
> > According to this statement, it seems impossible to solve the problems of elevation perception and front-back confusion when the output signal is sampled at 16 kHz.
> > Though I know it is difficult, I always try to find some solutions.
> >
> > Thanks again.
> >
> > Best regards,
> > Junfeng
> >
> > On Sat, Aug 13, 2022 at 12:50 AM Frederick Gallun <fgallun@xxxxxxxx> wrote:
> > The literature on the HRTF over the past 60 years has made it very clear that "[...] up–down cues are located mainly in the 6–12-kHz band, and front–back cues in the 8–16-kHz band." (Langendijk and Bronkhorst, 2002)
> >
> > Here are a few places to start:
> >
> > Langendijk, E. H. A., & Bronkhorst, A. W. (2002). Contribution of spectral cues to human sound localization. The Journal of the Acoustical Society of America, 112(4), 1583–1596.
> > https://doi.org/10.1121/1.1501901
> >
> > Mehrgardt, S., & Mellert, V. (1977). Transformation characteristics of the external human ear. The Journal of the Acoustical Society of America, 61(6), 1567–1576. https://doi.org/10.1121/1.381470
> >
> > Shaw, E. A. G., & Teranishi, R. (1968). Sound Pressure Generated in an External-Ear Replica and Real Human Ears by a Nearby Point Source. The Journal of the Acoustical Society of America, 44(1), 240–249. https://doi.org/10.1121/1.1911059
> >
> > ---------------------------------------------
> >
> > Frederick (Erick) Gallun, PhD, FASA, FASHA | he/him/his
> > Professor, Oregon Hearing Research Center, Oregon Health & Science University
> > "Diversity is like being invited to a party, Inclusion is being asked to dance, and Belonging is dancing like no one’s watching" - Gregory Lewis
> >
> > On Thu, Aug 11, 2022 at 11:59 PM Junfeng Li <junfeng.li.1979@xxxxxxxxom> wrote:
> > Dear Leslie,
> >
> > When downsampling to 8/16kHz, we really found the localization accuracy decreases, even for horizon
> > Do you have any good ideas to solve it?
> >
> > Thanks a lot.
> >
> > Best regards,
> > Junfeng
> >
> > On Thu, Aug 11, 2022 at 4:04 PM Prof Leslie Smith <l.s.smith@xxxxxxxx.ac.uk> wrote:
> > I'd also wonder about the time resolution: 16 kHz = 1/16000 sec between samples = 62.5 microseconds.
> > That's relatively long for ITD (TDOA) estimation, which would suggest that localisation of lower frequency signals would be impeded.
> >
> > (I don't have evidence for this: it's just a suggestion).
> >
> > --Leslie Smith
> >
> > Junfeng Li wrote:
> > > Dear all,
> > >
> > > We are working on 3D audio rendering for signals with low sampling frequency.
> > > As you may know, the HRTFs are normally measured at the high sampling frequency, e.g., 48kHz or 44.1kHz.
> > > However, the sampling frequency of sound signals in our application is restricted to 16 kHz. Therefore, to render this low-frequency (≤8 kHz) signal, one straightforward way is to first downsample the HRTFs from 48kHz/44.1kHz to 16kHz and then convolve with sound signals.
> > > However, the sound localization performance of the signal rendered with this approach is greatly decreased, especially elevation perception. To improve the sound localization performance, I am now wondering whether there is a certain good method to solve or mitigate this problem in this scenario.
> > >
> > > Any discussion is welcome.
> > >
> > > Thanks a lot again.
> > >
> > > Best regards,
> > > Junfeng
> > >
> >
> > --
> > Prof Leslie Smith (Emeritus)
> > Computing Science & Mathematics,
> > University of Stirling, Stirling FK9 4LA
> > Scotland, UK
> > Tel +44 1786 467435
> > Web: http://www.cs.stir.ac.uk/~lss
> > Blog: http://lestheprof.com


This message came from the mail archive
src/postings/2022/
maintained by:
Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University