Re: [AUDITORY] On 3D audio rendering for signals with a low sampling frequency (Piotr Majdak)


Subject: Re: [AUDITORY] On 3D audio rendering for signals with a low sampling frequency
From:    Piotr Majdak  <piotr@xxxxxxxx>
Date:    Mon, 15 Aug 2022 13:07:01 +0200

Dear all,

With respect to sound localization in vertical planes, we can also look at what happens when the spectral content above 8 kHz is removed:

Best et al. (2005, "The role of high frequencies in speech localization," JASA 118, 353–363): Results for sound localization with low-pass filtered (8 kHz) speech (their Exp. I) show a drastic increase of sound-localization errors in vertical planes (their Fig. 5) and no effect in the lateral plane.

Majdak et al. (2013, "Effect of long-term training on sound localization performance with spectrally warped and band-limited head-related transfer functions," JASA 134, 2148–2159): Results for sound localization with low-pass filtered (8 kHz) white noise show a large increase of localization errors (their Fig. 6, red circles at "Pre") in vertical planes (front/back, up/down) and no change in the lateral dimension (left/right).

It's not encouraging for systems going up to 8 kHz only, though :-(. And of course, taking head movements into account may help...

Best regards,
Piotr

On 15.08.2022 at 00:52, Adam Weisser wrote:
> Dear Junfeng, Alain, and all,
>
> I think that some solutions to the undersampling / aliasing problem you described should exist, but they likely depend on where the sampling-rate bottleneck lies: at the input, in processing, at the output stage, or in all of them. It also depends on the computational capabilities of the system and whether it has to work in real time, and if so, what the permissible delay is.
>
> I'm aware of two general approaches to circumvent the Nyquist criterion:
> 1. Compressed sensing - This heavily researched signal-processing method uses signal sparsity to faithfully reconstruct undersampled signals [1].
>
> 2. Trading off aliasing and noise - This is a classical result that employs nonuniform sampling at rates lower than Nyquist, whereby the aliasing that would otherwise arise is replaced by noise [2]. It is thought that this is what happens in the retina, where the optical image is densely sampled in the fovea by the photoreceptors but becomes gradually undersampled away from the fovea [3]. Had the photoreceptor density been uniform and regular over the retina, the resolution of central vision would suffer greatly and the image would also be severely aliased. However, this trick works only if the sampling is truly stochastic. If the "localization noise" level (perhaps manifest as audio noise) can be sacrificed, then this approach may work, combined with dither.
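A minimal numerical sketch of this aliasing-for-noise trade-off (illustrative values only, SciPy assumed; not a practical design): a 10 kHz tone is sampled at a mean rate of 16 kHz, once on a uniform grid and once at randomly jittered times, and a least-squares (Lomb-Scargle) spectrum is evaluated at the alias frequency and at the true frequency.

import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(1)
fs_mean = 16_000          # mean sampling rate (Hz); nominal Nyquist = 8 kHz
f_tone = 10_000           # tone above the nominal Nyquist
dur = 0.05                # seconds

# Uniform sampling: the 10 kHz tone aliases to 16 - 10 = 6 kHz.
t_uni = np.arange(0, dur, 1 / fs_mean)
# Stochastic sampling at the same mean rate: aliasing is traded for
# broadband noise (Shapiro & Silverman, 1960; reference [2] above).
t_jit = np.sort(rng.uniform(0, dur, size=t_uni.size))

# Least-squares spectral power at the alias (6 kHz) and the true (10 kHz) frequency.
w = 2 * np.pi * np.array([6_000.0, 10_000.0])   # lombscargle expects rad/s
for name, t in [("uniform", t_uni), ("jittered", t_jit)]:
    y = np.sin(2 * np.pi * f_tone * t)
    p_alias, p_true = lombscargle(t, y, w, normalize=True)
    print(f"{name:>8}: power at 6 kHz = {p_alias:.3f}, at 10 kHz = {p_true:.3f}")

With uniform samples the 6 kHz alias is indistinguishable from the 10 kHz tone; with jittered samples the alias power collapses toward the noise floor, which is the effect described above.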
> Regardless of the specific system architecture at hand, none of these methods appears straightforward to implement.
>
> Finally, regarding Alain's comment about auditory sampling - the neat trick found in the spatial processing of vision may be analogous to what goes on in the temporal processing of stimuli at the transduction stage of the auditory nerve. Neural adaptation can be thought of as dense sampling of the signal around its onset / transient portion, which becomes more sparsely sampled quickly after the onset. Because of adaptation, this effect is very elusive, but I believe it is measurable nonetheless. I tried to demonstrate it psychoacoustically in Appendix E of [4]. While I don't know how it relates to binaural processing directly, there may be instantaneous effects that are detectable there too, given that the input to both processing types is the same.
>
> All the best,
> Adam.
>
> [1] Candès, E. J., Romberg, J. K., & Tao, T. (2006). Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8), 1207–1223.
>
> [2] Shapiro, H. S., & Silverman, R. A. (1960). Alias-free sampling of random noise. Journal of the Society for Industrial and Applied Mathematics, 8(2), 225–248.
>
> [3] Yellott, J. I. (1983). Spectral consequences of photoreceptor sampling in the rhesus retina. Science, 221(4608), 382–385.
>
> [4] Weisser, A. (2021). Treatise on Hearing: The Temporal Auditory Imaging Theory Inspired by Optics and Communication. arXiv preprint arXiv:2111.04338. https://arxiv.org/abs/2111.04338
>
>
> On Sun, Aug 14, 2022, at 4:47 AM, Alain de Cheveigne wrote:
>> Hi Dick, all,
>>
>> A couple of thoughts. I'm no expert on spatial hearing, so they may be off the mark.
>>
>> > And I don't think the sample interval of 1/16000 sec provides a strong inherent limit on ITD accuracy. The bandwidth of 8 kHz is about half of what's "normal", so theoretical TDOA resolution should be expected to be no worse than double normal, say 20–40 microseconds (about half of a sample interval) instead of 10–20 microseconds. I wouldn't be surprised if the ITD resolution threshold was even closer to normal (around 1/4 sample interval), since our ITD-computing structure is dominated by lower-frequency input.
>>
>> Indeed, the sample interval does not limit ITD estimation resolution. You can get arbitrary resolution by interpolating the cross-correlation function near its peak (for example, by fitting a parabola to the three samples closest to the peak). A similar argument applies to fundamental frequency estimation (--> pitch) from the autocorrelation function, as in the YIN method.
>>
>> This assumes that the CCF or ACF is smooth enough for the interpolation to be accurate, and for that the audio signals must be smooth, i.e. band-limited. The purpose of a low-pass antialiasing filter associated with sampling or resampling is to *insure* that this is the case for typical signals, but that "insurance" is unnecessary if the signals contain no high-frequency power to start with.
>>
>> Thus, the choice of low-pass filter is a bit of a free parameter under the control of the engineer or experimenter. A wide filter (or none) is OK if the signals are known to contain little or no high-frequency power; a sharp filter is needed if the signals are strongly high-pass. Engineers typically err on the side of caution by designing filters with strong attenuation beyond Nyquist, usually with the additional goal of keeping the pass-band flat. This requires a filter with a long impulse response. There's leeway in the exact choice. EEs love the topic.
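To put rough numbers on that trade-off, here is a small SciPy sketch (tap counts and cutoffs are illustrative choices, not recommendations) comparing a sharp, flat-passband antialiasing low-pass with a gentler one for a 48 kHz to 16 kHz decimation.

import numpy as np
from scipy.signal import firwin

fs = 48_000   # original rate (Hz); target rate 16 kHz -> new Nyquist at 8 kHz

# "Sharp": narrow transition band just below 8 kHz, flat passband -> long impulse response.
h_sharp = firwin(numtaps=1001, cutoff=7_800, fs=fs)
# "Gentle": wide transition band -> short impulse response, less ringing near the cutoff.
h_gentle = firwin(numtaps=61, cutoff=7_000, fs=fs)

for name, h in [("sharp", h_sharp), ("gentle", h_gentle)]:
    print(f"{name:>6}: {len(h)} taps, {1e3 * len(h) / fs:.2f} ms impulse response")

The sharp filter's impulse response is on the order of 20 ms, long enough that the CCF-smearing and spectral-feature interactions raised in the next paragraph become plausible concerns; the gentle one is more than an order of magnitude shorter but leaks more energy above 8 kHz.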
>> This brings me to my second point. Are there perceptual correlates of antialiasing filtering? There are two reasons to suspect an effect on spatial hearing. First, a long IR might widen the CCF peak and blur the "crisp" peak in the short-term CCF associated with a transient. Second, the frequency-domain features of the filter transfer function might interact with spectral notches characteristic of elevation or front-vs-back position of sources, particularly if those features are estimated by neural circuits that are also sensitive to time.
>>
>> Again, this is pure speculation. Unfortunately, antialiasing filters are rarely specified in detail (in systems or studies), and I'm not aware of any study aiming to characterize their perceptual effects or demonstrate that there are none. Anecdotally, I remember being annoyed, when listening to music on an early CD player, by what I attributed to high-frequency ringing of antialiasing or reconstruction filters with poles just below Nyquist. That was when I could still hear in that region...
>>
>> Alain
>>
>> > On 14 Aug 2022, at 05:03, Richard F. Lyon <DickLyon@xxxxxxxx> wrote:
>> >
>> > Yes, it's a good idea to look for solutions to this difficult problem.
>> >
>> > Reviewing my book's Figure 22.7, there's a pretty good spectral notch cue to elevation in the 5.5-8 kHz region (and higher); 8 kHz might be enough for elevation up to about 45 degrees (find the free book PDF via machinehearing.org -- search that blog for "free").
>> >
>> > For resolving front/back confusion, that's hard unless you add the effects of lateralization change with head turning. Using a head tracker or gyro to change the lateral angle to the sound, relative to the head, is very effective for letting the user disambiguate, if they have time to move a little. So it depends on what you're trying to do.
>> >
>> > If it were impossible to localize sounds with a 16 kHz sample rate, it would be equally impossible to localize sounds with no energy above 8 kHz. I don't think that's the case. I can't hear anything above 8 kHz (unless it's quite intense), and I don't sense that I have any difficulty localizing sounds around me. Probably, though, if we measured we'd find I'm not as accurate as a person with better hearing.
>> >
>> > And I don't think the sample interval of 1/16000 sec provides a strong inherent limit on ITD accuracy. The bandwidth of 8 kHz is about half of what's "normal", so theoretical TDOA resolution should be expected to be no worse than double normal, say 20–40 microseconds (about half of a sample interval) instead of 10–20 microseconds. I wouldn't be surprised if the ITD resolution threshold was even closer to normal (around 1/4 sample interval), since our ITD-computing structure is dominated by lower-frequency input.
>> >
>> > Dick
>> >
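The sub-sample ITD resolution argued for above can be checked numerically. A minimal sketch (synthetic noise, illustrative values): delay a 16 kHz noise burst by 150 microseconds, about 2.4 samples, and recover the delay from the cross-correlation peak with a three-point parabolic fit.

import numpy as np
from scipy.signal import correlate

rng = np.random.default_rng(0)
fs = 16_000
true_itd = 150e-6                     # 150 us = 2.4 samples at 16 kHz

# Noise burst, delayed by a fractional number of samples via the Fourier shift theorem.
n = 4096
x = rng.standard_normal(n)
f = np.fft.rfftfreq(n, 1 / fs)
y = np.fft.irfft(np.fft.rfft(x) * np.exp(-2j * np.pi * f * true_itd), n)

# Cross-correlation and its integer-lag peak.
ccf = correlate(y, x, mode="full")
lags = np.arange(-(n - 1), n)
k = int(np.argmax(ccf))

# Three-point parabolic interpolation around the peak; the offset stays within +/- 0.5 sample.
c_m, c_0, c_p = ccf[k - 1], ccf[k], ccf[k + 1]
delta = 0.5 * (c_m - c_p) / (c_m - 2 * c_0 + c_p)
itd_est = (lags[k] + delta) / fs
print(f"true ITD = {true_itd * 1e6:.1f} us, estimated = {itd_est * 1e6:.1f} us")

The estimate lands well within one sample interval (62.5 us) of the true value; the parabolic fit of a band-limited, sinc-like peak carries a small residual bias, which a band-limited (sinc) interpolator would reduce further.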
>> > On Fri, Aug 12, 2022 at 9:20 PM Junfeng Li <junfeng.li.1979@xxxxxxxx> wrote:
>> > Dear Frederick,
>> >
>> > Thank you so much for the references that you mentioned.
>> >
>> > "[...] up–down cues are located mainly in the 6–12-kHz band, and front–back cues in the 8–16-kHz band."
>> > According to this statement, it seems impossible to solve the problems of elevation perception and front-back confusion when the output signal is sampled at 16 kHz.
>> > Though I know it is difficult, I am still trying to find some solutions.
>> >
>> > Thanks again.
>> >
>> > Best regards,
>> > Junfeng
>> >
>> > On Sat, Aug 13, 2022 at 12:50 AM Frederick Gallun <fgallun@xxxxxxxx> wrote:
>> > The literature on the HRTF over the past 60 years has made it very clear that "[...] up–down cues are located mainly in the 6–12-kHz band, and front–back cues in the 8–16-kHz band." (Langendijk and Bronkhorst, 2002)
>> >
>> > Here are a few places to start:
>> >
>> > Langendijk, E. H. A., & Bronkhorst, A. W. (2002). Contribution of spectral cues to human sound localization. The Journal of the Acoustical Society of America, 112(4), 1583–1596. https://doi.org/10.1121/1.1501901
>> >
>> > Mehrgardt, S., & Mellert, V. (1977). Transformation characteristics of the external human ear. The Journal of the Acoustical Society of America, 61(6), 1567–1576. https://doi.org/10.1121/1.381470
>> >
>> > Shaw, E. A. G., & Teranishi, R. (1968). Sound pressure generated in an external-ear replica and real human ears by a nearby point source. The Journal of the Acoustical Society of America, 44(1), 240–249. https://doi.org/10.1121/1.1911059
>> >
>> > ---------------------------------------------
>> >
>> > Frederick (Erick) Gallun, PhD, FASA, FASHA | he/him/his
>> > Professor, Oregon Hearing Research Center, Oregon Health & Science University
>> > "Diversity is like being invited to a party, Inclusion is being asked to dance, and Belonging is dancing like no one's watching" - Gregory Lewis
>> >
>> >
>> > On Thu, Aug 11, 2022 at 11:59 PM Junfeng Li <junfeng.li.1979@xxxxxxxx> wrote:
>> > Dear Leslie,
>> >
>> > When downsampling to 8/16 kHz, we indeed found that localization accuracy decreases, even for horizontal localization.
>> > Do you have any good ideas for solving this?
>> >
>> > Thanks a lot.
>> >
>> > Best regards,
>> > Junfeng
>> >
>> >
>> > On Thu, Aug 11, 2022 at 4:04 PM Prof Leslie Smith <l.s.smith@xxxxxxxx> wrote:
>> > I'd also wonder about the time resolution: 16 kHz = 1/16000 sec between samples = 62.5 microseconds.
>> > That's relatively long for ITD (TDOA) estimation, which would suggest that localisation of lower-frequency signals would be impeded.
>> >
>> > (I don't have evidence for this; it's just a suggestion.)
>> >
>> > --Leslie Smith
>> >
>> > Junfeng Li wrote:
>> > > Dear all,
>> > >
>> > > We are working on 3D audio rendering for signals with a low sampling frequency.
>> > > As you may know, HRTFs are normally measured at a high sampling frequency, e.g., 48 kHz or 44.1 kHz. However, the sampling frequency of the sound signals in our application is restricted to 16 kHz. Therefore, to render these low-frequency (≤8 kHz) signals, one straightforward way is to first downsample the HRTFs from 48 kHz / 44.1 kHz to 16 kHz and then convolve them with the sound signals.
>> > > However, the sound localization performance of the signal rendered with this approach is greatly decreased, especially elevation perception. To improve the sound localization performance, I am now wondering whether there is a good method to solve or mitigate this problem in this scenario.
>> > >
>> > > Any discussion is welcome.
>> > >
>> > > Thanks a lot again.
>> > >
>> > > Best regards,
>> > > Junfeng
>> > >
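The pipeline described above (downsample the measured HRIRs to 16 kHz, then convolve with the source) might look roughly like the following sketch. The HRIR arrays here are placeholders; in practice they would be loaded from a measured set (e.g. a SOFA file) for the desired direction.

import numpy as np
from scipy.signal import resample_poly, fftconvolve

fs_hrtf, fs_target = 48_000, 16_000   # HRIRs measured at 48 kHz, source signal at 16 kHz

# Placeholder HRIRs (a plain delay/gain pair); real ones come from measurements.
hrir_l_48k = np.zeros(512); hrir_l_48k[40] = 1.0
hrir_r_48k = np.zeros(512); hrir_r_48k[44] = 0.8

# Downsample 48 kHz -> 16 kHz; resample_poly applies an antialiasing low-pass,
# which discards everything above the new 8 kHz Nyquist (the crux of this thread).
hrir_l_16k = resample_poly(hrir_l_48k, up=1, down=3)
hrir_r_16k = resample_poly(hrir_r_48k, up=1, down=3)

# Convolve the 16 kHz source with each ear's downsampled HRIR.
rng = np.random.default_rng(0)
src = rng.standard_normal(fs_target)  # 1 s of noise as a stand-in source
binaural = np.stack([fftconvolve(src, hrir_l_16k), fftconvolve(src, hrir_r_16k)], axis=-1)
print(binaural.shape)                 # (len(src) + len(hrir_16k) - 1, 2)

Nothing in this sketch recovers the 8–16 kHz elevation and front/back cues discussed above; it only makes explicit where they are lost, namely in the antialiasing step of the resampler.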
To >> > > improve the sound localization performance, I am now wondering=20 >> whether >> > > there is a certain good method to solve or mitigate this problem=20 >> in this >> > > scenario. >> > > >> > > Any discussion is welcome. >> > > >> > > Thanks a lot again. >> > > >> > > Best regards, >> > > Junfeng >> > > >> > >> > >> > -- >> > Prof Leslie Smith (Emeritus) >> > Computing Science & Mathematics, >> > University of Stirling, Stirling FK9 4LA >> > Scotland, UK >> > Tel +44 1786 467435 >> > Web: http://www.cs.stir.ac.uk/~lss >> > Blog: http://lestheprof.com >> > >> > --=20 Piotr Majdak Fachbereich H=C3=B6ren=20 <https://www.oeaw.ac.at/isf/forschung/fachbereiche-teams/hoeren> Institut f=C3=BCr Schallforschung <https://www.oeaw.ac.at/isf> =C3=96sterreichische Akademie der Wissenschaften <http://www.oeaw.ac.at/> Wohllebengasse 12-14, 1040 Wien Tel.: +43 1 51581-2511 --------------XCMpKl0TDZRUgkSXsisHvvoZ Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by edgeum4.it.mcgill.ca id 27FB76if110675 <html> <head> <meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3DUTF= -8"> </head> <body text=3D"#000000" bgcolor=3D"#FFFFFF"> <p>Dear all, <br> </p> <p>With respect to the sound localization in vertical planes: We also could look what happens when the spectral content above 8 kHz is removed: <br> </p> <div class=3D"csl-bib-body" style=3D"line-height: 1.35; margin-left: 2em; text-indent:-2em;"> <div class=3D"csl-entry">Best et al. (2005, =E2=80=9CThe role of hi= gh frequencies in speech localization,=E2=80=9D JASA 118, 353=E2=80=93= 63): Results for the sound localization with low-pass filtered (8 kHz) speech (their Exp I) show a drastic increase of sound-localization errors in vertical planes (their Fig. 5), no effect in the lateral plane.=C2=A0 <br> </div> <br> </div> <div class=3D"csl-bib-body" style=3D"line-height: 1.35; margin-left: 2em; text-indent:-2em;">Majdak et al. (2013, =E2=80=9CEffect of lon= g-term training on sound localization performance with spectrally warped and band-limited head-related transfer functions,=E2=80=9D JASA 134= , 2148=E2=80=932159): Results for the sound localization with low-pas= s filtered (8 kHz) white noises show a large increase of localization errors (Fig. 6, red circles at "Pre") in vertical planes (front/back, top/down), no changes in the lateral dimensions (left/right).<br> <br> </div> <div class=3D"csl-bib-body" style=3D"line-height: 1.35; margin-left: 2em; text-indent:-2em;">It's not encouraging for systems going up to 8 kHz only, though :-(. 

