
Re: [AUDITORY] On 3D audio rendering for signals with the low sampling frequency



Hi Dick, all, 

A couple of thoughts.  I'm no expert in spatial hearing, so they may be off the mark. 

> And I don't think the sample interval of 1/16000 sec provides a strong inherent limit on ITD accuracy.  The bandwidth of 8 kHz is about half of what's "normal", so theoretical TDOA resolution should be expected to be no worse than double normal, say 20–40 microseconds (about half of a sample interval) instead of 10–20 microseconds.  I wouldn't be surprised if the ITD resolution threshold was even closer to normal (around 1/4 sample interval), since our ITD-computing structure is dominated by lower-frequency input.


Indeed, the sample interval does not limit ITD estimation resolution.  You can get arbitrary resolution by interpolating the cross-correlation function near its peak (for example by fitting a parabola to three samples closest to the peak).  A similar argument applies to fundamental frequency estimation (--> pitch) from the autocorrelation function as in the YIN method. 
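
To make the interpolation concrete, here is a minimal Python sketch (NumPy only). It is an illustration under assumptions of mine, not a model of the auditory system: the helper names (bandlimited_pair, parabolic_peak_offset, estimate_delay), the 2 kHz cutoff, the 4096-sample noise burst and the 2.3-sample delay are all hypothetical choices made for the demo.

```python
import numpy as np

def bandlimited_pair(n, fs, cutoff_hz, delay_s, seed=0):
    """Low-pass noise and a copy delayed by delay_s (fractional, circular)."""
    rng = np.random.default_rng(seed)
    spec = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n, d=1.0 / fs)
    spec[f > cutoff_hz] = 0.0  # remove high-frequency power (assumed cutoff)
    left = np.fft.irfft(spec, n)
    # A linear phase ramp in the frequency domain delays by a non-integer
    # number of samples.
    right = np.fft.irfft(spec * np.exp(-2j * np.pi * f * delay_s), n)
    return left, right

def parabolic_peak_offset(y_m1, y_0, y_p1):
    """Vertex offset (in samples) of the parabola through three
    equally spaced points centred on the largest one."""
    denom = y_m1 - 2.0 * y_0 + y_p1
    return 0.0 if denom == 0 else 0.5 * (y_m1 - y_p1) / denom

def estimate_delay(left, right, fs):
    """Delay of right relative to left, in seconds, with sub-sample resolution."""
    n = len(left)
    ccf = np.correlate(right, left, mode="full")  # lags -(n-1) .. (n-1)
    k = int(np.argmax(ccf))
    k = min(max(k, 1), len(ccf) - 2)  # keep both neighbours in range
    frac = parabolic_peak_offset(ccf[k - 1], ccf[k], ccf[k + 1])
    return ((k - (n - 1)) + frac) / fs

fs = 16000
true_delay = 2.3 / fs  # 143.75 microseconds, not a whole number of samples
left, right = bandlimited_pair(4096, fs, 2000.0, true_delay)
est = estimate_delay(left, right, fs)
print(f"true {true_delay * 1e6:.2f} us, estimated {est * 1e6:.2f} us")
```

With these settings the estimate should land within a small fraction of the 62.5 microsecond sample interval. The parabola is only an approximation to the true shape of the peak, so some bias remains, and it grows as the signal bandwidth approaches Nyquist.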

This assumes that the CCF or ACF is smooth enough for the interpolation to be accurate, and for that the audio signals must be smooth, i.e. band-limited.  The purpose of a low-pass antialiasing filter associated with sampling or resampling is to *insure* that this is the case for typical signals, but that "insurance" is unnecessary if the signals contain no high-frequency power to start with.  

Thus, the choice of low-pass filter is a bit of a free parameter under the control of the engineer or experimenter. A wide filter (or none) is OK if the signals are known to contain little or no high-frequency power; a sharp filter is needed if the signals contain substantial high-frequency power. Engineers typically err on the side of caution by designing filters with strong attenuation beyond Nyquist, usually with the additional goal of keeping the pass-band flat. This requires a filter with a long impulse response. There's leeway in the exact choice. EEs love the topic.

This brings me to my second point. Are there perceptual correlates of antialiasing filtering?  There are two reasons to suspect an effect on spatial hearing. First, a long IR might widen the CCF peak and blur the "crisp" peak in the short-term CCF associated with a transient. Second, the frequency-domain features of the filter transfer function might interact with spectral notches characteristic of elevation or front-vs-back position of sources, particularly if those features are estimated by neural circuits also sensitive to time.

Again, this is pure speculation. Unfortunately, antialiasing filters are rarely specified in detail (in systems or studies), and I'm not aware of any study aiming to characterize their perceptual effects or demonstrate that there are none.  Anecdotally, I remember being annoyed when listening to music on an early CD player, by what I attributed to high-frequency ringing of antialiasing or reconstruction filters with poles just below Nyquist. That was when I could still hear in that region...

Alain







> On 14 Aug 2022, at 05:03, Richard F. Lyon <DickLyon@xxxxxxx> wrote:
> 
> Yes, good idea to find some solutions to the difficulty.
> 
> Reviewing my book's Figure 22.7, there's a pretty good spectral notch cue to elevation in the 5.5-8 kHz region (and higher); 8 kHz might be enough for elevation up to about 45 degrees (find free book PDF via machinehearing.org -- search that blog for "free".)
> 
> For resolving front/back confusion, that's hard unless you add the effects of lateralization change with head turning.  Using a head tracker or gyro to change the lateral angle to the sound, relative to the head, is very effective for letting the user disambiguate, if they have time to move a little.  So it depends on what you're trying to do.
> 
> If it was impossible to localize sounds with a 16 kHz sample rate, it would be equally impossible to localize sounds with no energy above 8 kHz.  I don't think that's the case.  I can't hear anything above 8 kHz (unless it's quite intense), and I don't sense that I have any difficulty localizing sounds around me.  Probably if we measured, though, we'd find I'm not as accurate as a person with better hearing.
> 
> And I don't think the sample interval of 1/16000 sec provides a strong inherent limit on ITD accuracy.  The bandwidth of 8 kHz is about half of what's "normal", so theoretical TDOA resolution should be expected to be no worse than double normal, say 20–40 microseconds (about half of a sample interval) instead of 10–20 microseconds.  I wouldn't be surprised if the ITD resolution threshold was even closer to normal (around 1/4 sample interval), since our ITD-computing structure is dominated by lower-frequency input.
> 
> Dick
> 
> 
> 
> 
> On Fri, Aug 12, 2022 at 9:20 PM Junfeng Li <junfeng.li.1979@xxxxxxxxx> wrote:
> Dear Frederick,
> 
> Thank you so much for the references that you mentioned. 
> 
> "[...] up–down cues are located mainly in the 6–12-kHz band, and front–back cues in the 8–16-kHz band." 
> According to this statement, it seems impossible to solve the problems of elevation perception and front-back confusion when the output signal is sampled at 16kHz. 
> Though I know it is difficult, I always try to find some solutions.
> 
> Thanks again.
> 
> Best regards,
> Junfeng 
> 
> On Sat, Aug 13, 2022 at 12:50 AM Frederick Gallun <fgallun@xxxxxxxxx> wrote:
> The literature on the HRTF over the past 60 years has made it very clear that "[...] up–down cues are located mainly in the 6–12-kHz band, and front–back cues in the 8–16-kHz band." (Langendijk and Bronkhorst, 2002)  
> 
> Here are a few places to start:
> 
> Langendijk, E. H. A., & Bronkhorst, A. W. (2002). Contribution of spectral cues to human sound localization. The Journal of the Acoustical Society of America, 112(4), 1583–1596. https://doi.org/10.1121/1.1501901
> 
> Mehrgardt, S., & Mellert, V. (1977). Transformation characteristics of the external human ear. The Journal of the Acoustical Society of America, 61(6), 1567–1576. https://doi.org/10.1121/1.381470
> 
> Shaw, E. A. G., & Teranishi, R. (1968). Sound Pressure Generated in an External‐Ear Replica and Real Human Ears by a Nearby Point Source. The Journal of the Acoustical Society of America, 44(1), 240–249. https://doi.org/10.1121/1.1911059
> 
> ---------------------------------------------
> 
> Frederick (Erick) Gallun, PhD, FASA, FASHA | he/him/his
> Professor, Oregon Hearing Research Center, Oregon Health & Science University
> "Diversity is like being invited to a party, Inclusion is being asked to dance, and Belonging is dancing like no one’s watching" - Gregory Lewis
> 
> 
> On Thu, Aug 11, 2022 at 11:59 PM Junfeng Li <junfeng.li.1979@xxxxxxxxx> wrote:
> Dear  Leslie,
> 
> When downsampling to 8/16kHz, we really found the localization accuracy decreases, even for horizon
> Do you have any good ideas to solve it?
> 
> Thanks a lot.
> 
> Best regards,
> Junfeng 
> 
> 
> On Thu, Aug 11, 2022 at 4:04 PM Prof Leslie Smith <l.s.smith@xxxxxxxxxxxxx> wrote:
> I'd also wonder about the time resolution: 16 kHz = 1/16000 sec between
> samples = 62.5 microseconds.
> That's relatively long for ITD (TDOA) estimation, which would suggest that
> localisation of lower frequency signals would be impeded.
> 
> (I don't have evidence for this: it's just a suggestion).
> 
> --Leslie Smith
> 
> Junfeng Li wrote:
> > Dear all,
> >
> > We are working on 3D audio rendering for signals with a low sampling
> > frequency.
> > As you may know, HRTFs are normally measured at a high sampling
> > frequency, e.g., 48 kHz or 44.1 kHz. However, the sampling frequency of
> > the sound signals in our application is restricted to 16 kHz. Therefore,
> > to render this low-frequency (≤8 kHz) signal, one straightforward way is
> > to first downsample the HRTFs from 48 kHz/44.1 kHz to 16 kHz and then
> > convolve them with the sound signals.
> > However, the sound localization performance of the signal rendered with
> > this approach is greatly decreased, especially elevation perception. To
> > improve the sound localization performance, I am now wondering whether
> > there is a good method to solve or mitigate this problem in this
> > scenario.
> >
> > Any discussion is welcome.
> >
> > Thanks a lot again.
> >
> > Best regards,
> > Junfeng
> >
> 
> 
> -- 
> Prof Leslie Smith (Emeritus)
> Computing Science & Mathematics,
> University of Stirling, Stirling FK9 4LA
> Scotland, UK
> Tel +44 1786 467435
> Web: http://www.cs.stir.ac.uk/~lss
> Blog: http://lestheprof.com
>