
Re: [AUDITORY] On 3D audio rendering for signals with the low sampling frequency



I should have mentioned that the same figure in my book also shows low-frequency cues to vertical angle, from torso and shoulder bounce effects.  I don't know how effective these are in overcoming the loss of higher-frequency cues, but they're something you could work with, and perhaps try to exaggerate.
Dick


On Sat, Aug 13, 2022 at 8:03 PM Richard F. Lyon <dicklyon@xxxxxxx> wrote:
Yes, it's a good idea to look for solutions to this difficulty.

Reviewing my book's Figure 22.7, there's a pretty good spectral notch cue to elevation in the 5.5-8 kHz region (and higher); 8 kHz might be enough for elevation up to about 45 degrees (find free book PDF via machinehearing.org -- search that blog for "free".)

For resolving front/back confusion, that's hard unless you add the effects of lateralization change with head turning.  Using a head tracker or gyro to change the lateral angle to the sound, relative to the head, is very effective for letting the user disambiguate, if they have time to move a little.  So it depends on what you're trying to do.
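
To illustrate why head turning helps, here is a minimal sketch, not a tested renderer. It assumes sources in the horizontal plane, azimuth measured clockwise from straight ahead, and a hypothetical tracker that reports yaw in degrees. The function names are made up for illustration. A front source at 30 degrees and its back mirror image at 150 degrees start with the same lateral angle, but a 10 degree turn to the right moves them in opposite directions.

```python
import math


def wrap_deg(angle):
    """Wrap an angle in degrees to [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def relative_azimuth(source_az_deg, head_yaw_deg):
    """Azimuth of the source relative to the head (0 = ahead, +90 = right)."""
    return wrap_deg(source_az_deg - head_yaw_deg)


def lateral_angle(rel_az_deg):
    """Lateral angle of a horizontal-plane source.

    Front/back mirror pairs (e.g. 30 and 150 degrees) share the same value,
    which is why a static rendering cannot tell them apart from ITD alone.
    """
    return math.degrees(math.asin(math.sin(math.radians(rel_az_deg))))


front, back = 30.0, 150.0   # mirror-image pair about the interaural axis
for yaw in (0.0, 10.0):     # hypothetical tracker readings, positive = turn right
    lat_front = lateral_angle(relative_azimuth(front, yaw))
    lat_back = lateral_angle(relative_azimuth(back, yaw))
    print(f"yaw {yaw:4.1f}: front source {lat_front:5.1f} deg, "
          f"back source {lat_back:5.1f} deg")
```

In a renderer, the relative azimuth would select the HRTF pair for each audio block, so the lateral cue changes with head motion in the direction that matches the true source position.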

If it were impossible to localize sounds with a 16 kHz sample rate, it would be equally impossible to localize sounds with no energy above 8 kHz.  I don't think that's the case.  I can't hear anything above 8 kHz (unless it's quite intense), and I don't sense that I have any difficulty localizing sounds around me.  If we measured, though, we'd probably find I'm not as accurate as a person with better hearing.

And I don't think the sample interval of 1/16000 sec provides a strong inherent limit on ITD accuracy.  The bandwidth of 8 kHz is about half of what's "normal", so theoretical TDOA resolution should be expected to be no worse than double normal, say 20–40 microseconds (about half of a sample interval) instead of 10–20 microseconds.  I wouldn't be surprised if the ITD resolution threshold was even closer to normal (around 1/4 sample interval), since our ITD-computing structure is dominated by lower-frequency input.
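
As a rough numerical illustration that the 62.5 microsecond sample interval is not itself a floor on delay resolution, the sketch below estimates a 25 microsecond delay (0.4 of a sample) from 16 kHz signals. It is only a sketch under stated assumptions: the test tones, window length, and lag range are hypothetical, the signals are noise-free, and cross-correlation with parabolic peak interpolation is one simple estimator, not a model of the binaural system.

```python
import math

FS = 16_000                 # sample rate from the thread, in Hz
TRUE_ITD_S = 25e-6          # assumed test delay: 25 us = 0.4 sample
N = 1600                    # 0.1 s analysis window
MAX_LAG = 12                # search +/-12 samples (+/-750 us)
TONES_HZ = (200.0, 500.0, 900.0, 1400.0)   # hypothetical low-frequency source


def source(t):
    """Hypothetical test signal: a sum of low-frequency tones."""
    return sum(math.sin(2.0 * math.pi * f * t) for f in TONES_HZ)


def estimate_itd(left, right_padded, fs, max_lag):
    """Estimate how much the right signal lags the left, in seconds.

    right_padded[i] holds the right-ear sample at index i - max_lag, so the
    correlation can look max_lag samples to either side of `left`.
    """
    n = len(left)
    corr = []
    for lag in range(-max_lag, max_lag + 1):
        offset = lag + max_lag
        corr.append(sum(left[i] * right_padded[i + offset] for i in range(n)))

    # Integer-lag peak, kept off the edges so both neighbours exist.
    peak = max(range(1, len(corr) - 1), key=corr.__getitem__)
    y0, y1, y2 = corr[peak - 1], corr[peak], corr[peak + 1]

    # Vertex of the parabola through the three points around the peak.
    frac = 0.5 * (y0 - y2) / (y0 - 2.0 * y1 + y2)
    return (peak - max_lag + frac) / fs


left = [source(i / FS) for i in range(N)]
right_padded = [source((i - MAX_LAG) / FS - TRUE_ITD_S)
                for i in range(N + 2 * MAX_LAG)]

sample_period_us = 1e6 / FS
itd_estimate_us = 1e6 * estimate_itd(left, right_padded, FS, MAX_LAG)
print(f"sample period: {sample_period_us:.1f} us")
print(f"estimated ITD: {itd_estimate_us:.1f} us (true: {1e6 * TRUE_ITD_S:.1f} us)")
```

With these clean, low-frequency test signals the interpolated estimate lands close to the true sub-sample delay. Real recordings with noise and reverberation would do worse, so this only shows that the sampling grid alone does not set the limit.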

Dick




On Fri, Aug 12, 2022 at 9:20 PM Junfeng Li <junfeng.li.1979@xxxxxxxxx> wrote:
Dear Frederick,

Thank you so much for the references that you mentioned. 

"[...] up–down cues are located mainly in the 6–12-kHz band, and front–back cues in the 8–16-kHz band." 
According to this statement, it seems impossible to solve the problems of elevation perception and front-back confusion when the output signal is sampled at 16 kHz.
I know it is difficult, but I keep trying to find solutions.

Thanks again.

Best regards,
Junfeng 

On Sat, Aug 13, 2022 at 12:50 AM Frederick Gallun <fgallun@xxxxxxxxx> wrote:
The literature on the HRTF over the past 60 years has made it very clear that "[...] up–down cues are located mainly in the 6–12-kHz band, and front–back cues in the 8–16-kHz band." (Langendijk and Bronkhorst, 2002)

Here are a few places to start:

Langendijk, E. H. A., & Bronkhorst, A. W. (2002). Contribution of spectral cues to human sound localization. The Journal of the Acoustical Society of America, 112(4), 1583–1596. https://doi.org/10.1121/1.1501901

Mehrgardt, S., & Mellert, V. (1977). Transformation characteristics of the external human ear. The Journal of the Acoustical Society of America, 61(6), 1567–1576. https://doi.org/10.1121/1.381470

Shaw, E. A. G., & Teranishi, R. (1968). Sound pressure generated in an external‐ear replica and real human ears by a nearby point source. The Journal of the Acoustical Society of America, 44(1), 240–249. https://doi.org/10.1121/1.1911059

---------------------------------------------

Frederick (Erick) Gallun, PhD, FASA, FASHA | he/him/his

Professor, Oregon Hearing Research Center, Oregon Health & Science University
"Diversity is like being invited to a party, Inclusion is being asked to dance, and Belonging is dancing like no one’s watching" - Gregory Lewis


On Thu, Aug 11, 2022 at 11:59 PM Junfeng Li <junfeng.li.1979@xxxxxxxxx> wrote:
Dear  Leslie,

When downsampling to 8/16 kHz, we did find that the localization accuracy decreases, even in the horizontal plane.
Do you have any good ideas to solve it?

Thanks a lot.

Best regards,
Junfeng 


On Thu, Aug 11, 2022 at 4:04 PM Prof Leslie Smith <l.s.smith@xxxxxxxxxxxxx> wrote:
I'd also wonder about the time resolution: 16 kHz = 1/16000 sec between
samples = 62.5 microseconds.
That's relatively long for ITD (TDOA) estimation, which would suggest that
localisation of lower frequency signals would be impeded.

(I don't have evidence for this: it's just a suggestion).

--Leslie Smith

Junfeng Li wrote:
> Dear all,
>
> We are working on 3D audio rendering for signals with a low sampling
> frequency.
> As you may know, HRTFs are normally measured at a high sampling
> frequency, e.g., 48 kHz or 44.1 kHz. However, the sampling frequency of
> the sound signals in our application is restricted to 16 kHz. Therefore,
> to render this low-frequency (≤8 kHz) signal, one straightforward way is
> to first downsample the HRTFs from 48 kHz/44.1 kHz to 16 kHz and then
> convolve them with the sound signals. However, the sound localization
> performance of the signal rendered with this approach is greatly
> decreased, especially for elevation perception. To improve the sound
> localization performance, I am now wondering whether there is a good
> method to solve or mitigate this problem in this scenario.
>
> Any discussion is welcome.
>
> Thanks a lot again.
>
> Best regards,
> Junfeng
>
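
For concreteness, the downsample-then-convolve baseline described in the quoted message can be sketched as follows. This is a simplified illustration, not the poster's actual pipeline: it handles only the integer 48 kHz to 16 kHz case, uses a hand-rolled Hamming-windowed sinc filter with an arbitrary tap count and cutoff, and the HRIR is a made-up toy. In practice a library polyphase resampler would be the usual choice, and 44.1 kHz would need rational resampling. One detail worth noting is the rescaling by the decimation factor, which keeps the filter's gain unchanged at the new rate.

```python
import math


def lowpass_fir(num_taps, cutoff):
    """Hamming-windowed sinc lowpass; cutoff in cycles/sample, unity DC gain."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for i in range(num_taps):
        x = i - mid
        if x == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * i / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]


def convolve(x, h):
    """Direct-form linear convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def downsample_hrir(hrir, factor, num_taps=61):
    """Lowpass, keep every `factor`-th sample, rescale to preserve filter gain."""
    # Cutoff slightly below the new Nyquist frequency (an arbitrary choice).
    taps = lowpass_fir(num_taps, 0.9 * 0.5 / factor)
    filtered = convolve(hrir, taps)
    delay = (num_taps - 1) // 2          # group delay of the linear-phase FIR
    out_len = math.ceil(len(hrir) / factor)
    kept = filtered[delay::factor][:out_len]
    # Each output tap now stands for `factor` input sample periods.
    return [factor * v for v in kept]


def render(mono, hrir_left, hrir_right):
    """Binaural rendering of a mono signal by convolution with an HRIR pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy 48 kHz HRIR (hypothetical): a direct path plus one weaker reflection.
hrir48 = [0.0] * 96
hrir48[30] = 1.0
hrir48[34] = -0.5
hrir16 = downsample_hrir(hrir48, 3)

# Render a short 16 kHz test signal; the same HRIR is used for both ears here.
left, right = render([1.0, 0.0], hrir16, hrir16)
print(len(hrir16), round(sum(hrir48), 3), round(sum(hrir16), 3))
```

Everything above 8 kHz in the measured HRTF is discarded by this step, which is why the spectral elevation cues discussed elsewhere in this thread do not survive it.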


--
Prof Leslie Smith (Emeritus)
Computing Science & Mathematics,
University of Stirling, Stirling FK9 4LA
Scotland, UK
Tel +44 1786 467435
Web: http://www.cs.stir.ac.uk/~lss
Blog: http://lestheprof.com