
Re: [AUDITORY] On 3D audio rendering for signals with the low sampling frequency



Hi Junfeng,

 

In visual VR, "hyperstereo" has been used in some settings with head-mounted displays. That means that the distance between the two virtual cameras is set to a larger value than the interpupillary distance of the participant, so that the binocular stereoscopic information is "exaggerated".

Just a wild guess, but could you use a modified head geometry (e.g., a larger simulated head) when creating the BRIRs to make the relevant cues more salient even at the low sampling rate?
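
To get a feel for how much a larger simulated head would stretch the interaural time difference (ITD), here is a minimal sketch. It uses the Woodworth rigid-sphere approximation as a stand-in; it is not how BRIRs are actually generated, and the head radius (0.0875 m), the speed of sound (343 m/s) and the function names are assumptions made for illustration only.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
HEAD_RADIUS = 0.0875    # m, assumed "average" head radius


def woodworth_itd(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """ITD in seconds for a rigid spherical head (Woodworth approximation).

    Intended for azimuths between 0 (front) and 90 degrees (side).
    """
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))


def itd_in_samples(azimuth_deg, scale, fs=16000):
    """ITD expressed in sample periods for a head scaled by `scale`."""
    return woodworth_itd(azimuth_deg, radius=scale * HEAD_RADIUS) * fs


if __name__ == "__main__":
    for scale in (1.0, 1.5, 2.0):
        row = ", ".join(
            f"{az:2d} deg: {itd_in_samples(az, scale):5.2f}"
            for az in (10, 30, 60, 90)
        )
        print(f"head scale {scale:.1f} -> ITD in samples at 16 kHz: {row}")
```

In this model the ITD grows linearly with the head radius, so a head that is 1.5 times larger spreads the same azimuth range over 1.5 times as many sample periods. Whether listeners can adapt to such exaggerated cues is a separate, empirical question.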

 

Daniel

 

---------------------------------

Prof. Dr. Daniel Oberfeld-Twistel

Johannes Gutenberg - Universitaet Mainz, Experimental Psychology &

Laboratoire ICube UMR7357 Université de Strasbourg

Wallstrasse 3

55122 Mainz

Germany

 

Phone ++49 (0) 6131 39 39274

Fax   ++49 (0) 6131 39 39416

http://www.staff.uni-mainz.de/oberfeld/

 

From: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx> On Behalf Of Junfeng Li
Sent: Saturday, August 13, 2022 1:42 AM
To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Re: On 3D audio rendering for signals with the low sampling frequency

 

Dear Frederick,

 

Thank you so much for the references that you mentioned. 

 

"[...] up–down cues are located mainly in the 6–12-kHz band, and front–back cues in the 8–16-kHz band." 

According to this statement, it seems impossible to solve the problems of elevation perception and front-back confusion when the output signal is sampled at 16 kHz, because only content below 8 kHz survives.

Though I know it is difficult, I am still trying to find a solution.
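
To make the bandwidth argument concrete, here is a minimal sketch in plain Python that computes what fraction of each cue band lies below the Nyquist frequency for a given sampling rate. The band edges are the ones quoted above; the function and constant names are my own and purely illustrative.

```python
UP_DOWN_BAND = (6000.0, 12000.0)     # Hz, band quoted for up-down cues
FRONT_BACK_BAND = (8000.0, 16000.0)  # Hz, band quoted for front-back cues


def band_coverage(band_hz, fs_hz):
    """Fraction of `band_hz` (lo, hi) that lies below the Nyquist frequency."""
    lo, hi = band_hz
    nyquist = fs_hz / 2.0
    covered = max(0.0, min(hi, nyquist) - lo)
    return covered / (hi - lo)


if __name__ == "__main__":
    for fs in (8000, 16000, 44100, 48000):
        print(
            f"fs = {fs:5d} Hz: "
            f"up-down {band_coverage(UP_DOWN_BAND, fs):.2f}, "
            f"front-back {band_coverage(FRONT_BACK_BAND, fs):.2f}"
        )
```

At a 16 kHz sampling rate the Nyquist frequency is 8 kHz, so only the 6-8 kHz third of the up-down band remains and the front-back band is lost entirely.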

 

Thanks again.

 

Best regards,

Junfeng 

 

On Sat, Aug 13, 2022 at 12:50 AM Frederick Gallun <fgallun@xxxxxxxxx> wrote:

The literature on the HRTF over the past 60 years has made it very clear that "[...] up–down cues are located mainly in the 6–12-kHz band, and front–back cues in the 8–16-kHz band." (Langendijk and Bronkhorst, 2002)

 

Here are a few places to start:

 

Langendijk, E. H. A., & Bronkhorst, A. W. (2002). Contribution of spectral cues to human sound localization. The Journal of the Acoustical Society of America, 112(4), 1583–1596. https://doi.org/10.1121/1.1501901

 

Mehrgardt, S., & Mellert, V. (1977). Transformation characteristics of the external human ear. The Journal of the Acoustical Society of America, 61(6), 1567–1576. https://doi.org/10.1121/1.381470

 

Shaw, E. A. G., & Teranishi, R. (1968). Sound Pressure Generated in an External‐Ear Replica and Real Human Ears by a Nearby Point Source. The Journal of the Acoustical Society of America, 44(1), 240–249. https://doi.org/10.1121/1.1911059

 

---------------------------------------------

Frederick (Erick) Gallun, PhD, FASA, FASHA | he/him/his

Professor, Oregon Hearing Research Center, Oregon Health & Science University

"Diversity is like being invited to a party, Inclusion is being asked to dance, and Belonging is dancing like no one’s watching" - Gregory Lewis

 

 

On Thu, Aug 11, 2022 at 11:59 PM Junfeng Li <junfeng.li.1979@xxxxxxxxx> wrote:

Dear  Leslie,

 

When downsampling to 8/16 kHz, we did indeed find that the localization accuracy decreases, even for horizon

Do you have any good ideas to solve it?

 

Thanks a lot.

 

Best regards,

Junfeng 

 

 

On Thu, Aug 11, 2022 at 4:04 PM Prof Leslie Smith <l.s.smith@xxxxxxxxxxxxx> wrote:

I'd also wonder about the time resolution: 16 kHz = 1/16000 sec between
samples = 62.5 microseconds.
That's relatively long for ITD (TDOA) estimation, which would suggest that
localisation of lower frequency signals would be impeded.

(I don't have evidence for this: it's just a suggestion).
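
As a rough illustration of these numbers, the sketch below computes the sample period at a few sampling rates and counts how many whole-sample delay steps fit into a maximum ITD. The 650-microsecond maximum ITD is an assumed ballpark value for an adult head, and the function names are hypothetical.

```python
def sample_period_us(fs_hz):
    """Time between successive samples, in microseconds."""
    return 1e6 / fs_hz


def whole_sample_itd_steps(fs_hz, max_itd_us=650.0):
    """Number of whole sample periods that fit into `max_itd_us`.

    `max_itd_us` is an assumed maximum ITD, not a measured value.
    """
    return int(max_itd_us // sample_period_us(fs_hz))


if __name__ == "__main__":
    for fs in (8000, 16000, 44100, 48000):
        print(
            f"fs = {fs:5d} Hz: period = {sample_period_us(fs):6.2f} us, "
            f"whole-sample ITD steps = {whole_sample_itd_steps(fs)}"
        )
```

This only counts integer-sample delays. Whether that granularity actually limits perceived lateralisation for a given rendering chain would have to be tested.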

--Leslie Smith

Junfeng Li wrote:
> Dear all,
>
> We are working on 3D audio rendering for signals with a low sampling
> frequency. As you may know, HRTFs are normally measured at a high
> sampling frequency, e.g., 48 kHz or 44.1 kHz. However, the sampling
> frequency of the sound signals in our application is restricted to
> 16 kHz. Therefore, to render this low-frequency (≤8 kHz) signal, one
> straightforward way is to first downsample the HRTFs from
> 48 kHz/44.1 kHz to 16 kHz and then convolve them with the sound
> signals. However, the sound localization performance of signals
> rendered with this approach is greatly decreased, especially for
> elevation perception. To improve the sound localization performance,
> I am now wondering whether there is a good method to solve or
> mitigate this problem in this scenario.
>
> Any discussion is welcome.
>
> Thanks a lot again.
>
> Best regards,
> Junfeng
>
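
The baseline described in the quoted message (downsample the HRIRs, then convolve) can be sketched as follows. This is a minimal illustration, assuming NumPy and SciPy are available and that the HRIRs are given as 1-D arrays; the function names and the gain correction are my own choices, not the poster's implementation.

```python
from math import gcd

import numpy as np
from scipy.signal import fftconvolve, resample_poly


def downsample_hrir(hrir, fs_in, fs_out=16000):
    """Resample one HRIR from fs_in to fs_out with a polyphase filter.

    resample_poly applies its own anti-aliasing low-pass. The result is
    scaled by fs_in / fs_out so that the filter's passband gain stays the
    same (resampling an impulse response otherwise scales its frequency
    response by fs_out / fs_in).
    """
    g = gcd(int(fs_in), int(fs_out))
    up, down = int(fs_out) // g, int(fs_in) // g
    return resample_poly(hrir, up, down) * (down / up)


def render_binaural(mono, hrir_left, hrir_right, fs_hrir, fs_signal=16000):
    """Convolve a mono signal with downsampled left/right HRIRs.

    Returns an array of shape (2, len(mono) + len(downsampled HRIR) - 1).
    """
    h_left = downsample_hrir(hrir_left, fs_hrir, fs_signal)
    h_right = downsample_hrir(hrir_right, fs_hrir, fs_signal)
    left = fftconvolve(mono, h_left)
    right = fftconvolve(mono, h_right)
    return np.stack([left, right], axis=0)
```

For 48 kHz HRIRs the resampling ratio is 1/3; for 44.1 kHz it is 160/441. Everything above 8 kHz is removed by the anti-aliasing filter, which is consistent with the loss of elevation cues reported in the message.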


--
Prof Leslie Smith (Emeritus)
Computing Science & Mathematics,
University of Stirling, Stirling FK9 4LA
Scotland, UK
Tel +44 1786 467435
Web: http://www.cs.stir.ac.uk/~lss
Blog: http://lestheprof.com