Re: [AUDITORY] On 3D audio rendering for signals with the low sampling frequency

From: Junfeng Li <junfeng.li.1979@xxxxxxxxx>

Date: Sat, 13 Aug 2022 07:42:02 +0800

Comments: To: Frederick Gallun <fgallun@xxxxxxxxx>

Reply-to: Junfeng Li <junfeng.li.1979@xxxxxxxxx>

Sender: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>

Dear Frederick,

Thank you so much for the references that you mentioned.

"[...] up–down cues are located mainly in the 6–12-kHz band, and front–back cues in the 8–16-kHz band."

According to this statement, it seems impossible to solve the problems of elevation perception and front-back confusion when the output signal is sampled at 16 kHz, since such a signal carries no content above 8 kHz.
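
As a rough check of this point, the sketch below computes how much of each quoted cue band lies below the Nyquist frequency for a given sampling rate. The function name `retained_fraction` is hypothetical, and the measure is a plain linear-frequency ratio, which is an assumption and says nothing about the perceptual weight of each part of the band.

```python
# Band edges are taken from the quoted statement (Langendijk & Bronkhorst, 2002).
CUE_BANDS_HZ = {
    "up-down": (6000.0, 12000.0),
    "front-back": (8000.0, 16000.0),
}

def retained_fraction(band_hz, sample_rate_hz):
    """Fraction of the band (lo, hi) that lies below the Nyquist frequency.

    Assumption: a simple linear-frequency ratio, not a perceptual measure.
    """
    lo, hi = band_hz
    nyquist = sample_rate_hz / 2.0
    kept = max(0.0, min(hi, nyquist) - lo)
    return kept / (hi - lo)

for fs in (16000, 48000):
    for name, band in CUE_BANDS_HZ.items():
        frac = retained_fraction(band, fs)
        print(f"fs={fs} Hz: {name} band retained = {frac:.0%}")
```

Under these assumptions, 16 kHz sampling keeps only the 6-8 kHz part of the up-down band and none of the front-back band, whereas 48 kHz sampling keeps both bands in full.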

Though I know it is difficult, I am still trying to find a solution.

Thanks again.

Best regards,

Junfeng

On Sat, Aug 13, 2022 at 12:50 AM Frederick Gallun <fgallun@xxxxxxxxx> wrote:

The literature on the HRTF over the past 60 years has made it very clear that "[...] up–down cues are located mainly in the 6–12-kHz band, and front–back cues in the 8–16-kHz band." (Langendijk and Bronkhorst, 2002)

Here are a few places to start:

Langendijk, E. H. A., & Bronkhorst, A. W. (2002). Contribution of spectral cues to human sound localization. The Journal of the Acoustical Society of America, 112(4), 1583–1596. https://doi.org/10.1121/1.1501901

Mehrgardt, S., & Mellert, V. (1977). Transformation characteristics of the external human ear. The Journal of the Acoustical Society of America, 61(6), 1567–1576. https://doi.org/10.1121/1.381470

Shaw, E. A. G., & Teranishi, R. (1968). Sound Pressure Generated in an External‐Ear Replica and Real Human Ears by a Nearby Point Source. The Journal of the Acoustical Society of America, 44(1), 240–249. https://doi.org/10.1121/1.1911059

---------------------------------------------
Frederick (Erick) Gallun, PhD, FASA, FASHA | he/him/his
Professor, Oregon Hearing Research Center, Oregon Health & Science University
"Diversity is like being invited to a party, Inclusion is being asked to dance, and Belonging is dancing like no one’s watching" - Gregory Lewis

On Thu, Aug 11, 2022 at 11:59 PM Junfeng Li <junfeng.li.1979@xxxxxxxxx> wrote:
Dear Leslie,

When downsampling to 8/16 kHz, we did find that the localization accuracy decreases, even for horizon
Do you have any good ideas for solving it?

Thanks a lot.

Best regards,
Junfeng

On Thu, Aug 11, 2022 at 4:04 PM Prof Leslie Smith <l.s.smith@xxxxxxxxxxxxx> wrote:
I'd also wonder about the time resolution: 16 kHz = 1/16000 sec between
samples = 62.5 microseconds.
That's relatively long for ITD (TDOA) estimation, which would suggest that
localisation of lower frequency signals would be impeded.

(I don't have evidence for this: it's just a suggestion).
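
The arithmetic behind this remark can be sketched as follows. The helper names are hypothetical, and the 650 microsecond figure is an assumed, illustrative value for a large interaural time difference; it does not come from this thread.

```python
# Assumption: illustrative upper value for a human ITD, not taken from the thread.
ASSUMED_MAX_ITD_US = 650.0

def sample_period_us(sample_rate_hz):
    """Time between consecutive samples, in microseconds."""
    return 1e6 / sample_rate_hz

def itd_in_samples(itd_us, sample_rate_hz):
    """Number of sample periods spanned by a given ITD."""
    return itd_us / sample_period_us(sample_rate_hz)

for fs in (16000, 44100, 48000):
    period = sample_period_us(fs)
    span = itd_in_samples(ASSUMED_MAX_ITD_US, fs)
    print(f"fs={fs} Hz: period = {period:.1f} us, "
          f"{ASSUMED_MAX_ITD_US:.0f} us ITD spans {span:.1f} samples")
```

If delays were restricted to whole samples, the ITD at 16 kHz could only change in 62.5 microsecond steps. Whether that is the limiting factor in practice depends on whether the rendering uses fractional delays, which the thread does not say.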

--Leslie Smith

Junfeng Li wrote:
> Dear all,
>
> We are working on 3D audio rendering for signals with a low sampling
> frequency.
> As you may know, HRTFs are normally measured at a high sampling
> frequency, e.g., 48 kHz or 44.1 kHz. However, the sampling frequency of
> the sound signals in our application is restricted to 16 kHz. Therefore,
> to render this low-frequency (≤8 kHz) signal, one straightforward way is
> to first downsample the HRTFs from 48 kHz/44.1 kHz to 16 kHz and then
> convolve them with the sound signals.
> However, the sound localization performance of the signal rendered with
> this approach is greatly decreased, especially for elevation perception.
> To improve the sound localization performance, I am wondering whether
> there is a good method to solve or mitigate this problem in this
> scenario.
>
> Any discussion is welcome.
>
> Thanks a lot again.
>
> Best regards,
> Junfeng
>
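
The downsample-then-convolve approach described in the original message could look roughly like the sketch below. It is a minimal illustration, not the poster's actual implementation: the function names, the 61-tap Hamming-windowed sinc filter, the cutoff at 90% of the new Nyquist frequency, and the fixed 48 kHz to 16 kHz factor of 3 are all assumptions. It uses only the standard library, and it does nothing to restore the cues above 8 kHz.

```python
import math

def lowpass_fir(num_taps, cutoff_hz, sample_rate_hz):
    """Hamming-windowed sinc low-pass FIR, normalised to unity gain at DC."""
    fc = cutoff_hz / sample_rate_hz  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2.0 * fc
        else:
            sinc = math.sin(2.0 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]

def convolve(a, b):
    """Full linear convolution of two sequences (direct form, O(len(a)*len(b)))."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def downsample_hrir(hrir, factor=3, num_taps=61, sample_rate_hz=48000):
    """Low-pass an HRIR below the new Nyquist frequency, then decimate it.

    Assumptions: integer factor (48 kHz -> 16 kHz), cutoff at 0.9 * new Nyquist.
    The result is scaled by `factor` so that the decimated impulse response
    keeps the same filter gain in the retained band.
    """
    cutoff_hz = 0.9 * sample_rate_hz / (2.0 * factor)
    filtered = convolve(hrir, lowpass_fir(num_taps, cutoff_hz, sample_rate_hz))
    delay = (num_taps - 1) // 2  # group delay of the linear-phase FIR
    return [factor * v for v in filtered[delay::factor]]

def render_binaural(signal_16k, hrir_left_48k, hrir_right_48k):
    """Convolve a 16 kHz signal with the downsampled left and right HRIRs."""
    left = convolve(signal_16k, downsample_hrir(hrir_left_48k))
    right = convolve(signal_16k, downsample_hrir(hrir_right_48k))
    return left, right

# Placeholder HRIRs (not measured data): pure delays with a level difference.
hrir_left = [0.0] * 64
hrir_left[30] = 1.0
hrir_right = [0.0] * 64
hrir_right[36] = 0.5
left, right = render_binaural([1.0, 0.0, 0.0, 0.0], hrir_left, hrir_right)
print(left.index(max(left)), right.index(max(right)))
```

With the placeholder impulse responses, the 6-sample interaural delay at 48 kHz becomes a 2-sample delay at 16 kHz. Real HRIRs would be loaded from a measured set and need not have delays that are multiples of the decimation factor.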

--
Prof Leslie Smith (Emeritus)
Computing Science & Mathematics,
University of Stirling, Stirling FK9 4LA
Scotland, UK
Tel +44 1786 467435
Web: http://www.cs.stir.ac.uk/~lss
Blog: http://lestheprof.com