Thank you all for the discussion.
Thanks for sharing this observation here. I do not have a solution at the moment, but I am curious to know more.
I can relate the loss of elevation perception to poor capture of the spectral notches present in the HRTF, but I did not expect the notches above 8 kHz to be this crucial. Are the HRTFs personalized?
Yes. In theory, high localization accuracy can be obtained when individualized HRTFs are used. However, we found that even when individualized HRTFs and head tracking are used, sound localization with headphone rendering is still not as good as we expected; in particular, the errors in elevation perception remain large. That said, we know that localization accuracy in elevation is much lower than in the horizontal plane, even for normal-hearing listeners.
I am also wondering whether elevation information is always poor for audio signals sampled at 16 kHz. Is there literature on this?
Just a quick idea: I will also try downsampling the HRTF to 16 kHz without low-pass filtering and see whether the aliased HRTF spectrum significantly corrupts 3D perception. My bet is: not much. But I will keep my fingers crossed.
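To make the experiment concrete, here is a minimal Python sketch of what I mean, using a random placeholder impulse response in place of a measured HRIR (the names and the 256-tap length are my own assumptions, not from any particular database):

```python
import numpy as np
from scipy.signal import resample_poly

fs_in, fs_out = 48_000, 16_000
factor = fs_in // fs_out  # 3

# Placeholder HRIR; in practice, load a measured 48 kHz impulse response.
rng = np.random.default_rng(1)
hrir_48k = rng.standard_normal(256)

# Naive decimation: keep every 3rd sample, with NO anti-aliasing filter.
# Energy above 8 kHz folds back (aliases) into the 0-8 kHz band,
# distorting the magnitude spectrum rather than simply removing it.
hrir_aliased = hrir_48k[::factor]

# Conventional downsampling for comparison: resample_poly applies an
# anti-aliasing low-pass, which discards everything above 8 kHz.
hrir_filtered = resample_poly(hrir_48k, up=1, down=factor)
```

One could then compare the magnitude spectra of `hrir_aliased` and `hrir_filtered` (and listen to renderings with each) to see which degradation hurts elevation perception more.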
I am sorry that I have no exact literature on this issue, but you can try it and see the results. As an example, telephone speech is normally sampled at 8 kHz; when we try to render this 8 kHz speech in 3D, it is quite difficult to perceive elevation. This is the main problem we have to consider.
I look forward to your results and further discussion.
Best regards,
Junfeng
Dear Dick,
Thanks a lot for your information.
Yes, the main problem for us is the 16 kHz sampling-frequency limitation at the output side. Therefore, even if we do bandwidth extension on the input signal, we have to downsample back to 16 kHz after the 3D rendering processing. I am wondering whether there is any possible method based on some psychoacoustic principle, or something like that?
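To illustrate why the output constraint defeats this pipeline, here is a rough sketch (naive upsampling stands in for a real bandwidth-extension method, and the HRIR is a random placeholder; both are assumptions for illustration only):

```python
import numpy as np
from scipy.signal import resample_poly, fftconvolve

fs_hi, fs_out = 48_000, 16_000
rng = np.random.default_rng(2)

# Hypothetical pipeline: upsample the 16 kHz input (a stand-in for real
# bandwidth extension), render with a 48 kHz HRIR, then meet the
# output-rate constraint by downsampling back to 16 kHz.
x16 = rng.standard_normal(16_000)       # 1 s of 16 kHz input signal
x48 = resample_poly(x16, up=3, down=1)  # naive upsampling, no real BWE
hrir_48k = rng.standard_normal(256)     # placeholder 48 kHz HRIR
rendered_48k = fftconvolve(x48, hrir_48k)
y16 = resample_poly(rendered_48k, up=1, down=3)
# The final anti-aliasing low-pass removes everything above 8 kHz, so
# any elevation cues encoded there cannot survive to the 16 kHz output.
```
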
Thanks again.
Best regards
Junfeng
You could do "bandwidth extension" on the signals you want to spatialize, e.g. with some of the methods at
and then apply the high-sample-rate HRTFs.
Of course, if your system has a 16 ksps limitation on the output side, that will be of no use.
Dick
Dear all,
We are working on 3D audio rendering for signals with low sampling frequency.
As you may know, HRTFs are normally measured at a high sampling frequency, e.g., 48 kHz or 44.1 kHz. However, the sampling frequency of the sound signals in our application is restricted to 16 kHz. Therefore, to render these band-limited (≤ 8 kHz) signals, one straightforward approach is to first downsample the HRTFs from 48 kHz/44.1 kHz to 16 kHz and then convolve them with the sound signals. However, the sound-localization performance of signals rendered this way is greatly degraded, especially elevation perception. To improve localization performance, I am wondering whether there is a good method to solve or mitigate this problem in this scenario.
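For reference, a minimal sketch of the straightforward approach described above, with a random placeholder HRIR pair standing in for measured data (the 256-tap length and all names are my own assumptions):

```python
import numpy as np
from scipy.signal import resample_poly, fftconvolve

fs_hrtf = 48_000  # original HRTF measurement rate
fs_sig = 16_000   # application sampling rate

# Placeholder HRIR pair; in practice, load measured 48 kHz HRIRs
# (e.g. from a SOFA file) for the desired source direction.
rng = np.random.default_rng(0)
hrir_left = rng.standard_normal(256) * np.hanning(256)
hrir_right = rng.standard_normal(256) * np.hanning(256)

# Downsample 48 kHz -> 16 kHz (factor 3). The built-in anti-aliasing
# low-pass discards all spectral detail above 8 kHz, including the
# high-frequency pinna notches that carry elevation cues.
hrir_left_16k = resample_poly(hrir_left, up=1, down=3)
hrir_right_16k = resample_poly(hrir_right, up=1, down=3)

# Render a 16 kHz mono signal by convolving with the downsampled pair.
signal = rng.standard_normal(16_000)  # 1 s of noise as a stand-in
out_left = fftconvolve(signal, hrir_left_16k)
out_right = fftconvolve(signal, hrir_right_16k)
```
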
Any discussion is welcome.
Thanks a lot again.
Best regards,
Junfeng