Re: [AUDITORY] On 3D audio rendering for signals with the low sampling frequency
Hi Chris, all,
Good points. I'd be interested in reading the chapter.
A motivation of my earlier post was the idea that a filter (e.g. antialiasing) blurs sharp onsets. That might possibly affect the adaptation-dependent sampling process that you mention. Rather than fire with high rate (and temporal precision) at the precise onset, the AN would fire with a lower rate (and greater stochastic jitter) over the duration of the filter impulse response, resulting in a wider and/or noisier CCF peak.
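The CCF-broadening part of this argument can be illustrated with a purely linear toy. This is a minimal sketch and not a model of the AN: it ignores adaptation, stochastic firing and the cochlear filter, and simply cross-correlates two clicks separated by an ITD after both pass through the same one-pole low-pass filter. The filter type, the coefficient values `a`, and the 10-sample ITD are assumptions chosen for illustration only.

```python
def one_pole_lowpass(x, a):
    """y[n] = (1 - a) * x[n] + a * y[n-1]; larger a gives a longer impulse response."""
    y, prev = [], 0.0
    for v in x:
        prev = (1.0 - a) * v + a * prev
        y.append(prev)
    return y

def cross_correlation(left, right, max_lag):
    """CCF for lags -max_lag..max_lag; a positive lag means right lags left."""
    n = len(left)
    ccf = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                total += left[i] * right[j]
        ccf[lag] = total
    return ccf

def peak_and_width(ccf):
    """Lag of the CCF maximum, and the number of lags at or above half that maximum."""
    peak_lag = max(ccf, key=ccf.get)
    half = ccf[peak_lag] / 2.0
    width = sum(1 for v in ccf.values() if v >= half)
    return peak_lag, width

N, ITD = 400, 10           # signal length and interaural delay, in samples (assumed)
left = [0.0] * N
right = [0.0] * N
left[50] = 1.0             # click at the left ear
right[50 + ITD] = 1.0      # same click, delayed, at the right ear

results = {}
for a in (0.0, 0.8, 0.95):  # a = 0.0 means no filtering
    ccf = cross_correlation(one_pole_lowpass(left, a), one_pole_lowpass(right, a), 100)
    results[a] = peak_and_width(ccf)
    print(f"a = {a:4.2f}: CCF peak at lag {results[a][0]}, half-height width {results[a][1]} lags")
```

In this toy the peak stays at the true ITD while its half-height width grows with the length of the filter impulse response. Whether the same holds after non-linear transduction is exactly the open question.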
A similar broadening effect is expected from a cochlear filter, but possibly limited by the asymmetric shape of its impulse responses (steep onset, shallow decay), and wide bandwidth at higher CFs. The idea is that this intrinsic broadening would be significantly increased by the external filter if that filter includes poles that are narrower than cochlear filters (and thus have a longer ringing decay).
According to this view, low-pass filtering (antialiasing, or stimulus manipulation in a perceptual experiment) might have effects beyond simply attenuating power in the high-frequency region. Those effects would depend on filter characteristics that are not usually reported. For example, a filter with sharp cutoff and flat pass-band (a holy grail for many filter-designers) requires multiple poles, some of which are quite narrow.
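To put rough numbers on "narrow poles ring longer", here is a minimal sketch of a single conjugate pole pair (a two-pole resonator), not of any particular antialiasing filter. For such a pole pair with 3-dB bandwidth B, the impulse-response envelope decays roughly as exp(-pi * B * t). The 7.5 kHz pole frequency, the 48 kHz sampling rate, the three bandwidths and the 40 dB criterion are all assumptions picked for illustration.

```python
import math

def resonator_impulse_response(center_hz, bandwidth_hz, fs_hz, n_samples):
    """Impulse response of a two-pole resonator (one conjugate pole pair)."""
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)  # pole radius from 3-dB bandwidth
    theta = 2.0 * math.pi * center_hz / fs_hz      # pole angle
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    y = []
    y1 = y2 = 0.0
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0                 # unit impulse input
        out = x + a1 * y1 + a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y

def ringing_ms(h, fs_hz, drop_db=40.0):
    """Time (ms) of the last sample whose magnitude is within drop_db of the peak."""
    peak = max(abs(v) for v in h)
    threshold = peak * 10.0 ** (-drop_db / 20.0)
    last = max(i for i, v in enumerate(h) if abs(v) >= threshold)
    return 1000.0 * last / fs_hz

FS = 48000.0  # assumed sampling rate
for bw in (2000.0, 500.0, 100.0):
    h = resonator_impulse_response(7500.0, bw, FS, 48000)
    print(f"pole bandwidth {bw:6.0f} Hz -> rings for about {ringing_ms(h, FS):5.1f} ms")
```

A real sharp-cutoff filter combines several such pole pairs, so its overall impulse response is at least as long as that of its narrowest pole.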
An experiment in which high-frequency content (e.g. of speech) is removed might thus find effects that are in part due to time-domain blurring by the low-pass filter. In designing a system that requires downsampling or resampling (as in Junfeng's original post), one might find effects due to blurring from the antialiasing filters. Those effects might be more salient with some filters (e.g. sharp cutoff) than others.
Of course, this is pure conjecture. Whether it makes sense theoretically requires careful modelling of the non-linear and time-variant transduction processes as mentioned by Chris, but it might be possible to test the idea empirically by controlling for the filters involved in a - say - low-pass speech perception experiment.
From earlier exchanges I gathered that localization of 48 kHz sound upsampled from 16 kHz (and thus band-limited to 8 kHz or lower) was degraded when the sound was downsampled again to 16 kHz, whereas localization was good at 48 kHz. If correct (apologies if not!) this suggests that the effect was not due to presence or absence of power above 8 kHz, and thus possibly resulted instead from the antialiasing filter involved in the downsampling.
All this applies to ITD, for which temporal cues are obviously important, whereas the original post was concerned with elevation and front-back cues that are usually attributed to spectral features (e.g. notches). However, I think I recall papers by Eric Young, Monty Escabi and others suggesting that these cues might in part be extracted from temporal information. If so, they too might be affected by temporal blurring.
Take care,
Alain
P.S. My Neuron paper on filters explains some of these issues (https://www.cell.com/neuron/pdf/S0896-6273(19)30174-6.pdf). Its focus is electrophysiology but some aspects are relevant to audio too.
> On 16 Aug 2022, at 20:15, Chris Stecker <cstecker@xxxxxxxxxxxxxxxxxx> wrote:
>
> Hi all, Particularly Leslie and Adam:
>
>
> The ready availability of binaural information at sound onsets and other positive fluctuations of the amplitude envelope is well supported by decades of psychophysical evidence, including 20 years of my own publications. The overall evidence, and the theory which it motivates (“RESTART theory”) is reviewed in a 2020 chapter of the Springer Handbook on Auditory Research by myself, Les Bernstein, and Andrew Brown:
>
> Stecker, G. C., Bernstein, L. R., and Brown, A. D. (2020). Binaural hearing with temporally complex signals. Chapter 5 in Goupell, M. J., Litovsky, R. Y, Popper, A. N., and Fay, R. R. (eds). Springer Handbook of Auditory Research Vol 73: Binaural Hearing. Switzerland: Springer International. doi:10.1007/978-3-030-57100-9
>
> Please contact me if you need help accessing the chapter.
>
> In quick summary, the evidence suggests that all forms of binaural cue (ITD of the envelope and fine structure, ILD, etc) available at any cochlear place (i.e. frequency) are specifically “sampled” at moments of positive envelope fluctuation. As Adam suggests, one obvious source of this “sampling” process is the strong adaptation exhibited in neural pathways prior to binaural interaction (e.g. hair cells, AN fibers, various cells of the cochlear nucleus). Indeed, phenomenological models that include realistic adaptive behavior exhibit many of the same properties observed psychophysically (Stecker 2020, Assoc Res Otolaryngol Abs 43).
>
> A feature of the data which is sometimes overlooked is the apparent refractory nature of this “sampling” process. New samples, or “onsets” can occur in succession, but not much more quickly than 200-300 times per second (3-5 ms). Above that rate (e.g. for rapid paired pulses, “steady” tones, etc.) binaural processing is confined to the overall onset. This rate limitation itself defines what counts as an “onset” for binaural processing: below the critical rate, successive events each contribute roughly equally and independently to spatial perception.
>
> What does this have to do with spatial cue representation at low sampling rates? Many of the mentions in this thread quite rightly invoke linear systems theory to understand the consequences of limiting bandwidth (i.e. due to slow sampling) on these representations. Various tricks may be suggested to somewhat extend the effective bandwidth (e.g. non-uniform sampling, etc.). I don’t have much to add there, except to consider how the brain might do it.
>
> In my view, it is important to keep in mind that no mechanisms of the ear or brain are, in fact, linear. Neuronal adaptation is highly nonlinear and also temporally asymmetric. A consequence is dramatic over-representation of rapid onset-like events–events that, in a linear system, would imply very broad bandwidth. Thus, auditory “channels” are capable of representations that apparently exceed the narrow "bandwidth” implied by their cochlear-place selectivity. That notion seems absurd on its face, because many of us have been trained to think about auditory function as "quasi-linear” (e.g. using terms like “auditory filter” to refer to neural pathways that are clearly not filters). But in fact it should not be surprising based on the actual physiology.
>
> This has clear consequences for loads of phenomena in binaural and spatial hearing: precedence, binaural adaptation, jitter in CI pulse timing, “straightness”, etc. (Stecker, Dietz, and Stern 2019(A), JASA 145:1759).
>
> Thank you for your attention, and for the interesting discussion!
>
> -Chris
>
> —
>
> G. Christopher Stecker, Ph.D., F.A.S.A.
>
> Director, Spatial Hearing Lab
> Director, Research Technology
> Boys Town National Research Hospital
>
> Coordinating Editor, Psychological and Physiological Acoustics
> Journal of the Acoustical Society of America
>
>
> cstecker@xxxxxxxxxxxxxxxxxx
> www.spatialhearing.org
>
>> On Aug 15, 2022, at 3:23 AM, Prof Leslie Smith <l.s.smith@xxxxxxxxxxxxx> wrote:
>>
>> Dear all:
>>
>> Some years ago, I worked on using sound at onsets for calculating source
>> direction in reverberant environments [1]. It's kind-of obvious, because
>> after the onset, the sound at the ear/microphone is made up of energy both
>> from the source and from reflections.
>>
>> Sampling rates are normally constant, and techniques for compression are
>> aimed at recreating the percept of the original sound: I am under the
>> impression that this doesn't extend to the percept of precise location of
>> the sound. Perhaps we need novel compression/decompression techniques
>> that include the relevant data for source location.
>>
>> [1] L.S. Smith, S. Collins Determining ITDs using two microphones on a
>> flat panel during onset intervals with a biologically inspired spike based
>> technique
>> IEEE Transactions on Audio, Speech and Language Processing, 15, 8,
>> 2278-2286, (2007).
>>
>> --Leslie Smith
>>
>> Adam Weisser wrote:
>>
>>> 1. Compressed sensing - This heavily researched signal-processing method
>>> uses signal sparsity to faithfully reconstruct undersampled signals [1].
>>>
>> .....
>>> Neural adaptation can be thought of as dense
>>> sampling of the signal around its onset / transient portion, which becomes
>>> more sparsely sampled quickly after the onset. Because of adaptation, this
>>> effect is very elusive, but I believe that it is measurable
>>> notwithstanding. I tried to demonstrate it psychoacoustically in Appendix
>>> E of [4]. While I don't know how it relates to binaural processing
>>> directly, there may be instantaneous effects that may be detectable there
>>> too, given that the input to both processing types is the same.
>>>
>>> All the best,
>>> Adam.
>>>
>> ...
>>
>>
>> --
>> Prof Leslie Smith (Emeritus)
>> Computing Science & Mathematics,
>> University of Stirling, Stirling FK9 4LA
>> Scotland, UK
>> Tel +44 1786 467435
>> Web: http://www.cs.stir.ac.uk/~lss
>> Blog: http://lestheprof.com
>