
Re: [AUDITORY] On 3D audio rendering for signals with the low sampling frequency



Hi all, particularly Leslie and Adam:


The ready availability of binaural information at sound onsets and other positive fluctuations of the amplitude envelope is well supported by decades of psychophysical evidence, including 20 years of my own publications. The overall evidence, and the theory it motivates (“RESTART theory”), is reviewed in a 2020 chapter of the Springer Handbook of Auditory Research by myself, Les Bernstein, and Andrew Brown:

Stecker, G. C., Bernstein, L. R., and Brown, A. D. (2020). Binaural hearing with temporally complex signals. Chapter 5 in Goupell, M. J., Litovsky, R. Y., Popper, A. N., and Fay, R. R. (eds.). Springer Handbook of Auditory Research Vol 73: Binaural Hearing. Switzerland: Springer International. doi:10.1007/978-3-030-57100-9

Please contact me if you need help accessing the chapter. 

In quick summary, the evidence suggests that all forms of binaural cue (ITD of the envelope and fine structure, ILD, etc.) available at any cochlear place (i.e. frequency) are specifically “sampled” at moments of positive envelope fluctuation. As Adam suggests, one obvious source of this “sampling” process is the strong adaptation exhibited in neural pathways prior to binaural interaction (e.g. hair cells, AN fibers, various cells of the cochlear nucleus). Indeed, phenomenological models that include realistic adaptive behavior exhibit many of the same properties observed psychophysically (Stecker 2020, Assoc Res Otolaryngol Abs 43).

A feature of the data which is sometimes overlooked is the apparent refractory nature of this “sampling” process. New samples, or “onsets,” can occur in succession, but not much more quickly than 200-300 times per second (i.e. at intervals no shorter than about 3-5 ms). Above that rate (e.g. for rapid paired pulses, “steady” tones, etc.) binaural processing is confined to the overall onset. This rate limitation itself defines what counts as an “onset” for binaural processing: below the critical rate, successive events each contribute roughly equally and independently to spatial perception.
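
As a rough illustration of the rate limit described above, here is a minimal sketch in Python. It is a toy rule, not a model from the cited chapter: the function name `sampled_onsets`, the 4 ms default, and the choice that every event (sampled or not) resets the refractory interval are all assumptions made for the example.

```python
def sampled_onsets(event_times_s, refractory_s=0.004):
    """Return the event times that would trigger a new binaural "sample".

    Toy rule (an assumption, not a fitted model): an event counts as an
    onset only if the gap since the previous event is at least
    refractory_s, whether or not that previous event was itself sampled.
    """
    sampled = []
    previous = None
    for t in sorted(event_times_s):
        if previous is None or t - previous >= refractory_s:
            sampled.append(t)
        previous = t  # every event resets the refractory interval
    return sampled


# Hypothetical click trains: 100 events/s (below the limit) and
# 500 events/s (above it).
slow_train = [i * 0.010 for i in range(5)]
fast_train = [i * 0.002 for i in range(5)]

print(sampled_onsets(slow_train))  # every click is sampled
print(sampled_onsets(fast_train))  # only the overall onset is sampled
```

With this rule, each click of the slow train is treated as its own onset, while the fast train contributes only its first click, which is the qualitative pattern described in the paragraph above.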

What does this have to do with spatial cue representation at low sampling rates? Many of the posts in this thread quite rightly invoke linear systems theory to understand the consequences of limiting bandwidth (i.e. due to slow sampling) on these representations. Various tricks may be suggested to somewhat extend the effective bandwidth (e.g. non-uniform sampling, etc.). I don’t have much to add there, except to consider how the brain might do it.

In my view, it is important to keep in mind that no mechanisms of the ear or brain are, in fact, linear. Neuronal adaptation is highly nonlinear and also temporally asymmetric. A consequence is dramatic over-representation of rapid onset-like events: events that, in a linear system, would imply very broad bandwidth. Thus, auditory “channels” are capable of representations that apparently exceed the narrow “bandwidth” implied by their cochlear-place selectivity. That notion seems absurd on its face, because many of us have been trained to think about auditory function as “quasi-linear” (e.g. using terms like “auditory filter” to refer to neural pathways that are clearly not filters). But in fact it should not be surprising based on the actual physiology.
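
To make the asymmetry concrete, here is a minimal sketch of a generic adaptation stage in Python. It is not the phenomenological model cited earlier: the function `adapted_response`, the 5 ms time constant, and the subtract-then-rectify form are assumptions chosen only to show how a nonlinear, one-sided stage emphasizes onsets and ignores offsets.

```python
def adapted_response(envelope, fs_hz, tau_s=0.005):
    """Toy adaptation stage: respond only to input above an adapted level.

    The adapted level follows the envelope with time constant tau_s
    (assumed value). Half-wave rectification makes the stage nonlinear
    and temporally asymmetric: envelope increases pass, decreases do not.
    """
    alpha = 1.0 / (tau_s * fs_hz)  # per-sample update; assumes tau_s * fs_hz >= 1
    state = 0.0
    out = []
    for x in envelope:
        out.append(max(0.0, x - state))  # only positive fluctuations respond
        state += alpha * (x - state)     # adapted level tracks the envelope
    return out


fs_hz = 10000
# Hypothetical envelope: 10 ms silence, 50 ms plateau, 10 ms silence.
envelope = [0.0] * 100 + [1.0] * 500 + [0.0] * 100
response = adapted_response(envelope, fs_hz)

onset_index = response.index(max(response))
print(onset_index)          # sample at which the response peaks
print(response[599] < 0.01) # response late in the plateau has decayed
```

In this sketch the response is largest at the first plateau sample, decays toward zero during the steady portion, and stays at zero after the offset, even though the input envelope itself is a simple rectangle.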

This nonlinear, onset-dominated representation has clear consequences for loads of phenomena in binaural and spatial hearing: precedence, binaural adaptation, jitter in CI pulse timing, “straightness”, etc. (Stecker, Dietz, and Stern 2019(A), JASA 145:1759).

Thank you for your attention, and for the interesting discussion! 

-Chris





G. Christopher Stecker, Ph.D., F.A.S.A.

Director, Spatial Hearing Lab
Director, Research Technology
Boys Town National Research Hospital

Coordinating Editor, Psychological and Physiological Acoustics
Journal of the Acoustical Society of America











On Aug 15, 2022, at 3:23 AM, Prof Leslie Smith <l.s.smith@xxxxxxxxxxxxx> wrote:

Dear all:

Some years ago, I worked on using sound at onsets for calculating source
direction in reverberant environments [1]. It's kind-of obvious, because
after the onset, the sound at the ear/microphone is made up of energy both
from the source and from reflections.

Sampling rates are normally constant, and techniques for compression are
aimed at recreating the percept of the original sound: I am under the
impression that this doesn't extend to the percept of precise location of
the sound. Perhaps we need novel compression/decompression  techniques
that include the relevant data for source location.

[1] L. S. Smith and S. Collins, Determining ITDs using two microphones on a
flat panel during onset intervals with a biologically inspired spike-based
technique, IEEE Transactions on Audio, Speech, and Language Processing,
15(8), 2278-2286 (2007).

--Leslie Smith

Adam Weisser wrote:

1. Compressed sensing - This heavily researched signal-processing method
uses signal sparsity to faithfully reconstruct undersampled signals [1].

.....
Neural adaptation can be thought of as dense
sampling of the signal around its onset / transient portion, which becomes
more sparsely sampled quickly after the onset. Because of adaptation, this
effect is very elusive, but I believe that it is measurable
notwithstanding. I tried to demonstrate it psychoacoustically in Appendix
E of [4]. While I don't know how it relates to binaural processing
directly, there may be instantaneous effects that may be detectable there
too, given that the input to both processing types is the same.

All the best,
Adam.

...


--
Prof Leslie Smith (Emeritus)
Computing Science & Mathematics,
University of Stirling, Stirling FK9 4LA
Scotland, UK
Tel +44 1786 467435
Web: http://www.cs.stir.ac.uk/~lss
Blog: http://lestheprof.com