Re: [AUDITORY] On 3D audio rendering for signals with the low sampling frequency (Alain de Cheveigne)


Subject: Re: [AUDITORY] On 3D audio rendering for signals with the low sampling frequency
From:    Alain de Cheveigne  <alain.de.cheveigne@xxxxxxxx>
Date:    Wed, 17 Aug 2022 10:04:35 +0200

Hi Chris, all,

Good points. I'd be interested in reading the chapter.

A motivation of my earlier post was the idea that a filter (e.g. antialiasing) blurs sharp onsets. That might possibly affect the adaptation-dependent sampling process that you mention. Rather than fire with high rate (and temporal precision) at the precise onset, the AN would fire with a lower rate (and greater stochastic jitter) over the duration of the filter impulse response, resulting in a wider and/or noisier CCF peak.

A similar broadening effect is expected from a cochlear filter, but possibly limited by the asymmetric shape of its impulse responses (steep onset, shallow decay), and wide bandwidth at higher CFs. The idea is that this intrinsic broadening would be significantly increased by the external filter if that filter includes poles that are narrower than cochlear filters (and thus have a longer ringing decay).

According to this view, low-pass filtering (antialiasing, or stimulus manipulation in a perceptual experiment) might have effects beyond simply attenuating power in the high-frequency region. Those effects would depend on filter characteristics that are not usually reported. For example, a filter with sharp cutoff and flat pass-band (a holy grail for many filter-designers) requires multiple poles, some of which are quite narrow.

An experiment in which high-frequency content (e.g. of speech) is removed might thus find effects that are in part due to time-domain blurring by the low-pass filter. In designing a system that requires downsampling or resampling (as in Junfeng's original post), one might find effects due to blurring from the antialiasing filters. Those effects might be more salient with some filters (e.g. sharp cutoff) than others. Of course, this is pure conjecture. Whether it makes sense theoretically requires careful modelling of the non-linear and time-variant transduction processes as mentioned by Chris, but it might be possible to test the idea empirically by controlling for the filters involved in a - say - low-pass speech perception experiment.

From earlier exchanges I gathered that localization of 48 kHz sound upsampled from 16 kHz (and thus band-limited to 8 kHz or lower) was degraded when the sound was down-sampled again to 16 kHz, whereas localization was good at 48 kHz. If correct (apologies if not!) this suggests that the effect was not due to presence or absence of power above 8 kHz, and thus possibly resulted instead from the antialiasing filter involved in the downsampling.

All this applies to ITD for which temporal cues are obviously important, whereas the original post was concerned with elevation and front-back cues that are usually attributed to spectral cues (e.g. notches). However, I think I recall papers of Eric Young, Monty Escabi and others that suggested that they might in part be extracted based on temporal cues. If so, they too might be affected by temporal blurring.

Take care,
Alain

P.S. My Neuron paper on filters explains some of these issues (https://www.cell.com/neuron/pdf/S0896-6273(19)30174-6.pdf). Its focus is electrophysiology but some aspects are relevant to audio too.

> On 16 Aug 2022, at 20:15, Chris Stecker <cstecker@xxxxxxxx> wrote:
>
> Hi all, Particularly Leslie and Adam:
>
> The ready availability of binaural information at sound onsets and other positive fluctuations of the amplitude envelope is well supported by decades of psychophysical evidence, including 20 years of my own publications. The overall evidence, and the theory which it motivates (“RESTART theory”) is reviewed in a 2020 chapter of the Springer Handbook on Auditory Research by myself, Les Bernstein, and Andrew Brown:
>
> Stecker, G. C., Bernstein, L. R., and Brown, A. D. (2020). Binaural hearing with temporally complex signals. Chapter 5 in Goupell, M. J., Litovsky, R. Y, Popper, A. N., and Fay, R. R. (eds). Springer Handbook of Auditory Research Vol 73: Binaural Hearing. Switzerland: Springer International. doi:10.1007/978-3-030-57100-9
>
> Please contact me if you need help accessing the chapter.
>
> In quick summary, the evidence suggests that all forms of binaural cue (ITD of the envelope and fine structure, ILD, etc) available at any cochlear place (i.e. frequency) are specifically “sampled” at moments of positive envelope fluctuation. As Adam suggests, one obvious source of this “sampling” process is the strong adaptation exhibited in neural pathways prior to binaural interaction (e.g. hair cells, AN fibers, various cells of the cochlear nucleus). Indeed, phenomenological models that include realistic adaptive behavior exhibit many of the same properties observed psychophysically (Stecker 2020, Assoc Res Otolaryngol Abs 43).
>
> A feature of the data which is sometimes overlooked is the apparent refractory nature of this “sampling” process. New samples, or “onsets” can occur in succession, but not much more quickly than 200-300 times per second (3-5 ms). Above that rate (e.g. for rapid paired pulses, “steady” tones, etc.) binaural processing is confined to the overall onset. This rate limitation itself defines what counts as an “onset” for binaural processing: below the critical rate, successive events each contribute roughly equally and independently to spatial perception.
>
> What does this have to do with spatial cue representation at low sampling rates? Many of the mentions in this thread quite rightly invoke linear systems theory to understand the consequences of limiting bandwidth (i.e. due to slow sampling) on these representations. Various tricks may be suggested to somewhat extend the effective bandwidth (e.g. non-uniform sampling, etc.). I don’t have much to add there, except to consider how the brain might do it.
>
> In my view, it is important to keep in mind that no mechanisms of the ear or brain are, in fact, linear. Neuronal adaptation is highly nonlinear and also temporally asymmetric. A consequence is dramatic over-representation of rapid onset-like events–events that, in a linear system, would imply very broad bandwidth. Thus, auditory “channels” are capable of representations that apparently exceed the narrow “bandwidth” implied by their cochlear-place selectivity. That notion seems absurd on its face, because many of us have been trained to think about auditory function as “quasi-linear” (e.g. using terms like “auditory filter” to refer to neural pathways that are clearly not filters). But in fact it should not be surprising based on the actual physiology.
>
> This has clear consequences for loads of phenomena in binaural and spatial hearing: precedence, binaural adaptation, jitter in CI pulse timing, “straightness”, etc. (Stecker, Dietz, and Stern 2019(A), JASA 145:1759).
>
> Thank you for your attention, and for the interesting discussion!
>
> -Chris
>
> --
> G. Christopher Stecker, Ph.D., F.A.S.A.
>
> Director, Spatial Hearing Lab
> Director, Research Technology
> Boys Town National Research Hospital
>
> Coordinating Editor, Psychological and Physiological Acoustics
> Journal of the Acoustical Society of America
>
> cstecker@xxxxxxxx
> www.spatialhearing.org
>
>> On Aug 15, 2022, at 3:23 AM, Prof Leslie Smith <l.s.smith@xxxxxxxx> wrote:
>>
>> Dear all:
>>
>> Some years ago, I worked on using sound at onsets for calculating source direction in reverberant environments [1]. It's kind-of obvious, because after the onset, the sound at the ear/microphone is made up of energy both from the source and from reflections.
>>
>> Sampling rates are normally constant, and techniques for compression are aimed at recreating the percept of the original sound: I am under the impression that this doesn't extend to the percept of precise location of the sound. Perhaps we need novel compression/decompression techniques that include the relevant data for source location.
>>
>> [1] L.S. Smith, S. Collins Determining ITDs using two microphones on a flat panel during onset intervals with a biologically inspired spike based technique
>> IEEE Transactions of Audio, Speech and Language Processing, 15, 8, 2278-2286, (2007).
>>
>> --Leslie Smith
>>
>> Adam Weisser wrote:
>>
>>> 1. Compressed sensing - This heavily researched signal-processing method uses signal sparsity to faithfully reconstruct undersampled signals [1].
>>>
>> .....
>>> Neural adaptation can be thought of as dense sampling of the signal around its onset / transient portion, which becomes more sparsely sampled quickly after the onset. Because of adaptation, this effect is very illusive, but I believe that it is measurable notwithstanding. I tried to demonstrate it psychoacoustically in Appendix E of [4]. While I don't know how it relates to binaural processing directly, there may be instantaneous effects that may be detectable there too, given that the input to both processing types is the same.
>>>
>>> All the best,
>>> Adam.
>>>
>> ...
>>
>> --
>> Prof Leslie Smith (Emeritus)
>> Computing Science & Mathematics,
>> University of Stirling, Stirling FK9 4LA
>> Scotland, UK
>> Tel +44 1786 467435
>> Web: http://www.cs.stir.ac.uk/~lss
>> Blog: http://lestheprof.com
>
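
The ringing argument in Alain's message (a narrower pole implies a longer decay, and hence more temporal blurring of an onset) can be illustrated with a small numerical sketch. Nothing below comes from the thread itself: the two-pole resonator, the 48 kHz sampling rate, the 7 kHz centre frequency, the 1000 Hz and 100 Hz bandwidths, the -40 dB floor, and the function names are all assumptions chosen for illustration, and the pole-radius formula is the usual approximation for a narrow resonance. It says nothing about the nonlinear transduction stages that Chris describes.

```python
import math

def resonator_impulse_response(fc_hz, bw_hz, fs_hz, n_samples):
    """Impulse response of a two-pole resonator (one complex-conjugate pole pair).

    Hypothetical helper: the pole radius is set from the bandwidth using the
    common approximation r = exp(-pi * bw / fs).
    """
    r = math.exp(-math.pi * bw_hz / fs_hz)   # narrower bandwidth -> r closer to 1
    theta = 2.0 * math.pi * fc_hz / fs_hz    # pole angle from the centre frequency
    a1, a2 = -2.0 * r * math.cos(theta), r * r
    h, y1, y2 = [], 0.0, 0.0
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0           # unit impulse input
        y = x - a1 * y1 - a2 * y2            # y[n] = x[n] - a1*y[n-1] - a2*y[n-2]
        h.append(y)
        y2, y1 = y1, y
    return h

def ringing_ms(h, fs_hz, floor_db=-40.0):
    """Time (ms) of the last sample whose magnitude is within floor_db of the peak."""
    peak = max(abs(v) for v in h)
    thresh = peak * 10.0 ** (floor_db / 20.0)
    last = max(i for i, v in enumerate(h) if abs(v) >= thresh)
    return 1000.0 * last / fs_hz

FS = 48000.0   # assumed sampling rate (Hz)
FC = 7000.0    # assumed pole frequency, just below an 8 kHz band edge (Hz)
N = 4800       # 100 ms of impulse response

wide_ms = ringing_ms(resonator_impulse_response(FC, 1000.0, FS, N), FS)
narrow_ms = ringing_ms(resonator_impulse_response(FC, 100.0, FS, N), FS)
print(f"1000 Hz wide pole: rings for about {wide_ms:.1f} ms")
print(f" 100 Hz wide pole: rings for about {narrow_ms:.1f} ms")
```

Under these assumptions the narrow pole rings roughly ten times longer than the wide one, which is the sense in which a sharp-cutoff filter could smear an onset over a longer interval than a gentler one.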

