Re: [AUDITORY] On 3D audio rendering for signals with the low sampling frequency (Chris Stecker)

Subject: Re: [AUDITORY] On 3D audio rendering for signals with the low sampling frequency
From: Chris Stecker <cstecker@xxxxxxxx>
Date: Tue, 16 Aug 2022 13:15:47 -0500

Hi all, particularly Leslie and Adam:

The ready availability of binaural information at sound onsets and other positive fluctuations of the amplitude envelope is well supported by decades of psychophysical evidence, including 20 years of my own publications. The overall evidence, and the theory which it motivates (“RESTART theory”), is reviewed in a 2020 chapter of the Springer Handbook of Auditory Research by myself, Les Bernstein, and Andrew Brown:

    Stecker, G. C., Bernstein, L. R., and Brown, A. D. (2020). Binaural hearing with temporally complex signals. Chapter 5 in Goupell, M. J., Litovsky, R. Y., Popper, A. N., and Fay, R. R. (eds). Springer Handbook of Auditory Research Vol 73: Binaural Hearing. Switzerland: Springer International. doi:10.1007/978-3-030-57100-9

Please contact me if you need help accessing the chapter.

In quick summary, the evidence suggests that all forms of binaural cue (ITD of the envelope and fine structure, ILD, etc.) available at any cochlear place (i.e. frequency) are specifically “sampled” at moments of positive envelope fluctuation. As Adam suggests, one obvious source of this “sampling” process is the strong adaptation exhibited in neural pathways prior to binaural interaction (e.g. hair cells, AN fibers, various cells of the cochlear nucleus). Indeed, phenomenological models that include realistic adaptive behavior exhibit many of the same properties observed psychophysically (Stecker 2020, Assoc Res Otolaryngol Abs 43).

A feature of the data which is sometimes overlooked is the apparent refractory nature of this “sampling” process. New samples, or “onsets,” can occur in succession, but not much more quickly than 200-300 times per second (one every 3-5 ms). Above that rate (e.g. for rapid paired pulses, “steady” tones, etc.) binaural processing is confined to the overall onset. This rate limitation itself defines what counts as an “onset” for binaural processing: below the critical rate, successive events each contribute roughly equally and independently to spatial perception.

What does this have to do with spatial cue representation at low sampling rates? Many of the posts in this thread quite rightly invoke linear systems theory to understand the consequences of limiting bandwidth (i.e. due to slow sampling) on these representations. Various tricks may be suggested to somewhat extend the effective bandwidth (e.g. non-uniform sampling, etc.). I don’t have much to add there, except to consider how the brain might do it.

In my view, it is important to keep in mind that no mechanisms of the ear or brain are, in fact, linear. Neuronal adaptation is highly nonlinear and also temporally asymmetric. A consequence is dramatic over-representation of rapid onset-like events, that is, events that, in a linear system, would imply very broad bandwidth. Thus, auditory “channels” are capable of representations that apparently exceed the narrow “bandwidth” implied by their cochlear-place selectivity. That notion seems absurd on its face, because many of us have been trained to think about auditory function as “quasi-linear” (e.g. using terms like “auditory filter” to refer to neural pathways that are clearly not filters). But in fact it should not be surprising based on the actual physiology.

This has clear consequences for loads of phenomena in binaural and spatial hearing: precedence, binaural adaptation, jitter in CI pulse timing, “straightness”, etc. (Stecker, Dietz, and Stern 2019(A), JASA 145:1759).

Thank you for your attention, and for the interesting discussion!

-Chris

--
G. Christopher Stecker, Ph.D., F.A.S.A.

Director, Spatial Hearing Lab
Director, Research Technology
Boys Town National Research Hospital

Coordinating Editor, Psychological and Physiological Acoustics
Journal of the Acoustical Society of America

cstecker@xxxxxxxx
www.spatialhearing.org

> On Aug 15, 2022, at 3:23 AM, Prof Leslie Smith <l.s.smith@xxxxxxxx> wrote:
>
> Dear all:
>
> Some years ago, I worked on using sound at onsets for calculating source
> direction in reverberant environments [1]. It's kind-of obvious, because
> after the onset, the sound at the ear/microphone is made up of energy both
> from the source and from reflections.
>
> Sampling rates are normally constant, and techniques for compression are
> aimed at recreating the percept of the original sound: I am under the
> impression that this doesn't extend to the percept of precise location of
> the sound. Perhaps we need novel compression/decompression techniques
> that include the relevant data for source location.
>
> [1] L.S. Smith, S. Collins, Determining ITDs using two microphones on a
> flat panel during onset intervals with a biologically inspired spike based
> technique,
> IEEE Transactions on Audio, Speech and Language Processing, 15, 8,
> 2278-2286, (2007).
>
> --Leslie Smith
>
> Adam Weisser wrote:
>
>> 1. Compressed sensing - This heavily researched signal-processing method
>> uses signal sparsity to faithfully reconstruct undersampled signals [1].
>>
> .....
>> Neural adaptation can be thought of as dense
>> sampling of the signal around its onset / transient portion, which becomes
>> more sparsely sampled quickly after the onset. Because of adaptation, this
>> effect is very elusive, but I believe that it is measurable
>> notwithstanding. I tried to demonstrate it psychoacoustically in Appendix
>> E of [4]. While I don't know how it relates to binaural processing
>> directly, there may be instantaneous effects that may be detectable there
>> too, given that the input to both processing types is the same.
>>
>> All the best,
>> Adam.
>>
> ...
>
>
> --
> Prof Leslie Smith (Emeritus)
> Computing Science & Mathematics,
> University of Stirling, Stirling FK9 4LA
> Scotland, UK
> Tel +44 1786 467435
> Web: http://www.cs.stir.ac.uk/~lss
> Blog: http://lestheprof.com
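
The rate limit described in the message above (new "onsets" no more often than roughly every 3-5 ms, with faster trains reduced to their overall onset) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the function names pulse_times and sampled_onsets are hypothetical, the 4 ms interval is simply a value picked from the 3-5 ms range quoted in the message, and the "resettable" refractory rule (every event restarts the dead time, whether or not it was counted) is one simple way to reproduce the described behavior. It is not the RESTART model or any published implementation.

```python
import numpy as np


def pulse_times(rate_hz, duration_s):
    """Times (in seconds) of a regular pulse train starting at t = 0."""
    n_pulses = int(round(duration_s * rate_hz))
    return np.arange(n_pulses) / rate_hz


def sampled_onsets(event_times_s, refractory_s=0.004):
    """Pick the events that count as 'onsets' under a toy refractory rule.

    Assumption (not from the source): an event is counted only if the gap
    since the previous event, counted or not, is at least `refractory_s`.
    The first event is always counted.
    """
    events = np.sort(np.asarray(event_times_s, dtype=float))
    if events.size == 0:
        return events
    gaps = np.diff(events)
    # Small tolerance so gaps equal to the refractory interval still count.
    keep = np.concatenate(([True], gaps >= refractory_s - 1e-12))
    return events[keep]


if __name__ == "__main__":
    for rate in (100, 500):
        train = pulse_times(rate, duration_s=0.05)
        onsets = sampled_onsets(train)
        print(f"{rate} pulses/s: {train.size} pulses, {onsets.size} counted as onsets")
```

With these assumed settings, a 100 pulses/s train (10 ms between pulses) has every pulse counted, while a 500 pulses/s train (2 ms between pulses) has only its first pulse counted, which mirrors the qualitative claim that processing above the critical rate is confined to the overall onset. A non-resettable dead time would instead count every second pulse of the fast train, so the choice of rule matters and should be checked against the literature cited in the message.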

This message came from the mail archive
src/postings/2022/
maintained by:

Dan Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University