Re: [AUDITORY] Gammatone filter bank in MATLABr2019a (Volker Hohmann )

Subject: Re: [AUDITORY] Gammatone filter bank in MATLABr2019a
From: Volker Hohmann <volker.hohmann@xxxxxxxx>
Date: Thu, 18 Apr 2019 07:13:53 +0200
List-Archive: <http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Dear Les and all,

this is my (not complete) list:

https://engineering.purdue.edu/~malcolm/interval/1998-010/
https://code.soundsoftware.ac.uk/projects/aimmat
http://legacy.spa.aalto.fi/software/HUTear/
http://amtoolbox.sourceforge.net/
https://uol.de/en/mediphysics/downloads/ (now also at https://doi.org/10.5281/zenodo.2643400)

Best,

Volker

On 17.04.2019 15:10, Bernstein,Leslie wrote:
> Thanks, Volker. A link to those implementations would be very helpful.
>
> Les
>
> On 4/17/2019 4:58 AM, Volker Hohmann wrote:
>> Dear Dick and all,
>>
>> just want to add that the re-synthesis method they apply is not optimal.
>> I would recommend using the Matlab implementations contributed by our
>> community, which have been described properly in citable publications,
>> are readily available and have been running flawlessly for many years
>> under whatever Matlab version came out.
>>
>> Best regards,
>>
>> Volker
>>
>> On 17.04.2019 02:51, Richard F. Lyon wrote:
>>> Bastian,
>>>
>>> That's an interesting distinction that needs to be made, between the
>>> peripheral and "whole system" auditory filter, whether gammatone or
>>> otherwise. In my book, I say this about that (in Part III – The
>>> Auditory Periphery):
>>>
>>> 13.1 What Is an Auditory Filter?
>>> The auditory filters that we consider here include both those
>>> motivated by psychoacoustic experiments, such as detection of tones
>>> in noise maskers, and those motivated by reproducing the observed
>>> mechanical response of the basilar membrane or neural response of
>>> the auditory nerve. One thesis of this work is that a single model
>>> can do a good job for both of these, and thereby provide a good
>>> basis for a machine hearing system.
>>> Since there are several stages
>>> of neural processing between the cochlea and our psychoacoustic
>>> perceptions, it would not be surprising if the best parameters were
>>> different between these types of models, but it seems likely that
>>> the linear and nonlinear filtering due to the cochlea plays a
>>> sufficient role in perception that we may find one set of parameters
>>> is adequate, at least for a range of machine hearing applications.
>>>
>>>
>>> And to be fair, the gammatone was originally proposed as a model of frog
>>> hearing physiology, and is widely used in cochlear models, even though
>>> Patterson popularized it in the psychoacoustic domain.
>>>
>>> So the MathWorks ought to be more careful what they say. I'd have
>>> several other quibbles with their docs (in the Audio Toolbox reference
>>> at https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mathworks.com_help_pdf-5Fdoc_audio_index.html&d=DwIFaQ&c=EZxp_D7cDnouwj5YEFHgXuSKoUq2zVQZ_7Fw9yfotck&r=2Pw2GwelGcMR4953G-STHGpPJm2-pYYYSPmTwJk3sWM&m=GHXIqZnxZ7ZjCjlEGmDuiQlnjJQizpHYy3weycRYNko&s=BE9euCO95AcdvV7T4r3Kob_OyFq4F1_v9-0p75nY_Ok&e=).
>>>
>>> Quibbles:
>>>
>>> 1. "The gammatoneFilterBank follows the algorithm described in [1] and
>>> first proposed by [2]." [1] is Slaney's method, a simple filter cascade
>>> based on analyzing the Laplace transform of the gammatone. [2] is
>>> Patterson et al.'s "Complex Sounds and Auditory Images", a great paper
>>> but it doesn't say one word about how to implement the gammatone (they
>>> did have other implementation papers elsewhere, but not this method and
>>> not here).
>>>
>>> 2. Ref 2 says "the shape of the magnitude characteristic of the
>>> gammatone filter is very similar to that of the roex(p) filter commonly
>>> used to represent the magnitude characteristic of the human auditory
>>> filter." Mathworks says "The gammatone filter is similar to the roex
>>> filter derived from the notched-noise
>>> experiment." A cursory look at more recent literature on auditory
>>> filters, including Patterson's, would suggest omitting or at least
>>> tempering this claim. See my book Chapter 13 or this paper:
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__storage.googleapis.com_pub-2Dtools-2Dpublic-2Dpublication-2Ddata_pdf_36895.pdf&d=DwIFaQ&c=EZxp_D7cDnouwj5YEFHgXuSKoUq2zVQZ_7Fw9yfotck&r=2Pw2GwelGcMR4953G-STHGpPJm2-pYYYSPmTwJk3sWM&m=GHXIqZnxZ7ZjCjlEGmDuiQlnjJQizpHYy3weycRYNko&s=_Jft13aI1rDz891VcgKid-OKGfUIm6NugFjoDEcj1lg&e=
>>>
>>> 3. Error where it says b –– bandwidth, set to 1.019*erb2hz(fc). Either
>>> the documentation is wrong, or the functionality is wrong. Hopefully
>>> the former.
>>>
>>> 4. The parameterization by only FrequencyRange, NumFilters, and
>>> SampleRate is rather impoverished. It is not documented whether the
>>> filters match the ERB bandwidth if some of these parameters are changed,
>>> or whether adjacent filters continue to cross over about 3 dB down; you
>>> can't have both, but you might want one or the other, and there's not
>>> enough control to say what you want. With a few more parameters one
>>> could do useful comparisons, tradeoffs, and tunings of filter numbers,
>>> orders, bandwidths, and phases for example. With just a few more one
>>> could include better auditory filter variants (that differ only in the
>>> locations of the zeros of the cascaded second-order filters), including
>>> APGF and OZGF.
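To make the bandwidth point in quibble 3 concrete, here is a minimal Python sketch. It assumes the ERB formulas of Glasberg and Moore (1990), and it assumes that a function named like erb2hz maps an ERB-scale number to a frequency in Hz rather than returning a bandwidth; that reading of the quibble is an assumption, not something stated above. All function names below are hypothetical and none of this is MathWorks code. The last helper places centre frequencies at equal steps on the ERB-number scale, which is one possible spacing rule for a FrequencyRange/NumFilters style of parameterization.

```python
import math


def erb_bandwidth_hz(fc_hz):
    """Equivalent rectangular bandwidth in Hz at centre frequency fc_hz
    (Glasberg & Moore 1990: ERB = 24.7 * (4.37 * F + 1), F in kHz)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def hz_to_erb_number(f_hz):
    """Frequency in Hz -> position on the ERB-number scale (dimensionless)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_number_to_hz(erb_number):
    """Inverse of hz_to_erb_number: ERB-number -> frequency in Hz.
    Assumed here to be what an 'erb2hz'-style function computes."""
    return (10.0 ** (erb_number / 21.4) - 1.0) * 1000.0 / 4.37


def gammatone_bandwidth_hz(fc_hz, factor=1.019):
    """Gammatone bandwidth parameter b, taken as factor * ERB(fc)."""
    return factor * erb_bandwidth_hz(fc_hz)


def erb_spaced_centres(f_lo_hz, f_hi_hz, num_filters):
    """Centre frequencies at equal steps on the ERB-number scale
    (hypothetical spacing rule, for illustration only)."""
    if num_filters < 2:
        raise ValueError("num_filters must be at least 2")
    e_lo = hz_to_erb_number(f_lo_hz)
    e_hi = hz_to_erb_number(f_hi_hz)
    step = (e_hi - e_lo) / (num_filters - 1)
    return [erb_number_to_hz(e_lo + i * step) for i in range(num_filters)]


if __name__ == "__main__":
    fc = 1000.0
    print("ERB(fc) in Hz:            ", erb_bandwidth_hz(fc))
    print("b = 1.019 * ERB(fc) in Hz:", gammatone_bandwidth_hz(fc))
    # Feeding a frequency in Hz to a scale-conversion function yields a
    # very different number, which is why the two must not be confused.
    print("scale conversion of fc:   ", erb_number_to_hz(fc))
```

The two quantities have different meanings: the first is a bandwidth that grows linearly with fc, the second is a point on a warped frequency axis.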
>>>
>>> R2019a also adds gtcc (gammatone cepstral coefficients). Their
>>> algorithm uses log(energy) before the DCT, instead of the cube root
>>> proposed by the Shao et al. reference, which also uses a slightly
>>> different acronym: GFCC (gammatone frequency cepstral coefficients).
>>> Not clear why. The referenced paper did not really investigate whether
>>> their improvement over mfcc was due to the different frequency scale
>>> (700 Hz mel vs 229 Hz ERB break point between linear and exponential
>>> spacing), or the filter shape (triangle vs gammatone), or the
>>> nonlinearity (log vs cube root), or the domain of implementation
>>> (frequency vs time). With the impoverished parameterizations of these
>>> functions in the audio toolboxes, it's hard to further compare such
>>> things (though the gtcc does allow some of that). The other gtcc ref
>>> (Rabiner and Schafer) has nothing on gammatone or gtcc or gfcc.
>>>
>>> I could go on...
>>>
>>> Dick
>>>
>>>
>>> On Tue, Apr 16, 2019 at 12:24 AM Bastian Epp
>>> <000000a94eb56441-dmarc-request@xxxxxxxx
>>> <mailto:000000a94eb56441-dmarc-request@xxxxxxxx>> wrote:
>>>
>>> Dear list,
>>>
>>> This morning I read through the release notes of MATLAB R2019a and was
>>> happy to find that there was an implementation of a Gammatone filter
>>> bank included:
>>>
>>> "Gammatone Filter Bank: Mimic the human auditory system"
>>>
>>> With the reference to (among others):
>>>
>>> Glasberg, Brian R., and Brian CJ Moore. "Derivation of Auditory Filter
>>> Shapes from Notched-Noise Data." Hearing Research. Vol. 47. Issue 1-2,
>>> 1990, pp. 103–138.
>>>
>>> This made me quite happy because it is a proper description of what
>>> Gammatone filter banks most often are used for - to model the frequency
>>> selectivity of the auditory system (as measured using psychoacoustics).
>>>
>>> However, in the DOC page, they show a picture of the Basilar membrane
>>> on top with the frequency response of the filter bank - suggesting that
>>> there exists a 1:1 correspondence.
>>>
>>> Everybody needs a topic to grow old and grumpy on - mine is this:
>>>
>>> From my point of view, this is only correct under the (overly strong?)
>>> assumption that the cochlea is the only place in the auditory system
>>> underlying the perceptually observed frequency selectivity. Measuring
>>> "auditory filters" means to evaluate the auditory system as a
>>> whole (the concept of a "neuron" also only makes sense when being
>>> embedded in its network). "Cochlear filters" are measured on/in the
>>> cochlea.
>>>
>>> Besides the common critiques (linearity, coarse approximation of the
>>> actual "filter" shape, etc), the main problem in my point of view is
>>> that we teach students that we can "measure" the function of a
>>> "subsystem" (the cochlea) using a method that assesses the function of
>>> the "whole" system. There are some data sets that suggest a strong
>>> link, but the "tool" of psychoacoustics simply does not allow such a
>>> statement.
>>>
>>> Even though I like the working hypothesis "The brain exists to keep the
>>> cochlea warm", I think equating cochlear frequency selectivity with
>>> auditory filters (without explicitly stating the assumption that no(!)
>>> element along the auditory pathway modifies this frequency selectivity)
>>> is a point where we could be more careful to avoid misconceptions and
>>> overly strong conclusions. In most publications and books, this point
>>> is not explicitly wrong, but not as precise as it could be in my
>>> opinion.
>>>
>>> I hope that someone from MATHWORKS follows this list and considers a
>>> more careful description in the DOCs. I would also be happy to compile
>>> all the constructive arguments that people might have for/against my
>>> point of view.
>>>
>>> Have a great day everybody!
>>>
>>> Bastian
>>>
>>>
>>> --
>>> Bastian Epp
>>> Associate Professor
>>>
>>> DTU Healthtech
>>> ------------------------------------
>>> Technical University of Denmark
>>> Ørsteds Plads
>>> Building 352, Room 118
>>> 2800 Kgs. Lyngby
>>> Direct +45 45253953
>>> bepp@xxxxxxxx <mailto:bepp@xxxxxxxx>
>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.dtu.dk_english&d=DwIFaQ&c=EZxp_D7cDnouwj5YEFHgXuSKoUq2zVQZ_7Fw9yfotck&r=2Pw2GwelGcMR4953G-STHGpPJm2-pYYYSPmTwJk3sWM&m=GHXIqZnxZ7ZjCjlEGmDuiQlnjJQizpHYy3weycRYNko&s=UtZyeOWPT8vhvgDk4ouA5eLQ9REPci24KX0I7LjUw3s&e=
>>>
>
>
> --
> Leslie R. Bernstein, Ph.D. | Professor
> Depts. of Neuroscience and Surgery (Otolaryngology) | UConn School of
> Medicine
> 263 Farmington Avenue, Farmington, CT 06030-3401
> Office: 860.679.4622 | Fax: 860.679.2495
>
>

--
---------------------------------------------------------
Prof. Dr. Volker Hohmann

Medizinische Physik and Cluster of Excellence Hearing4all
Universität Oldenburg
D-26111 Oldenburg
Germany

Tel. +49 441 798 5468
FAX +49 441 798 3902
Email volker.hohmann@xxxxxxxx

http://www.uni-oldenburg.de/mediphysik/
http://www.uni-oldenburg.de/auditorische-signalverarbeitung/

Public Key and Key Fingerprint
http://medi.uni-oldenburg.de/members/vh/pubkey_vh_uni.txt
C75A 8A8D 9408 28EE FCFD 20CA 1D9F 23CC BAD2 B967
---------------------------------------------------------
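On the gtcc remarks quoted above (log(energy) versus the cube root before the DCT), the following is a minimal Python sketch of that one step only. It assumes the per-band energies have already been computed by some filter bank, uses an unnormalised DCT-II written out by hand, and uses hypothetical function names; it is not the MathWorks gtcc algorithm nor the Shao et al. implementation, whose scaling and details may differ.

```python
import math


def dct_ii(values):
    """Unnormalised DCT-II: X[i] = sum_k x[k] * cos(pi * (k + 0.5) * i / N)."""
    n = len(values)
    return [
        sum(values[k] * math.cos(math.pi * (k + 0.5) * i / n) for k in range(n))
        for i in range(n)
    ]


def cepstral_coefficients(band_energies, compression="log"):
    """Compress positive band energies, then apply the DCT.

    compression="log"       -> natural log of each band energy
    compression="cube_root" -> cube root of each band energy
    """
    if any(e <= 0.0 for e in band_energies):
        raise ValueError("band energies must be strictly positive")
    if compression == "log":
        compressed = [math.log(e) for e in band_energies]
    elif compression == "cube_root":
        compressed = [e ** (1.0 / 3.0) for e in band_energies]
    else:
        raise ValueError("compression must be 'log' or 'cube_root'")
    return dct_ii(compressed)


if __name__ == "__main__":
    energies = [1.0, 8.0, 27.0, 64.0]  # made-up band energies for illustration
    print("log:      ", cepstral_coefficients(energies, "log"))
    print("cube root:", cepstral_coefficients(energies, "cube_root"))
```

Keeping the nonlinearity as a parameter, with the filter bank and DCT held fixed, is the kind of control that would let one isolate its effect from that of the frequency scale or the filter shape.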

This message came from the mail archive
src/postings/2019/
maintained by:

DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University