
Re: [AUDITORY] Gammatone filter bank in MATLABr2019a



Thanks, Volker.  A link to those implementations would be very helpful.

Les

On 4/17/2019 4:58 AM, Volker Hohmann wrote:
Dear Dick and all,

just want to add that the re-synthesis method they apply is not optimal.
I would recommend using the Matlab implementations contributed by our
community, which have been described properly in citable publications,
are readily available and have been running flawlessly for many years
under whatever Matlab version came out.

Best regards,

Volker

On 17.04.2019 02:51, Richard F. Lyon wrote:
Bastian,

That's an interesting distinction that needs to be made, between the
peripheral and "whole system" auditory filter, whether gammatone or
otherwise.  In my book, I say this about that (in Part III – The
Auditory Periphery):

    13.1 What Is an Auditory Filter?
    The auditory filters that we consider here include both those
    motivated by psychoacoustic experiments, such as detection of tones
    in noise maskers, and those motivated by reproducing the observed
    mechanical response of the basilar membrane or neural response of
    the auditory nerve. One thesis of this work is that a single model
    can do a good job for both of these, and thereby provide a good
    basis for a machine hearing system. Since there are several stages
    of neural processing between the cochlea and our psychoacoustic
    perceptions, it would not be surprising if the best parameters were
    different between these types of models, but it seems likely that
    the linear and nonlinear filtering due to the cochlea plays a
    sufficient role in perception that we may find one set of parameters
    is adequate, at least for a range of machine hearing applications.


And to be fair, the gammatone was originally proposed as a model of frog
hearing physiology, and is widely used in cochlear models, even though
Patterson popularized it in the psychoacoustic domain.

So the MathWorks ought to be more careful about what they say.  I'd have
several other quibbles with their docs (in the Audio Toolbox reference
at https://www.mathworks.com/help/pdf_doc/audio/index.html).

Quibbles:

1. "The gammatoneFilterBank follows the algorithm described in [1] and
first proposed by [2]."  [1] is Slaney's method, a simple filter cascade
based on analyzing the Laplace transform of the gammatone.  [2] is
Patterson et al.'s "Complex Sounds and Auditory Images", a great paper
but it doesn't say one word about how to implement the gammatone (they
did have other implementation papers elsewhere, but not this method and
not here).

2. Ref 2 says "the shape of the magnitude characteristic of the
gammatone filter is very similar to that of the roex(p) filter commonly
used to represent the magnitude characteristic of the human auditory
filter."  MathWorks says "The gammatone filter is similar to the roex
filter derived from the notched-noise experiment."  A cursory look at
more recent literature on auditory filters, including Patterson's, would
suggest omitting or at least tempering this claim.  See Chapter 13 of
my book or this paper:
https://storage.googleapis.com/pub-tools-public-publication-data/pdf/36895.pdf

3. Error where it says b –– bandwidth, set to 1.019*erb2hz(fc).  Either
the documentation is wrong, or the functionality is wrong.  Hopefully
the former.

4. The parameterization by only FrequencyRange, NumFilters, and
SampleRate is rather impoverished.  It is not documented whether the
filters match the ERB bandwidth if some of these parameters are changed,
or whether adjacent filters continue to cross over about 3 dB down; you
can't have both, but you might want one or the other, and there's not
enough control to say what you want.  With a few more parameters one
could do useful comparisons, tradeoffs, and tunings of filter numbers,
orders, bandwidths, and phases, for example.  With just a few more, one
could include better auditory filter variants (that differ only in the
locations of the zeros of the cascaded second-order filters), including
the APGF (all-pole gammatone filter) and OZGF (one-zero gammatone filter).
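Quibble 1 mentions analyzing the Laplace transform of the gammatone. For reference, the function being analyzed is the impulse response g(t) = t^(n-1) exp(-2πbt) cos(2πfc t + φ). The sketch below is not Slaney's cascade and derives no filter coefficients; it just samples that impulse response directly, assuming the common choices n = 4 and b = 1.019·ERB(fc) with the Glasberg and Moore (1990) ERB formula. The function names are hypothetical.

```python
import math


def erb_hz(fc):
    """Glasberg & Moore (1990) equivalent rectangular bandwidth (Hz) at fc."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def gammatone_ir(fc, fs, duration=0.05, order=4, phase=0.0):
    """Directly sample g(t) = t^(n-1) exp(-2*pi*b*t) cos(2*pi*fc*t + phase).

    Assumes b = 1.019 * ERB(fc). Peak-normalized so that max |g| is 1.
    This is NOT a recursive (cascade) implementation; it is a reference curve.
    """
    b = 1.019 * erb_hz(fc)
    n_samples = int(round(duration * fs))
    g = []
    for k in range(n_samples):
        t = k / fs
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        g.append(envelope * math.cos(2.0 * math.pi * fc * t + phase))
    peak = max(abs(x) for x in g)
    return [x / peak for x in g]


def envelope_peak_time(fc, order=4):
    """t^(n-1) exp(-2*pi*b*t) is maximal where its log-derivative vanishes:
    t = (n-1) / (2*pi*b)."""
    return (order - 1) / (2.0 * math.pi * 1.019 * erb_hz(fc))


if __name__ == "__main__":
    ir = gammatone_ir(1000.0, 16000.0)
    print(len(ir), envelope_peak_time(1000.0))
```

For fc = 1000 Hz the envelope peaks at about 3.5 ms. A directly sampled response like this is mainly useful as a yardstick when checking what a recursive implementation actually produces.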
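Quibbles 2 and 3 can both be made concrete with the Glasberg and Moore (1990) formulas, ERB(f) = 24.7(4.37f/1000 + 1) and ERB-number E(f) = 21.4 log10(1 + 0.00437f). The sketch below uses hypothetical helper names and no MATLAB calls. First, it compares the symmetric roex(p) weighting, W(g) = (1 + pg)exp(-pg) with g = |f - fc|/fc and p = 4fc/ERB(fc), against the usual approximation to a fourth-order gammatone's power response, [1 + ((f - fc)/b)^2]^(-4); it ignores asymmetry and level dependence entirely. Second, it shows where 1.019 comes from: that power response integrates to (5π/16)b, so b = 1.019·ERB(fc) makes the filter's own ERB equal ERB(fc). Third, assuming erb2hz maps an ERB-number to Hz, it shows why 1.019*erb2hz(fc) cannot be a bandwidth: fc, already in Hz, gets treated as an ERB-number.

```python
import math


def erb_hz(f):
    """Glasberg & Moore (1990): ERB in Hz at frequency f (Hz)."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)


def hz_to_erb_number(f):
    """Glasberg & Moore (1990) ERB-number: E = 21.4 * log10(1 + 0.00437 f)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f)


def erb_number_to_hz(e):
    """Inverse of hz_to_erb_number; a stand-in for what erb2hz is ASSUMED to do."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def roex_p_db(f, fc):
    """Symmetric roex(p) power weighting in dB, with p = 4 * fc / ERB(fc)."""
    p = 4.0 * fc / erb_hz(fc)
    g = abs(f - fc) / fc
    return 10.0 * math.log10((1.0 + p * g) * math.exp(-p * g))


def gammatone_db(f, fc, order=4, bw_factor=1.019):
    """Approximate gammatone power response in dB (positive-frequency term only)."""
    b = bw_factor * erb_hz(fc)
    return -10.0 * order * math.log10(1.0 + ((f - fc) / b) ** 2)


def gammatone_erb(b, order=4, half_span=100.0, steps=100000):
    """ERB (Hz) of [1 + ((f - fc)/b)^2]^(-order), by a Riemann sum in u = (f - fc)/b."""
    du = 2.0 * half_span / steps
    area = sum((1.0 + (-half_span + k * du) ** 2) ** (-order)
               for k in range(steps + 1)) * du
    return b * area  # analytically (5*pi/16) * b for order 4


if __name__ == "__main__":
    fc = 1000.0
    for df in (50, 200, 1000):
        print(f"fc + {df} Hz: roex {roex_p_db(fc + df, fc):.1f} dB, "
              f"gammatone {gammatone_db(fc + df, fc):.1f} dB")
    print("1.019 * ERB(fc) =", 1.019 * erb_hz(fc), "Hz")
    print("1.019 * ERB-number-to-Hz(fc) =", 1.019 * erb_number_to_hz(fc))
```

With fc = 1000 Hz the two shapes agree to within a dB near the peak, while at fc + 1000 Hz the roex(p) is tens of dB below the gammatone's shallower power-law skirt. 1.019*ERB(1000 Hz) is about 135 Hz, whereas feeding 1000 into the ERB-number-to-Hz conversion gives an astronomically large number.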
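Quibble 4's point that you can't have both ERB bandwidths and 3 dB crossovers can be checked with a little arithmetic. The sketch assumes center frequencies spaced uniformly on the ERB-number scale between the ends of FrequencyRange (an assumption about the toolbox, not something quoted from its docs), b = 1.019·ERB(fc), and the same fourth-order power-response approximation. It also treats one ERB-number step as one local ERB, which holds to within about 0.3 percent for the Glasberg and Moore formulas. Under those assumptions the level where neighboring filters cross depends only on the spacing Δ in ERBs, L = -40 log10(1 + (Δ/(2·1.019))^2) dB, so once FrequencyRange and NumFilters fix Δ, the crossover level is fixed too. The helper names and the 50 to 8000 Hz range in the example are hypothetical.

```python
import math


def hz_to_erb_number(f):
    """Glasberg & Moore (1990) ERB-number."""
    return 21.4 * math.log10(1.0 + 0.00437 * f)


def erb_number_to_hz(e):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def center_frequencies(f_lo, f_hi, num_filters):
    """Hypothetical layout: centers equally spaced in ERB-number, ends included.

    Returns (list of center frequencies in Hz, spacing in ERB-number units).
    """
    e_lo, e_hi = hz_to_erb_number(f_lo), hz_to_erb_number(f_hi)
    step = (e_hi - e_lo) / (num_filters - 1)
    return [erb_number_to_hz(e_lo + i * step) for i in range(num_filters)], step


def crossover_db(spacing_erbs, bw_factor=1.019, order=4):
    """Approximate level (dB re peak) where neighbors spaced spacing_erbs apart
    intersect: each is spacing/2 ERBs from its own center at the midpoint."""
    half = spacing_erbs / 2.0
    return -10.0 * order * math.log10(1.0 + (half / bw_factor) ** 2)


def spacing_for_crossover(level_db, bw_factor=1.019, order=4):
    """Inverse of crossover_db: spacing (ERBs) giving a crossover at level_db < 0."""
    return 2.0 * bw_factor * math.sqrt(10.0 ** (-level_db / (10.0 * order)) - 1.0)


if __name__ == "__main__":
    for n in (32, 64):
        _, step = center_frequencies(50.0, 8000.0, n)
        print(n, "filters:", round(step, 3), "ERB apart, crossover",
              round(crossover_db(step), 2), "dB")
    print("3 dB crossover needs", round(spacing_for_crossover(-3.0), 3), "ERB spacing")
```

For this range, 32 and 64 filters give spacings of roughly 1.0 and 0.5 ERB: the crossover moves toward 0 dB as NumFilters grows while the bandwidths stay put. Holding a 3 dB crossover instead would mean scaling the bandwidth with the spacing, which is the kind of choice the three parameters alone don't let you state.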

R2019a also adds gtcc (gammatone cepstral coefficients).  Their
algorithm uses log(energy) before the DCT, instead of the cube root
proposed by the Shao et al. reference, which also uses a slightly
different acronym:  GFCC (gammatone frequency cepstral coefficients). 
Not clear why.  The referenced paper did not really investigate whether
their improvement over mfcc was due to the different frequency scale
(700 Hz mel vs 229 Hz ERB break point between linear and exponential
spacing), or the filter shape (triangle vs gammatone), or the
nonlinearity (log vs cube root), or the domain of implementation
(frequency vs time). With the impoverished parameterizations of these
functions in the audio toolboxes, it's hard to further compare such
things (though the gtcc does allow some of that).  The other gtcc ref
(Rabiner and Schafer) has nothing on gammatone or gtcc or gfcc.
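The confounds listed above map onto separable choices in a cepstral front end. The sketch below is a generic illustration, not the toolbox's gtcc nor Shao et al.'s exact GFCC recipe: it takes a vector of per-channel filterbank energies (however they were produced), applies either log or cube-root compression, and then an orthonormal DCT-II. The two warping functions show where the quoted break points come from: mel(f) = 2595 log10(1 + f/700) and ERB-number(f) = 21.4 log10(1 + 0.00437f) = 21.4 log10(1 + f/228.8), since 1/0.00437 ≈ 228.8 Hz. All function names are hypothetical.

```python
import math


def mel(f):
    """Common mel formula; roughly linear below ~700 Hz, logarithmic above."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def erb_number(f):
    """Glasberg & Moore (1990) ERB-number; break point at 1/0.00437 ~ 228.8 Hz."""
    return 21.4 * math.log10(1.0 + 0.00437 * f)


def dct_ii(x):
    """Orthonormal DCT-II of a list of floats (O(n^2), fine for ~64 channels)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def cepstra(energies, compression="log", num_coeffs=13, floor=1e-10):
    """Compress per-channel energies, then DCT-II; keep the first num_coeffs.

    compression="log" floors energies to avoid log(0);
    compression="cuberoot" clips negatives to zero before the cube root.
    """
    if compression == "log":
        c = [math.log(max(e, floor)) for e in energies]
    elif compression == "cuberoot":
        c = [max(e, 0.0) ** (1.0 / 3.0) for e in energies]
    else:
        raise ValueError(f"unknown compression: {compression!r}")
    return dct_ii(c)[:num_coeffs]


if __name__ == "__main__":
    energies = [1.0 + 0.5 * math.sin(i / 3.0) for i in range(32)]
    print(cepstra(energies, "log")[:4])
    print(cepstra(energies, "cuberoot")[:4])
```

Because the compression is a single argument here, the same energies can be pushed through both nonlinearities while holding the filter shapes and frequency scale fixed, which is one way to separate the effects the referenced paper left confounded.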

I could go on...

Dick

On Tue, Apr 16, 2019 at 12:24 AM Bastian Epp
<000000a94eb56441-dmarc-request@xxxxxxxxxxxxxxx> wrote:

    Dear list,

    This morning I read through the release notes of MATLAB R2019a and was
    happy to find that there was an implementation of a Gammatone filter
    bank included:

    "Gammatone Filter Bank: Mimic the human auditory system"

    With the reference to (among others):

    Glasberg, Brian R., and Brian CJ Moore. "Derivation of Auditory Filter
    Shapes from Notched-Noise Data." Hearing Research. Vol. 47. Issue 1-2,
    1990, pp. 103 –138.

    This made me quite happy because it is a proper description of what
    Gammatone filter banks most often are used for - to model the frequency
    selectivity of the auditory system (as measured using psychoacoustics).

    However, in the DOC page, they show a picture of the Basilar membrane
    on top with the frequency response of the filter bank - suggesting that
    there exists a 1:1 correspondence.

    Everybody needs a topic to grow old and grumpy on - mine is this: 

    From my point of view, this is only correct under the (overly strong?)
    assumption that the cochlea is the only place in the auditory system
    underlying the perceptually observed frequency selectivity. Measuring
    "auditory filters" means evaluating the auditory system as a
    whole (the concept of a "neuron" also only makes sense when
    embedded in its network). "Cochlear filters" are measured on/in the
    cochlea.

    Besides the common critiques (linearity, coarse approximation of the
    actual "filter" shape, etc), the main problem in my point of view is
    that we teach students that we can "measure" the function of a
    "subsystem" (the cochlea) using a method that assesses the function of
    the "whole" system. There are some data sets that suggest a strong
    link, but the "tool" of psychoacoustics simply does not allow such a
    statement.

    Even though I like the working hypothesis "The brain exists to keep the
    cochlea warm", I think equating cochlear frequency selectivity with
    auditory filters (without explicitly stating the assumption that no(!)
    element along the auditory pathway modifies this frequency selectivity)
    is a point where we could be more careful to avoid misconceptions and
    overly strong conclusions. In most publications and books, this point
    is not explicitly wrong, but not as precise as it could be in my
    opinion.

    I hope that someone from MATHWORKS follows this list and considers a
    more careful description in the DOCs. I would also be happy to compile
    all the constructive arguments that people might have for/against my
    point of view.

    Have a great day everybody!

    Bastian




    -- 
    Bastian Epp
    Associate Professor

    DTU Healthtech    
    ------------------------------------
    Technical University of Denmark
    Ørsteds Plads
    Building 352, Room 118
    2800 Kgs. Lyngby
    Direct +45 45253953
    bepp@xxxxxx
    http://www.dtu.dk/english




--
Leslie R. Bernstein, Ph.D. | Professor
Depts. of Neuroscience and Surgery (Otolaryngology)| UConn School of Medicine

263 Farmington Avenue, Farmington, CT 06030-3401
Office: 860.679.4622 | Fax: 860.679.2495