
Re: [AUDITORY] Gammatone filter bank in MATLABr2019a



Dear All,

 

A point of discussion about the modelling of auditory masking effects. In my view, masking is the result of two operations: time-frequency smearing at a mechanical level and time-frequency inhibition at a neural level. If we try to model masking by a filter bank, we will never be able to model masking correctly, even if we use a nonlinear filter approach in which the slope of the filter depends on the level. In the development of POLQA (the ITU standard that uses perceptual modelling to predict speech quality) we took a very pragmatic approach: a smeared representation is used in the calculation of the suppression factor that suppresses the loudness in neighboring time-frequency cells, so that time-frequency domain masking can be modelled more correctly (see section 2.7, with more details in the ITU C-code).

http://www.aes.org/e-lib/browse.cfm?elib=16830  (open access)

 

Regards,

John Beerends

TNO

The Netherlands

http://beesikk.nl/JohnBeerends.htm

 

 

 

 

From: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx> On Behalf Of Jihad Ibrahim
Sent: Monday 20 May 2019 18:25
To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Re: Gammatone filter bank in MATLABr2019a

 

Hi all,

 

I am a developer in Audio Toolbox at MathWorks, and just wanted to let everyone know that we are capturing your comments regarding new R2019a releases and really appreciate your feedback.

 

It will take us some time to digest this feedback and convert it into user-visible changes, but I thought I’d share a few notes in the meantime:

  • Regarding Bastian Epp’s initial post, he is right to point out that the image might be misleading and interpreted to indicate an equivalence between the cochlea and the gammatone filter bank. We will aim to remove the image of the basilar membrane in the next release to help avoid that incorrect interpretation.
  • Regarding Richard F. Lyon’s post: The confusion here is due to an ambiguously worded sentence. The gammatone filter bank implemented in Audio Toolbox followed the algorithm described in [1] (Slaney). [1] says the algorithm is an implementation of an idea proposed by [2] (Patterson et al). [2] is in general a good primer for understanding [1], which is why we thought it was good to reference. We think we should reword this more carefully.
  • The formula stating that the bandwidth is 1.019*erb2hz(fc) does indeed have a typo. We will fix this ASAP starting from the online documentation.
  • Regarding the limited parametrizations of the function(s): So far, Audio Toolbox has focused on providing simple and fast implementations of feature extractors. The idea is to find a balance between an expert in auditory science and someone looking to build a machine learning or deep learning application. That being said, if exposing more parameters would enable more workflows, then we would definitely consider adding more options on the functions. We plan to investigate alternative options and we may try to reach out to some of those who commented on this for additional feedback.
  • We agree that the cubic root is a very common implementation of GTCC. We will investigate offering the option of using a cubic root in the nonlinear rectification stage (along with the log option, which is used as well). Rabiner and Schafer are referenced because the computation of the deltas is implemented based on Theory and Applications of Digital Speech Processing.
  • Regarding Volker Hohmann’s note on the re-synthesis method being non-optimal: The intention of the example was to showcase a straightforward and simple usage of the object rather than demonstrate how to best achieve reconstruction. We agree that the showcased method is not optimal, and we will reword the example to clarify this. We will also consider adding an optimal reconstruction example based on Dr. Hohmann’s paper.
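
For readers following the bandwidth and GTCC points above, here is a minimal Python sketch, not Audio Toolbox code. It assumes that the intended bandwidth is 1.019 times the equivalent rectangular bandwidth at fc, and that the ERB follows the Glasberg and Moore (1990) expression; both are assumptions on my side, since the corrected documentation formula is not quoted in this thread. The function names (erb_hz, gammatone_bandwidth_hz, compress) are hypothetical and chosen only for illustration. The second half contrasts the log and cubic-root options for the nonlinear stage applied to a single band energy.

```python
import math


def erb_hz(fc_hz):
    """Equivalent rectangular bandwidth in Hz at center frequency fc_hz.

    Assumption: Glasberg and Moore (1990) expression,
    ERB(fc) = 24.7 * (4.37 * fc / 1000 + 1).
    """
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def gammatone_bandwidth_hz(fc_hz, scale=1.019):
    """Gammatone bandwidth parameter, assumed here to be 1.019 * ERB(fc)."""
    return scale * erb_hz(fc_hz)


def compress(band_energy, mode="log", floor=1e-12):
    """Apply the nonlinear stage to one non-negative band energy.

    mode="log" is the logarithmic option; mode="cubic-root" is the
    alternative discussed in the thread. The floor avoids log(0).
    """
    energy = max(band_energy, floor)
    if mode == "log":
        return math.log(energy)
    if mode == "cubic-root":
        return energy ** (1.0 / 3.0)
    raise ValueError("mode must be 'log' or 'cubic-root'")


if __name__ == "__main__":
    for fc in (100.0, 1000.0, 4000.0):
        print(fc, round(erb_hz(fc), 2), round(gammatone_bandwidth_hz(fc), 2))
    for energy in (1e-3, 1.0, 8.0):
        print(energy, compress(energy, "log"), compress(energy, "cubic-root"))
```

The cubic root keeps low-energy bands closer to zero instead of sending them towards large negative values, which is one reason it is often preferred over the log in noisy conditions.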

 

 

Regards,

Jihad Ibrahim

Developer, Audio Toolbox, MathWorks

 
