Hi all,
I am a developer on Audio Toolbox at MathWorks, and I wanted to let everyone know that we are capturing your comments on the new R2019a release and really appreciate your feedback.
It will take us some time to digest this feedback and convert it into user-visible changes, but I thought I’d share a few notes in the meantime:
- Regarding Bastian Epp’s initial post, he is right to point out that the image might be misleading and could be interpreted as indicating an equivalence between the cochlea and the gammatone filter bank. We will aim to remove the image of the basilar membrane in the next release to help avoid that incorrect interpretation.
- Regarding Richard F. Lyon’s post: The confusion here is due to an ambiguously worded sentence. The gammatone filter bank implemented in Audio Toolbox follows the algorithm described in [1] (Slaney). [1] says the algorithm is an implementation of an idea proposed in [2] (Patterson et al.). [2] is in general a good primer for understanding [1], which is why we thought it was worth referencing. We agree that this should be worded more carefully.
- The formula stating that the bandwidth is 1.019*erb2hz(fc) does indeed have a typo. We will fix this as soon as possible, starting with the online documentation.
- Regarding the limited parametrizations of the function(s): So far, Audio Toolbox has focused on providing simple and fast implementations of feature extractors. The idea is to strike a balance between the needs of an expert in auditory science and those of someone looking to build a machine learning or deep learning application. That being said, if exposing more parameters would enable more workflows, then we would definitely consider adding more options to the functions. We plan to investigate alternative options, and we may reach out to some of those who commented on this for additional feedback.
- We agree that using a cubic root is very common in GTCC implementations. We will investigate offering the option of using a cubic root in the nonlinear rectification stage (along with the log option, which is used as well). Rabiner and Schafer are referenced because the computation of the deltas is implemented based on Theory and Applications of Digital Speech Processing.
- Regarding Volker Hohmann’s note on the re-synthesis method being non-optimal: The intention of the example was to showcase a straightforward and simple usage of the object rather than to demonstrate how best to achieve reconstruction. We agree that the showcased method is not optimal, and we will reword the example to clarify this. We will also consider adding an optimal reconstruction example based on Dr. Hohmann’s paper.
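
For anyone following the bandwidth point above, here is a minimal sketch in Python (not MATLAB, and not Audio Toolbox code) of what the relation would look like, assuming the intended quantity is 1.019 times the equivalent rectangular bandwidth at the center frequency, with the ERB taken from the Glasberg and Moore approximation ERB(fc) = 24.7*(4.37*fc/1000 + 1). Both function names are hypothetical and chosen only for illustration; the corrected documentation remains the authoritative statement of the formula.

```python
def erb_bandwidth_hz(fc_hz: float) -> float:
    """Equivalent rectangular bandwidth in Hz at center frequency fc_hz.

    Assumption: Glasberg and Moore approximation, not taken from the
    Audio Toolbox source.
    """
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def gammatone_bandwidth_hz(fc_hz: float) -> float:
    """Gammatone filter bandwidth in Hz, assumed to be 1.019 * ERB(fc)."""
    return 1.019 * erb_bandwidth_hz(fc_hz)


if __name__ == "__main__":
    # Print the assumed bandwidth at a few illustrative center frequencies.
    for fc in (100.0, 1000.0, 4000.0):
        print(f"fc = {fc:7.1f} Hz -> bandwidth = {gammatone_bandwidth_hz(fc):8.3f} Hz")
```

The point of the sketch is only that the bandwidth scales with the ERB (a width in Hz) at fc, rather than with a conversion between the ERB-rate scale and Hz.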
Regards,
Jihad Ibrahim
Developer, Audio Toolbox, MathWorks