Re: [AUDITORY] Gammatone filter bank in MATLABr2019a (John)


Subject: Re: [AUDITORY] Gammatone filter bank in MATLABr2019a
From:    John <"Beerends, J.G. ">
Date:    Tue, 21 May 2019 09:15:20 +0000
List-Archive:<http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Dear All,

A point of discussion about the modelling of auditory masking effects. In my view, masking is the result of two operations: time-frequency smearing at a mechanical level and time-frequency inhibition at a neural level. If we try to model masking with a filter bank, we will never be able to model it correctly, even if we use a nonlinear filter approach in which the slope of the filter depends on the level. In the development of POLQA (the ITU standard that uses perceptual modelling to predict speech quality), we took a very pragmatic approach: a smeared representation is used in the calculation of the suppression factor that suppresses the loudness in neighboring time-frequency cells, so that time-frequency domain masking is modelled more correctly (see section 2.7, with more details in the ITU C-code).

http://www.aes.org/e-lib/browse.cfm?elib=16830 (open access)

Regards,
John Beerends
TNO
The Netherlands
http://beesikk.nl/JohnBeerends.htm

From: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxx> On Behalf Of Jihad Ibrahim
Sent: Monday, 20 May 2019 18:25
To: AUDITORY@xxxxxxxx
Subject: Re: Gammatone filter bank in MATLABr2019a

Hi all,

I am a developer in Audio Toolbox at MathWorks, and I just wanted to let everyone know that we are capturing your comments regarding the new R2019a release and really appreciate your feedback.

It will take us some time to digest this feedback and convert it into user-visible changes, but I thought I'd share a few notes in the meantime:

* Regarding Bastian Epp's initial post, he is right to point out that the image might be misleading and could be interpreted as indicating an equivalence between the cochlea and the gammatone filter bank.
  We will aim to remove the image of the basilar membrane in the next release to help avoid that incorrect interpretation.
* Regarding Richard F. Lyon's post: the confusion here is due to an ambiguously worded sentence. The gammatone filter bank implemented in Audio Toolbox follows the algorithm described in [1] (Slaney). [1] says the algorithm is an implementation of an idea proposed in [2] (Patterson et al.). [2] is in general a good primer for understanding [1], which is why we thought it worth referencing. We think this should be reworded more carefully.
* The formula stating that the bandwidth is 1.019*erb2hz(fc) does indeed have a typo. We will fix this ASAP, starting with the online documentation.
* Regarding the limited parametrization of the function(s): so far, Audio Toolbox has focused on providing simple and fast implementations of feature extractors. The idea is to strike a balance between the needs of an expert in auditory science and those of someone looking to build a machine learning or deep learning application. That being said, if exposing more parameters would enable more workflows, we would definitely consider adding more options to the functions. We plan to investigate alternative options, and we may reach out to some of those who commented on this for additional feedback.
* We agree that the cube root is a very common choice in GTCC implementations. We will investigate offering a cube-root option in the nonlinear rectification stage (alongside the log option, which is also used). Rabiner and Schafer are referenced because the computation of the deltas is implemented based on their book Theory and Applications of Digital Speech Processing.
* Regarding Volker Hohmann's note on the re-synthesis method being non-optimal: the intention of the example was to showcase a straightforward and simple usage of the object rather than to demonstrate how best to achieve reconstruction.
  We agree that the showcased method is not optimal, and we will reword the example to clarify this. We will also consider adding an optimal reconstruction example based on Dr. Hohmann's paper.

Regards,
Jihad Ibrahim
Developer, Audio Toolbox, MathWorks
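To make the bandwidth bullet concrete: an erb2hz-style function converts an ERB number (a position on the ERB-rate scale) back to a frequency in Hz, so applying it to fc does not yield a bandwidth. Presumably the intended quantity is the conventional gammatone bandwidth parameter b = 1.019 * ERB(fc) used by Patterson et al. and Slaney. The Python sketch below assumes the Glasberg and Moore (1990) formulas ERB(f) = 24.7 * (4.37e-3 * f + 1) Hz and E(f) = 21.4 * log10(1 + 4.37e-3 * f). It is not the Audio Toolbox code, and the function names are hypothetical; they only loosely mirror MATLAB's hz2erb and erb2hz.

```python
import math

# Assumed constants (Glasberg & Moore, 1990); not taken from Audio Toolbox.
ERB_AT_0_HZ = 24.7          # ERB in Hz as f -> 0
ERB_SLOPE_PER_HZ = 4.37e-3
ERB_RATE_SCALE = 21.4       # scale factor of the ERB-number (ERB-rate) curve
GAMMATONE_B_FACTOR = 1.019  # maps ERB to the gammatone bandwidth parameter b


def erb_bandwidth(fc_hz: float) -> float:
    """Equivalent rectangular bandwidth (Hz) of the auditory filter centred at fc_hz."""
    return ERB_AT_0_HZ * (ERB_SLOPE_PER_HZ * fc_hz + 1.0)


def hz_to_erb_number(f_hz: float) -> float:
    """Frequency in Hz -> position on the ERB-number scale (hz2erb-like, hypothetical)."""
    return ERB_RATE_SCALE * math.log10(1.0 + ERB_SLOPE_PER_HZ * f_hz)


def erb_number_to_hz(erb_number: float) -> float:
    """ERB number -> frequency in Hz (erb2hz-like, hypothetical); inverse of hz_to_erb_number."""
    return (10.0 ** (erb_number / ERB_RATE_SCALE) - 1.0) / ERB_SLOPE_PER_HZ


def gammatone_b(fc_hz: float) -> float:
    """Bandwidth parameter b (Hz) of a gammatone filter centred at fc_hz."""
    return GAMMATONE_B_FACTOR * erb_bandwidth(fc_hz)


if __name__ == "__main__":
    for fc in (250.0, 1000.0, 4000.0):
        print(f"fc = {fc:6.0f} Hz: ERB = {erb_bandwidth(fc):7.2f} Hz, "
              f"b = 1.019*ERB = {gammatone_b(fc):7.2f} Hz")
    # Feeding a centre frequency (Hz) into an erb2hz-style conversion treats it
    # as an ERB number, which gives a meaningless, astronomically large value.
    print(f"1.019*erb2hz-style(1000) = {GAMMATONE_B_FACTOR * erb_number_to_hz(1000.0):.3g}")
```

Slaney's Auditory Toolbox writes the same ERB as fc/9.26449 + 24.7, which is essentially the same straight line, since 24.7 * 4.37e-3 is approximately 1/9.26449.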
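The GTCC points above (log versus cube-root compression in the nonlinear rectification stage, followed by deltas) can be sketched as below. This is an illustration under stated assumptions, not the Audio Toolbox gtcc implementation: it takes per-frame gammatone band energies as input (hypothetical values in the demo), compresses them with log or cube root, applies an orthonormal DCT-II to obtain cepstral coefficients, and computes deltas with the common regression formula d_t = sum_n n*(c[t+n] - c[t-n]) / (2 * sum_n n^2), replicating edge frames. Whether this matches the Rabiner and Schafer variant used in Audio Toolbox is an assumption; the function names and the half-width of 2 frames are hypothetical choices.

```python
import math
from typing import List, Literal

Compression = Literal["log", "cuberoot"]


def compress(energies: List[float], method: Compression = "log", eps: float = 1e-12) -> List[float]:
    """Nonlinear compression of one frame of (non-negative) band energies."""
    if method == "log":
        # eps avoids log(0) for silent bands.
        return [math.log(e + eps) for e in energies]
    if method == "cuberoot":
        # Clamp at 0 so a tiny negative rounding error cannot yield a complex result.
        return [max(e, 0.0) ** (1.0 / 3.0) for e in energies]
    raise ValueError(f"unknown compression method: {method!r}")


def dct2(x: List[float], num_coeffs: int) -> List[float]:
    """Orthonormal DCT-II of x, keeping the first num_coeffs coefficients."""
    n = len(x)
    out = []
    for k in range(num_coeffs):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        s = sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        out.append(scale * s)
    return out


def cepstra(frames: List[List[float]], method: Compression = "log", num_coeffs: int = 4) -> List[List[float]]:
    """Compress each frame of band energies, then take the DCT-II."""
    return [dct2(compress(f, method), num_coeffs) for f in frames]


def deltas(coeffs: List[List[float]], half_width: int = 2) -> List[List[float]]:
    """Regression deltas over +/- half_width frames, replicating edge frames."""
    num_frames = len(coeffs)
    denom = 2.0 * sum(n * n for n in range(1, half_width + 1))
    out = []
    for t in range(num_frames):
        row = []
        for c in range(len(coeffs[t])):
            acc = 0.0
            for n in range(1, half_width + 1):
                nxt = coeffs[min(t + n, num_frames - 1)][c]
                prv = coeffs[max(t - n, 0)][c]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


if __name__ == "__main__":
    # Hypothetical band energies: 5 frames x 8 gammatone channels.
    frames = [[(t + 1) * (ch + 1) * 0.01 for ch in range(8)] for t in range(5)]
    for method in ("log", "cuberoot"):
        c = cepstra(frames, method)
        d = deltas(c)
        print(method, [round(v, 3) for v in c[2]], [round(v, 3) for v in d[2]])
```

One property that separates the two options: under log compression, a uniform gain g on all band energies only shifts c0 (by sqrt(N) * ln g with this orthonormal DCT over N bands), so overall level is isolated in c0; under cube-root compression, every coefficient is instead scaled by g^(1/3).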


This message came from the mail archive
src/postings/2019/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University