Re: [AUDITORY] converting masking thresholds to masker levels of speech sounds (James Johnston)


Subject: Re: [AUDITORY] converting masking thresholds to masker levels of speech sounds
From:    James Johnston  <audioskeptic@xxxxxxxx>
Date:    Wed, 29 Jan 2020 22:21:20 -0800
List-Archive:<http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

If I may (as the author of Psychoacoustic Model 2) stick a word in here.

1) If you know what is voiced and unvoiced, use a required SNR of 15.5 dB for voiced and 5.5 dB for unvoiced frames, rather than doing all the tonality calculations. Pitch strength (0 -> no pitch, 1 -> all pitch, no noise) will probably carry you through here, if you have a representation of that.

2) I would shorten the stride of the calculation and do it on a millisecond-by-millisecond basis.

3) Proprietary issues prevent me from explaining how to do this a lot better, sorry!

jj

On Wed, Jan 29, 2020 at 9:17 PM Frederico Pereira <pereira.frederico@xxxxxxxx> wrote:
> Hi Mengli,
> No, I haven't tried other models, nor did I account for temporal models at
> this point. Having temporal effects integrated in the psychoacoustic model
> would be something quite unique from what I've seen of existing routines,
> yes!
>
> regards,
>
> Frederico
>
> On Wed, Jan 29, 2020 at 1:08 PM Feng, Mengli (2018) <Mengli.Feng.2018@xxxxxxxx> wrote:
>
>> Hi Frederico,
>>
>> Thanks very much for the code!
>>
>> I did the same thing using ISO psychoacoustic model 2. I was thinking about
>> using models to account for temporal effects. Have you tried more advanced
>> auditory models?
>>
>> Best wishes,
>> Mengli
>>
>> --
>> Mengli Feng
>> PhD Student
>> PGR Collective EPMS School Convenor
>>
>> Audio, Biosignals and Machine Learning Group
>> Department of Electronic Engineering
>> Royal Holloway, University of London
>>
>> Research Interest:
>> Speech/voice production and perception
>> Ongoing Project:
>> the perceptual effect of bone-conducted sound of own voice
>>
>> ------------------------------
>> *From:* Frederico Pereira <pereira.frederico@xxxxxxxx>
>> *Sent:* Wednesday, January 29, 2020 12:15 pm
>> *To:* Feng, Mengli (2018)
>> *Cc:* AUDITORY@xxxxxxxx
>> *Subject:* Re: [AUDITORY] converting masking thresholds to masker levels
>> of speech sounds
>>
>> Hi Mengli,
>>
>> I'm currently working on something similar, and I've been developing on
>> top of the code and psychoacoustic models based on:
>> *ISO/IEC 11172-3:1993, Information technology – Coding of moving pictures
>> and associated audio for digital storage media at up to about 1,5 Mbit/s –
>> Part 3: Audio*
>> https://ieeexplore.ieee.org/abstract/document/1296956
>> and Matlab code provided by:
>> https://www.petitcolas.net/fabien/software/mpeg/#references
>>
>> Hoping this is of some help to you.
>>
>> regards,
>>
>> Frederico
>>
>> On Tue, Jan 28, 2020 at 5:19 AM Feng, Mengli (2018) <Mengli.Feng.2018@xxxxxxxx> wrote:
>>
>>> Dear All,
>>>
>>> I am trying to convert masking curves into the frequency responses of
>>> the original maskers (single speech sounds). The maskees I am using are
>>> narrow-band noises at different frequencies.
>>>
>>> It has taken me enormous effort to find an auditory model that makes
>>> accurate predictions, considering the maskers are complex tones with
>>> multiple harmonics in the high-frequency region. Might anyone provide some
>>> guidance or advice on finding a suitable model?
>>>
>>> Is it even possible to do such a prediction knowing only the frequency
>>> responses of the maskees and the masking thresholds, given that temporal
>>> effects would inevitably appear because of the higher harmonics in human
>>> speech sounds? Any opinions?
>>>
>>> Any suggestion would be greatly appreciated!
>>>
>>> Best Regards,
>>> Mengli
>>>
>>> --
>>> Mengli Feng
>>> PhD Student
>>> PGR Collective EPMS School Convenor
>>>
>>> Audio, Biosignals and Machine Learning Group
>>> Department of Electronic Engineering
>>> Royal Holloway, University of London
>>>
>>> Research Interest:
>>> Speech/voice production and perception
>>> Ongoing Project:
>>> the perceptual effect of bone-conducted sound of own voice
>>
>> --
>> Frederico Pereira
>> Mobile: +351937356301
>> Email: pereira.frederico@xxxxxxxx
>
> --
> Frederico Pereira
> Mobile: +61409066693
> Email: pereira.frederico@xxxxxxxx

--
James D. (jj) Johnston
Independent Audio and Electroacoustics Consultant
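[Editor's sketch] JJ's suggestion 1) replaces the full tonality calculation with a voicing-dependent SNR offset: the masking threshold sits 15.5 dB below the masker level for voiced (tone-like) frames and 5.5 dB below it for unvoiced (noise-like) frames. A minimal sketch of that idea, assuming a pitch-strength estimate in [0, 1] is available and that a simple linear blend between the two offsets is acceptable (the linear blend and all function names are assumptions for illustration, not stated in the thread):

```python
def snr_offset_db(pitch_strength):
    """Masking offset in dB between masker level and masking threshold,
    interpolated between the unvoiced case (pitch_strength = 0 -> 5.5 dB)
    and the voiced case (pitch_strength = 1 -> 15.5 dB).

    The linear interpolation is an illustrative assumption, used here as a
    stand-in for a full tonality calculation.
    """
    p = min(max(pitch_strength, 0.0), 1.0)  # clamp to [0, 1]
    return 5.5 + p * (15.5 - 5.5)


def masker_level_db(threshold_db, pitch_strength):
    """Invert threshold = masker_level - offset to recover the masker
    level (in dB) for one analysis frame."""
    return threshold_db + snr_offset_db(pitch_strength)


# Per suggestion 2), this would be applied frame by frame with a short
# (e.g. 1 ms) stride, e.g.:
#   levels = [masker_level_db(t, p) for t, p in zip(thresholds, strengths)]
```

This only covers the simultaneous-masking offset; temporal effects (pre- and post-masking), which the thread identifies as the hard part, are not modeled here.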


This message came from the mail archive
src/postings/2020/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University