Temporal Envelope based pitch perception (Imran Dhamani)


Subject: Temporal Envelope based pitch perception
From:    Imran Dhamani  <imrandhamani@xxxxxxxx>
Date:    Tue, 2 Feb 2010 21:21:25 +0530
List-Archive:<http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>

Hi everyone,

I have a question about envelope-based pitch perception and would be grateful for any answers. Thanks in advance.

From the research I have read so far on the importance of temporal envelope cues in speech perception, I understand that pitch/fundamental frequency (F0) can be reliably represented by temporal envelope cues alone in normal-hearing, hearing-impaired, and cochlear-implanted listeners (at least within a certain range of F0). In a simple laboratory experiment I also found that my subjective judgement of the pitch of speech sounds (words/sentences), as a trained listener, was within about 50-60 Hz of the objective estimate of pitch/F0 obtained using LPC/autocorrelation or cepstral analysis in Matlab and Praat.

In another series of experiments, I channel-vocoded speech sounds (using a 500 Hz sine wave carrier and a BBN noise carrier alternately) with envelope cut-off frequencies ranging from 50 to 500 Hz and with 8 to 24 bands (based on the Greenwood function/map). There was a drastic mismatch between the objective F0 estimates for the original and the vocoded stimuli across all conditions (for example, if the pitch of the original stimulus was 120 Hz, the objectively estimated pitch of the vocoded stimulus was around 70-80 Hz). I also noticed that the mismatch was smaller with the sine wave carrier and with higher envelope cut-off frequencies.
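
For readers unfamiliar with the band layout mentioned above, here is a minimal sketch of how vocoder band edges can be derived from the Greenwood function. It is not the script used in these experiments. The constants are the commonly quoted human values (A = 165.4, a = 2.1, k = 0.88, with position expressed as a proportion of cochlear length); the 100-8000 Hz analysis range and the function names are assumptions made only for illustration.

```python
import math

# Commonly quoted human Greenwood constants; x is the proportion of
# cochlear length measured from the apex (0.0) to the base (1.0).
A, ALPHA, K = 165.4, 2.1, 0.88


def greenwood_hz(x):
    """Characteristic frequency (Hz) at relative cochlear position x."""
    return A * (10 ** (ALPHA * x) - K)


def greenwood_pos(f):
    """Inverse map: relative cochlear position for frequency f (Hz)."""
    return math.log10(f / A + K) / ALPHA


def band_edges(n_bands, f_lo=100.0, f_hi=8000.0):
    """Return n_bands + 1 edge frequencies equally spaced along the cochlea.

    f_lo and f_hi are assumed analysis limits, not values from the post.
    """
    x_lo, x_hi = greenwood_pos(f_lo), greenwood_pos(f_hi)
    step = (x_hi - x_lo) / n_bands
    return [greenwood_hz(x_lo + i * step) for i in range(n_bands + 1)]


if __name__ == "__main__":
    for n in (8, 16, 24):
        print(n, [round(f) for f in band_edges(n)])
```

With a layout like this, the lowest bands are narrow, so whether F0-rate fluctuations survive in a given channel's envelope depends on both the band width and the envelope cut-off frequency.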

In the next set of trials, I generated various pitch-shifted versions of the same speech stimuli (relatively preserving the temporal information) and vocoded them with the same settings. Surprisingly, there was no drastic change in the objectively estimated pitch (just 10-20 Hz), even when I shifted the original stimulus pitch by a ratio of 70 (F0 = 220-250 Hz). Later I processed the speech stimuli with a cochlear implant simulation, using carrier rates from 400 to 10000 and 10 to 22 channels, and found nearly identical objectively estimated pitch values (within 5-10 Hz) for the original and simulated stimuli.

My questions are as follows:

1) Are these findings due to a technical error (probably in the objective pitch estimation of the vocoded stimuli) or some other mistake? (Or can subjective findings mask objective data?)

2) Is pitch representation solely dependent on temporal envelope cues, or do other factors also play a major role, such as carrier frequency (other than through the Nyquist-Shannon theorem), envelope cut-off, envelope extraction method, temporal analysis/sample length, etc.?

3) How is pitch information encoded in, and extracted from, such a complex temporal envelope of speech sounds? Is it completely different from the periodicity-based or spectral-based pitch extraction mode?

4) If I band-pass filter the envelope information (based on auditory filters), will the filter/channel containing the pitch/F0 information have a different envelope (probably more periodic) than the others, and might the auditory system extract pitch information from the complex envelope in this way?

Best regards,
Imran Dhamani
PhD student


This message came from the mail archive
/home/empire6/dpwe/public_html/postings/2010/
maintained by:
DAn Ellis <dpwe@ee.columbia.edu>
Electrical Engineering Dept., Columbia University