Subject: Re: [AUDITORY] Tool for automatic syllable segmentation
From: Pierre Divenyi <pdivenyi@xxxxxxxx>
Date: Fri, 20 Sep 2024 08:48:17 -0700

In our work with synthesized CVs we assumed that the initial C burst always woke up the perceptual system.

Pierre

On Sep 20, 2024, at 02:43, Jan Schnupp <000000e042a1ec30-dmarc-request@xxxxxxxx> wrote:

Dear Remy,

It might be useful for us to know where your meaningless CV syllable stimuli come from. But in any event, if you are any good at coding, you are likely better off directly computing parameters of the recorded waveforms and applying criteria to those. CV syllables have an "energy arc" such that the V is invariably louder than the C. In speech there are rarely silent gaps between syllables, so you may be looking at a CVCVCVCV... stream in which the only "easy" handle on the syllable boundary is likely to be the end of the vowel, which should be recognizable by a marked decline in acoustic energy. You can quantify that with a running RMS value (perhaps after low-pass filtering, given that consonants rarely have much low-frequency energy). If that is not accurate or reliable enough, things are likely to get a lot trickier. You could look for voicing in a running autocorrelation as an additional cue, given that all vowels are voiced but only some consonants are.

How many of these do you have to process?
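For what it's worth, the running-RMS and autocorrelation ideas could be sketched roughly as below. This is only an illustrative sketch, not a tested recipe: the function names are made up, and the frame length, low-pass span, drop threshold and voicing threshold are all guesses that would need tuning to the actual stimuli.

```python
import numpy as np

def syllable_onsets(x, sr, frame_ms=10.0, lp_ms=1.0, drop_db=6.0):
    """Crude CV-stream segmenter: low-pass, running RMS in dB, then
    flag an onset wherever energy starts rising again after having
    dropped well below the preceding (vowel) peak.
    All default values are guesses, to be tuned per stimulus set."""
    # Crude moving-average low-pass (a 1 ms box is roughly a 1 kHz
    # low-pass), since consonants rarely have much low-frequency energy.
    n_lp = max(1, int(sr * lp_ms / 1000))
    x_lp = np.convolve(x, np.ones(n_lp) / n_lp, mode="same")
    # Running RMS over short non-overlapping frames, expressed in dB.
    hop = max(1, int(sr * frame_ms / 1000))
    n = len(x_lp) // hop
    rms = np.sqrt(np.mean(x_lp[:n * hop].reshape(n, hop) ** 2, axis=1))
    db = 20 * np.log10(rms + 1e-12)
    onsets, peak, armed = [], db[0], True  # armed: a vowel has ended
    for i in range(1, n):
        peak = max(peak, db[i])
        if db[i] < peak - drop_db:
            armed = True                   # marked energy decline: vowel over
        if armed and db[i] > db[i - 1] + 1.0:
            onsets.append(i * hop / sr)    # energy rising again: next syllable
            armed, peak = False, db[i]
    return onsets

def voiced(frame, sr, fmin=80.0, fmax=300.0, thresh=0.5):
    """Crude voicing check: normalized autocorrelation peak within a
    plausible F0 lag range (thresholds again only guesses)."""
    f = frame - frame.mean()
    ac = np.correlate(f, f, mode="full")[len(f) - 1:]
    if ac[0] <= 0:
        return False
    lo, hi = int(sr / fmax), min(int(sr / fmin), len(f) - 1)
    return ac[lo:hi].max() / ac[0] > thresh
```

On a synthetic stream of quiet "consonants" alternating with loud 200 Hz "vowels" this recovers one onset per syllable to within a frame or two, but real consonants (especially voiced ones with low-frequency energy) will be far less obliging.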
If the number isn't huge, it may be quicker to find the boundaries "by ear" than to develop a piece of computer code. The best way forward really depends enormously on the nature of your original stimulus set.

Best wishes,

Jan

---------------------------------------
Prof Jan Schnupp
Gerald Choa Neuroscience Institute
The Chinese University of Hong Kong
Sha Tin
Hong Kong

https://auditoryneuroscience.com
http://jan.schnupp.net

On Thu, 19 Sept 2024 at 12:19, Rémy MASSON <remy.masson@xxxxxxxx> wrote:

Hello AUDITORY list,

We are attempting to do automatic syllable segmentation on a collection of sound
files that we use in an experiment. Our stimuli are a rapid sequence of syllables (all beginning with a consonant and ending with a vowel), with no underlying semantic meaning and no pauses. We would like to automatically extract the syllable/speech rate and obtain the timestamps of each syllable onset.

We are a bit lost as to which tool to use. We tried Praat with the Syllable Nuclei v3 script, the software VoiceLab, and the website WebMaus. Unfortunately, none of their estimates of the total number of syllables consistently matched what we were able to count manually, despite toggling the parameters.

Do you have any advice on how to go further? Do you have any experience with syllable onset extraction?

Thank you for your understanding,

Rémy MASSON
Research Engineer
Laboratory "Neural coding and neuroengineering of human speech functions" (NeuroSpeech)
Institut de l'Audition – Institut Pasteur (Paris)