Re: [AUDITORY] Tool for automatic syllable segmentation

From: Jan Schnupp <000000e042a1ec30-dmarc-request@xxxxxxxxxxxxxxx>

Date: Fri, 20 Sep 2024 17:29:18 +0800

Comments: To: Rémy MASSON <remy.masson@xxxxxxxxxx>

In-reply-to: <1e5ea9e398c44e609f9053065a7642d3@pasteur.fr>

References: <1e5ea9e398c44e609f9053065a7642d3@pasteur.fr>

Reply-to: Jan Schnupp <jan.schnupp@xxxxxxxxxxxxxx>

Sender: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>

Dear Remy,

It would help us to know where your meaningless CV syllable stimuli come from.

But in any event, if you are any good at coding, you are likely better off computing parameters directly from the recorded waveforms and applying criteria to those. CV syllables have an "energy arc" such that the V is invariably louder than the C. In speech there are rarely silent gaps between syllables, so you may be looking at a CVCVCVCV... stream where the only "easy" handle on the syllable boundary is likely to be the end of the vowel. That should be recognizable by a marked decline in acoustic energy, which you can quantify with a running RMS value (perhaps after low-pass filtering, given that consonants rarely have much low-frequency energy). If that's not accurate or reliable enough, then things are likely to get a lot trickier. You could look for voicing in a running autocorrelation as an additional cue, given that all vowels are voiced but only some consonants are.
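The energy and voicing cues described above can be sketched roughly as follows, in plain Python with no external libraries. This is only an illustration under stated assumptions: the one-pole filter, the 1 kHz cutoff, the 20 ms frames with 5 ms hop, the 50% relative threshold, the 75 to 400 Hz pitch range, all function names, and the synthetic test signal are choices made for this sketch and are not taken from the message. Real recordings would almost certainly need different settings, and NumPy/SciPy would be much faster on long files.

```python
import math
import random


def lowpass(x, fs, cutoff_hz=1000.0):
    """One-pole low-pass filter (a deliberately simple stand-in, assumed cutoff)."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, prev = [], 0.0
    for s in x:
        prev += a * (s - prev)
        y.append(prev)
    return y


def running_rms(x, frame, hop):
    """RMS of successive frames of `frame` samples, advanced by `hop` samples."""
    return [
        math.sqrt(sum(s * s for s in x[i:i + frame]) / frame)
        for i in range(0, len(x) - frame + 1, hop)
    ]


def vowel_offsets(x, fs, frame_s=0.020, hop_s=0.005, rel_threshold=0.5):
    """Times (s) where the low-passed RMS falls below a fraction of its peak."""
    frame, hop = round(frame_s * fs), round(hop_s * fs)
    rms = running_rms(lowpass(x, fs), frame, hop)
    threshold = rel_threshold * max(rms)
    return [
        (i * hop + frame / 2) / fs  # centre of the first frame below threshold
        for i in range(1, len(rms))
        if rms[i - 1] >= threshold > rms[i]
    ]


def voicing_strength(frame_samples, fs, f0_min=75.0, f0_max=400.0):
    """Peak normalised autocorrelation over lags in an assumed pitch range."""
    n = len(frame_samples)
    best = 0.0
    for lag in range(int(fs / f0_max), min(int(fs / f0_min), n - 1) + 1):
        a, b = frame_samples[:n - lag], frame_samples[lag:]
        denom = math.sqrt(sum(s * s for s in a) * sum(s * s for s in b))
        if denom > 0.0:
            best = max(best, sum(p * q for p, q in zip(a, b)) / denom)
    return best


def synthetic_cv_stream(fs=8000, n_syllables=3):
    """Toy CVCV... signal: 50 ms of weak noise, then 150 ms of a 150 Hz tone."""
    rng = random.Random(0)
    x = []
    for _ in range(n_syllables):
        x += [0.3 * rng.uniform(-1.0, 1.0) for _ in range(round(0.050 * fs))]
        x += [math.sin(2 * math.pi * 150.0 * k / fs) for k in range(round(0.150 * fs))]
    return x + [0.0] * round(0.050 * fs)  # trailing silence so the last vowel ends


if __name__ == "__main__":
    fs = 8000
    x = synthetic_cv_stream(fs)
    print("vowel offsets (s):", [round(t, 3) for t in vowel_offsets(x, fs)])
    print("voicing in vowel:", round(voicing_strength(x[800:1040], fs), 2))
    print("voicing in consonant:", round(voicing_strength(x[0:240], fs), 2))
```

On the toy signal the vowels end at 0.2, 0.4 and 0.6 s, and the detected offsets should land within a frame or so of those times. A single relative threshold is the crudest possible criterion; on real speech, hysteresis or a slope criterion on the RMS curve may turn out to be more robust.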

How many of these do you have to process? If the number isn't huge, it may be quicker to find the boundaries "by ear" than to develop a piece of computer code. The best way forward really depends enormously on the nature of your original stimulus set.

Best wishes,

Jan

---------------------------------------

Prof Jan Schnupp
Gerald Choa Neuroscience Institute

The Chinese University of Hong Kong

Sha Tin

Hong Kong

https://auditoryneuroscience.com

http://jan.schnupp.net

On Thu, 19 Sept 2024 at 12:19, Rémy MASSON <remy.masson@xxxxxxxxxx> wrote:

Hello AUDITORY list,

We are attempting to do automatic syllable segmentation on a collection of sound files that we use in an experiment. Our stimuli are a rapid sequence of syllables (all beginning with a consonant and ending with a vowel) with no underlying semantic meaning and with no pauses. We would like to automatically extract the syllable/speech rate and obtain the timestamps for each syllable onset.

We are a bit lost on which tool to use. We tried PRAAT with the Syllable Nuclei v3 script, the software VoiceLab and the website WebMaus. Unfortunately, for each of them, the estimated total number of syllables did not consistently match what we were able to count manually, despite adjusting the parameters.

Do you have any advice on how to go further? Do you have any experience in syllable onset extraction?

Thank you for your understanding,

Rémy MASSON

Research Engineer

Laboratory "Neural coding and neuroengineering of human speech functions" (NeuroSpeech)

Institut de l’Audition – Institut Pasteur (Paris)