I hope this message finds you well.
In my opinion, much depends on the objectives of your study.
Regarding onset and offset detection, automatic methods can be convenient but may introduce errors, especially with complex speech material. It’s often worth testing both approaches. Are you familiar with Sonic Visualiser? It’s a free program with several plug-ins for automatic onset and offset detection. You might also try Praat, which allows both manual and scripted detection of voice onset time (VOT) and related cues.
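To make the automatic approach concrete, here is a minimal energy-threshold trimmer in Python. This is only an illustrative sketch of what such detectors do in principle, not the algorithm used by Sonic Visualiser or Praat; the 10 ms frame length and -40 dB threshold are arbitrary assumptions you would tune to your material.

```python
import numpy as np

def detect_onset_offset(signal, sr, frame_ms=10, threshold_db=-40.0):
    """Return (onset, offset) sample indices of the first and last frame
    whose RMS exceeds `threshold_db` relative to the loudest frame.
    Hypothetical illustration of a simple energy-based trimmer."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # dB relative to the peak frame; small epsilons avoid log(0)
    rms_db = 20 * np.log10(rms / (rms.max() + 1e-12) + 1e-12)
    active = np.where(rms_db > threshold_db)[0]
    if active.size == 0:
        return 0, len(signal)
    return active[0] * frame_len, (active[-1] + 1) * frame_len

# Toy example: 100 ms silence, 300 ms of a 440 Hz tone, 100 ms silence
sr = 16000
t = np.arange(int(0.3 * sr)) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
sig = np.concatenate([np.zeros(int(0.1 * sr)), tone, np.zeros(int(0.1 * sr))])
onset, offset = detect_onset_offset(sig, sr)  # 1600, 6400 samples
```

With real speech, low-energy onsets (e.g. voiceless fricatives or stop closures) often fall below any fixed threshold, which is exactly why manual checking of automatic boundaries is worthwhile.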
Normalization is indeed common practice, particularly for perception studies. However, if you are working with multiple speakers, a calibration procedure before recording (for level matching) is preferable to excessive post-processing.
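For reference, the kind of post-hoc level matching being discussed can be sketched as RMS normalization in a few lines. The -23 dBFS target below is a placeholder assumption, not a recommendation for any particular study; choose a level appropriate to your calibrated playback chain, and check that scaling does not introduce clipping.

```python
import numpy as np

def normalize_rms(signal, target_db=-23.0):
    """Scale `signal` so its overall RMS sits at `target_db` dBFS.
    Illustrative sketch only; `target_db` is an assumed placeholder."""
    rms = np.sqrt(np.mean(signal ** 2))
    target = 10 ** (target_db / 20)
    out = signal * (target / rms)
    if np.max(np.abs(out)) >= 1.0:
        raise ValueError("normalization would clip; lower the target level")
    return out

# Usage: a 1 s, 440 Hz tone scaled to the target RMS
sig = 0.5 * np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
out = normalize_rms(sig)
```

Note that equal RMS does not guarantee equal loudness across items with different spectra, which is one more reason to prefer getting levels right at the recording stage.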
As for noise reduction or fade-in/fade-out, that again depends on your research goals. Fades can unintentionally remove parts of the initial or final syllables, and noise reduction can alter spectral properties such as formant amplitudes. For example, when I prepare perception stimuli with singers, I avoid these steps and instead control the recording environment to achieve a good signal-to-noise ratio (SNR) from the start.
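To make the fade concern concrete, here is a sketch of raised-cosine (Hann-shaped) fades. The 5 ms default is an assumption, not a field standard: long enough to suppress edge clicks, but note that the ramp attenuates the very first milliseconds of the item, which is exactly where a stop burst or other onset cue may sit.

```python
import numpy as np

def apply_fades(signal, sr, fade_ms=5.0):
    """Apply raised-cosine fade-in and fade-out ramps of `fade_ms`
    milliseconds. Illustrative sketch; the 5 ms default is an assumed
    value to be tuned (or rejected) per stimulus set."""
    n = int(sr * fade_ms / 1000)
    out = signal.copy()
    ramp = 0.5 * (1 - np.cos(np.pi * np.arange(n) / n))  # rises 0 -> 1
    out[:n] *= ramp          # fade-in
    out[-n:] *= ramp[::-1]   # fade-out
    return out

# Usage: on a constant signal, the first and last samples go to zero
sig = np.ones(16000)
faded = apply_fades(sig, 16000)
```

A fixed-millisecond ramp keeps the attenuated region constant across items of different lengths, whereas a percentage-based ramp removes more of longer items; for speech onsets the fixed version is easier to reason about.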
I hope this helps.
Best regards,
Tiago.
Dear AUDITORY community,

I am preparing speech stimuli for a perception study, and I wanted to ask whether there are methodological papers or guidelines that describe common practices for processing recorded speech for both single words and full sentences. I have already recorded the items (in Audacity) and I am overall happy with the quality, but now that I am working through the processing, a few questions keep coming up, such as:
- how people usually detect onset and offset (automatic vs. manual trimming)
- typical processing steps after recording (normalization, noise reduction, fade-in/fade-out...)
- how fade durations are usually chosen (fixed milliseconds vs. percentage of item length)

Stimulus-preparation steps do not seem to be reported in much detail, so I wanted to ask whether there are any recommended methods papers, workflows, or best-practice examples that people rely on.

Any pointers or suggestions would be very much appreciated.

Many thanks in advance!

Katharina
Dr Katharina Kaduk
Senior Research Associate in Pediatric Auditory Neuroscience (iCAT Project - UKRI)
Pediatric Listening, Cognition, and Neuroscience Laboratory (The PELiCAN Lab)
Infant and Child Development Lab (ICDLab)
Department of Psychology
Lancaster University
Fylde D42
Lancaster LA1 4YF
📞 +44 (0)7747 551261