I hope this message finds you well.
In my opinion, much depends on the objectives of your study.
Regarding onset and offset detection, automatic methods can be convenient but may introduce errors, especially with complex speech material. It’s often worth testing both approaches. Are you familiar with Sonic Visualiser? It’s a free program with several plug-ins for automatic onset and offset detection. You might also try Praat, which allows both manual and scripted detection of voice onset time (VOT) and related cues.
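To make the automatic approach concrete, here is a minimal energy-threshold trimmer in Python. This is only an illustrative sketch of what such detectors do in principle, not the algorithm used by Sonic Visualiser or Praat; the 10 ms frame length and -40 dB threshold are arbitrary assumptions you would tune to your material.

```python
import numpy as np

def detect_onset_offset(signal, sr, frame_ms=10, threshold_db=-40.0):
    """Return (onset, offset) sample indices of the first and last frame
    whose RMS exceeds `threshold_db` relative to the loudest frame.
    Hypothetical illustration of a simple energy-based trimmer."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # dB relative to the peak frame; small epsilons avoid log(0)
    rms_db = 20 * np.log10(rms / (rms.max() + 1e-12) + 1e-12)
    active = np.where(rms_db > threshold_db)[0]
    if active.size == 0:
        return 0, len(signal)
    return active[0] * frame_len, (active[-1] + 1) * frame_len

# Toy example: 100 ms silence, 300 ms of a 440 Hz tone, 100 ms silence
sr = 16000
t = np.arange(int(0.3 * sr)) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
sig = np.concatenate([np.zeros(int(0.1 * sr)), tone, np.zeros(int(0.1 * sr))])
onset, offset = detect_onset_offset(sig, sr)  # 1600, 6400 samples
```

With real speech, low-energy onsets (e.g. voiceless fricatives or stop closures) often fall below any fixed threshold, which is exactly why manual checking of automatic boundaries is worthwhile.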
Normalization is indeed common practice, particularly for perception studies. However, if you are working with multiple speakers, a calibration procedure before recording (for level matching) is preferable to excessive post-processing.
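For reference, the kind of post-hoc level matching being discussed can be sketched as RMS normalization in a few lines. The -23 dBFS target below is a placeholder assumption, not a recommendation for any particular study; choose a level appropriate to your calibrated playback chain, and check that scaling does not introduce clipping.

```python
import numpy as np

def normalize_rms(signal, target_db=-23.0):
    """Scale `signal` so its overall RMS sits at `target_db` dBFS.
    Illustrative sketch only; `target_db` is an assumed placeholder."""
    rms = np.sqrt(np.mean(signal ** 2))
    target = 10 ** (target_db / 20)
    out = signal * (target / rms)
    if np.max(np.abs(out)) >= 1.0:
        raise ValueError("normalization would clip; lower the target level")
    return out

# Usage: a 1 s, 440 Hz tone scaled to the target RMS
sig = 0.5 * np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
out = normalize_rms(sig)
```

Note that equal RMS does not guarantee equal loudness across items with different spectra, which is one more reason to prefer getting levels right at the recording stage.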
As for noise reduction or fade-in/fade-out, that again depends on your research goals. Fades can unintentionally remove parts of the initial or final syllables, and noise reduction can alter spectral properties such as formant amplitudes. For example, when I prepare perception stimuli with singers, I avoid these steps and instead control the recording environment to achieve a good signal-to-noise ratio (SNR) from the start.
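To make the fade concern concrete, here is a sketch of raised-cosine (Hann-shaped) fades. The 5 ms default is an assumption, not a field standard: long enough to suppress edge clicks, but note that the ramp attenuates the very first milliseconds of the item, which is exactly where a stop burst or other onset cue may sit.

```python
import numpy as np

def apply_fades(signal, sr, fade_ms=5.0):
    """Apply raised-cosine fade-in and fade-out ramps of `fade_ms`
    milliseconds. Illustrative sketch; the 5 ms default is an assumed
    value to be tuned (or rejected) per stimulus set."""
    n = int(sr * fade_ms / 1000)
    out = signal.copy()
    ramp = 0.5 * (1 - np.cos(np.pi * np.arange(n) / n))  # rises 0 -> 1
    out[:n] *= ramp          # fade-in
    out[-n:] *= ramp[::-1]   # fade-out
    return out

# Usage: on a constant signal, the first and last samples go to zero
sig = np.ones(16000)
faded = apply_fades(sig, 16000)
```

A fixed-millisecond ramp keeps the attenuated region constant across items of different lengths, whereas a percentage-based ramp removes more of longer items; for speech onsets the fixed version is easier to reason about.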
I hope this helps.
Best regards,
Tiago.
Dear AUDITORY community,

I am preparing speech stimuli for a perception study, and I wanted to ask whether there are methodological papers or guidelines that describe common practices for processing recorded speech for both single words and full sentences. I have already recorded the items (in Audacity) and I am overall happy with the quality, but now that I am working through the processing, a few questions keep coming up, such as:
- how people usually detect onset and offset (automatic vs. manual trimming)
- typical processing steps after recording (normalization, noise reduction, fade-in/fade-out...)
- how fade durations are usually chosen (fixed milliseconds vs. percentage of item length)

Stimulus-preparation steps do not seem to be reported in much detail, so I wanted to ask whether there are any recommended methods papers, workflows, or best-practice examples that people rely on.

Any pointers or suggestions would be very much appreciated.

Many thanks in advance!

Katharina
Dr Katharina Kaduk
Senior Research Associate in Pediatric Auditory Neuroscience (iCAT Project - UKRI)
Pediatric Listening, Cognition, and Neuroscience Laboratory (The PELiCAN Lab)
Infant and Child Development Lab (ICDLab)
Department of Psychology
Lancaster University
Fylde D42
Lancaster LA1 4YF
📞 +44 (0)7747 551261