
Re: [AUDITORY] Assistance with MATLAB 'detectSpeech' function



Dear Ian,

I have used a Voice Activity Detection (VAD) processor in the past, though not in MATLAB, and came across exactly the issue you are reporting.
(I used the WebRTC VAD and customized its functions as necessary: https://github.com/mozilla/webrtcvad_js.)

In my implementation, adjusting the function parameters for threshold, head margin and tail margin, as depicted in the figure below, solved the problem.
vad_parameters.png 
Unfortunately I have no experience with the detectSpeech function, but you could manually add a head margin and a tail margin to the detected boundaries so that they cover the whole voiced segment. This may include small stretches of unvoiced signal and, in the worst case, part of the previous or next voiced segment if the pauses between them are short.
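
To make the margin idea concrete, here is a minimal sketch in Python (not MATLAB, and not taken from my WebRTC-based code). It assumes the detector has already produced a list of (start, end) sample indices; the function name pad_segments, the margin values and the example boundaries are all hypothetical and chosen only for illustration. Each segment is widened by a head and a tail margin, clamped to the signal length, and merged with its neighbour if the padded segments end up overlapping.

```python
def pad_segments(segments, n_samples, head_margin, tail_margin):
    """Widen each (start, end) segment and merge any that now overlap.

    segments    : list of (start, end) sample indices, 0-based, end inclusive
    n_samples   : total number of samples in the signal
    head_margin : samples to add before each detected onset
    tail_margin : samples to add after each detected offset
    """
    padded = []
    for start, end in sorted(segments):
        # Extend the boundaries, but never past the ends of the signal.
        new_start = max(start - head_margin, 0)
        new_end = min(end + tail_margin, n_samples - 1)
        if padded and new_start <= padded[-1][1]:
            # The padded segment runs into the previous one: merge them.
            padded[-1] = (padded[-1][0], max(padded[-1][1], new_end))
        else:
            padded.append((new_start, new_end))
    return padded


if __name__ == "__main__":
    fs = 16000                       # assumed sample rate in Hz
    head = int(round(0.050 * fs))    # 50 ms head margin (illustrative value)
    tail = int(round(0.100 * fs))    # 100 ms tail margin (illustrative value)
    detected = [(4000, 12000), (13000, 20000)]  # made-up detector output
    print(pad_segments(detected, n_samples=21000,
                       head_margin=head, tail_margin=tail))
```

With these made-up numbers the two segments are separated by a pause shorter than the combined margins, so they merge into a single region, which is the side effect I mentioned above. If detectSpeech returns 1-based start and end sample indices, as I would expect in MATLAB, the same logic should apply with the lower clamp set to 1 and the upper clamp set to the signal length.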

Please let me know if I can help with anything else.

Best regards,
Frederico


On Fri, Mar 22, 2024 at 4:42 AM Mertes, Ian Benjamin <imertes@xxxxxxxxxxxx> wrote:

Hello all,

I am using MATLAB R2023b and the Audio Toolbox. I would like to use the 'detectSpeech' function to find the boundaries of speech for a word recognition task.

I'm having difficulty getting the function to correctly capture the boundaries. Below is an example figure using the sentence "Say the word laud." The blue shaded area is the detected region of speech. Note that it does not correctly detect the onset and offset of the sentence. The figure was generated using the default values of the function. I also tried manipulating the window duration, percent overlap, and merge duration, but I was unable to improve the detection.

Any recommendations you may have would be greatly appreciated. Thank you!

Best,
Ian

Ian Mertes, PhD, AuD, CCC-A

Assistant Professor

 
Dept. of Speech and Hearing Science
University of Illinois Urbana-Champaign
208 Speech and Hearing Science Building

901 S. Sixth St. | M/C 482 | Champaign, IL 61820
217.300.4756 | imertes@xxxxxxxxxxxx
Dept. website: shs.illinois.edu | Lab website: hrl.shs.illinois.edu


--
Frederico Pereira
Mobile:+351 937356301
Audio and Acoustics, Perception Interaction and Usability