Re: [AUDITORY] Assistance with MATLAB 'detectSpeech' function

From: Brian Hemmat <000002e7f0d4b00b-dmarc-request@xxxxxxxxxxxxxxx>

Date: Fri, 22 Mar 2024 14:46:17 +0000

Comments: To: "Mertes, Ian Benjamin" <imertes@xxxxxxxxxxxx>

Reply-to: Brian Hemmat <bhemmat@xxxxxxxxxxxxx>

Sender: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>

Hi Ian,

The detectSpeech function does tend to draw tight, aggressive boundaries around the speech region. It’s a simple algorithm that thresholds frame-based energy and spectral spread, and it has no logic to hold over or extend decisions, other than bridging the gaps between nearby regions. This means it does especially poorly on speech regions that begin or end with unvoiced speech. In some of our machine learning examples, we’ve found we get better results when we manually extend the ROI as a postprocessing step. The extendsigroi function might be useful for that.
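
To make the postprocessing idea concrete, here is a minimal sketch in plain MATLAB that pads each detected region and merges any regions that end up overlapping. The sample rate, signal length, region limits, and 100 ms padding are all made-up values for illustration, and the regions are assumed to be sorted by start index, in the same N-by-2 [start end] layout that detectSpeech returns.

```matlab
% Hypothetical detected regions: one [start end] pair of sample indices per row.
fs = 16000;                  % assumed sample rate in Hz
numSamples = 32000;          % assumed signal length in samples
roi = [4000 9000; 9500 20000; 30000 31500];

padSeconds = 0.1;            % assumed padding per side; tune for your material
pad = round(padSeconds*fs);  % padding in samples

% Extend each region on both sides and clamp to the valid sample range.
roiExt = [max(roi(:,1) - pad, 1), min(roi(:,2) + pad, numSamples)];

% Merge regions that overlap or touch after the extension.
merged = roiExt(1,:);
for k = 2:size(roiExt,1)
    if roiExt(k,1) <= merged(end,2) + 1
        merged(end,2) = max(merged(end,2), roiExt(k,2));
    else
        merged(end+1,:) = roiExt(k,:); %#ok<AGROW>
    end
end
disp(merged)
```

With Signal Processing Toolbox available, a call along the lines of extendsigroi(roi, pad, pad) should play a similar role to the manual padding step; check its documentation for how it handles signal boundaries and overlapping regions.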

Another option is to use the detectspeechnn function, which has been available since R2023a. It also requires Deep Learning Toolbox, because it uses a deep learning model under the hood. On a sample of the same sentence you used, it performed well. It also has a number of parameters that give the type of control you’re looking for (e.g., ActivationThreshold, DeactivationThreshold).
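
As a rough sketch of that second option, the call might look like the following. It assumes Audio Toolbox R2023a or later plus Deep Learning Toolbox, and it assumes the sample file named below is on the MATLAB path (substitute your own recording). The threshold values are arbitrary illustrations, not recommendations; see the detectspeechnn documentation for the exact meaning and defaults of each parameter.

```matlab
% Requires Audio Toolbox (R2023a or later) and Deep Learning Toolbox.
% The file name is assumed to be a sample recording on the MATLAB path;
% replace it with your own file.
[audioIn, fs] = audioread("Counting-16-44p1-mono-15secs.wav");

% Illustrative threshold values only; tune them on your own recordings.
roi = detectspeechnn(audioIn, fs, ...
    ActivationThreshold=0.4, ...
    DeactivationThreshold=0.2);

% Each row of roi is a [start end] pair of sample indices; convert to seconds.
roiSeconds = (roi - 1)/fs;
disp(roiSeconds)
```

Comparing the returned regions for a few threshold settings against hand-marked onsets and offsets is probably the quickest way to see whether the boundaries are loose enough for a word recognition task.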

Feel free to reach out to me directly if you want to discuss this further or if I can be of any help.

Best,

Brian Hemmat (Software Developer for Audio Toolbox at MathWorks).

From: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx> On Behalf Of Mertes, Ian Benjamin
Sent: Thursday, March 21, 2024 3:51 PM
To: AUDITORY@xxxxxxxxxxxxxxx
Subject: [AUDITORY] Assistance with MATLAB 'detectSpeech' function

Hello all,

I am using MATLAB R2023b and the Audio Toolbox. I would like to use the 'detectSpeech.m' function to find the boundaries of speech for a word recognition task.

I'm having difficulty getting the function to correctly capture the boundaries. Below is an example figure using the sentence "Say the word laud." The blue shaded area is the detected region of speech. Note that it does not correctly detect the onset and offset of the sentence. The figure was generated using the default values of the function. I also tried manipulating the window duration, percent overlap, and merge duration but I was unable to improve the detection.

Any recommendations you may have would be greatly appreciated. Thank you!

Best,
Ian

—

Ian Mertes, PhD, AuD, CCC-A

Assistant Professor

Dept. of Speech and Hearing Science
University of Illinois Urbana-Champaign
208 Speech and Hearing Science Building

901 S. Sixth St. | M/C 482 | Champaign, IL 61820
217.300.4756 | imertes@xxxxxxxxxxxx
Dept. website: shs.illinois.edu | Lab website: hrl.shs.illinois.edu