Re: [AUDITORY] Request for objective evaluation models based on temporal envelope

From: Raul Sanchez Lopez <0000014fb8251444-dmarc-request@xxxxxxxxxxxxxxx>

Date: Thu, 15 Apr 2021 06:22:12 +0000

Comments: To: Sohhom Bandyopadhyay <sohhom.bandyopadhyay@xxxxxxxxxxx>

Reply-to: Raul Sanchez Lopez <rsalo@xxxxxx>

Sender: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>

Hi Sohhom,

You can look into the "family" of the Envelope Power Spectrum models (at DTU):

Energy-based SNRenv
Dau, T., & Jørgensen, S. (2011). Predicting speech intelligibility based on the envelope power signal-to-noise ratio after modulation-frequency selective processing. Journal of the Acoustical Society of America. https://doi.org/10.1121/1.3587737

Jørgensen, S., Ewert, S. D., & Dau, T. (2013). A multi-resolution envelope-power based model for speech intelligibility. Journal of the Acoustical Society of America, 134(1), 436–446. https://doi.org/10.1121/1.4807563

Also Binaural
Chabot-Leclerc, A., MacDonald, E., & Dau, T. (2016). Predicting binaural speech intelligibility using the signal-to-noise ratio in the envelope power spectrum domain. Journal of the Acoustical Society of America, 140(1), 192–205. https://doi.org/10.1121/1.4954254

Or correlation-based predictions
Relaño-Iborra, H., Chabot-Leclerc, A., Scheidiger, C., Zaar, J., & Dau, T. (2017). The speech-based envelope power spectrum model (sEPSM) family: Development, achievements, and current challenges. Journal of the Acoustical Society of America, 141(5), 3970–3970. https://doi.org/10.1121/1.4989047

For consonant recognition
Zaar, J., & Dau, T. (2018). Predicting consonant recognition and confusions using a microscopic speech perception model. Journal of the Acoustical Society of America, 141(5), 3633–3633. https://doi.org/10.1121/1.4987824

Or even a more complex front end to model hearing impairments
Relaño-Iborra, H., Zaar, J., & Dau, T. (2019). A speech-based computational auditory signal processing and perception model. Journal of the Acoustical Society of America, 146(5), 3306–3317. https://doi.org/10.1121/1.5129114

Also, the work from Biberger and colleagues (Oldenburg), where there are also quality predictions with the "Generalized" power spectrum model
Biberger, T., & Ewert, S. D. (2016). Envelope and intensity based prediction of psychoacoustic masking and speech intelligibility. Journal of the Acoustical Society of America, 140(2), 1023–1038. https://doi.org/10.1121/1.4960574

Most recent: Biberger, T., Schepker, H., Denk, F., & Ewert, S. D. (2021). Instrumental Quality Predictions and Analysis of Auditory Cues for Algorithms in Modern Headphone Technology. Trends in Hearing, 25, 23312165211001219. https://doi.org/10.1177/23312165211001219

Some implementations are available here: http://amtoolbox.sourceforge.net/models.php or by contacting the authors

Best

From: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx> on behalf of Sohhom Bandyopadhyay <sohhom.bandyopadhyay@xxxxxxxxxxx>
Sent: 14 April 2021 13:43:52
To: AUDITORY@xxxxxxxxxxxxxxx
Subject: [AUDITORY] Request for objective evaluation models based on temporal envelope

Dear list,

I am looking for objective quality or intelligibility models (general audio or speech) that take into account the temporal envelope of the signal(s). Both intrusive and non-intrusive models are welcome.

Two examples of such models are:

* Falk, T. H., Zheng, C., & Chan, W. Y. (2010). A non-intrusive quality and intelligibility measure of reverberant and dereverberated speech. IEEE Transactions on Audio, Speech, and Language Processing, 18(7), 1766-1774.

(implementation: https://github.com/MuSAELab/SRMRToolbox)

* van de Par, S., Disch, S., Niedermeier, A., Burdiel Pérez, E., & Edler, B. (2019, October). Temporal Envelope-Based Psychoacoustic Modelling for Evaluating Non-Waveform Preserving Audio Codecs. In Audio Engineering Society Convention 147. Audio Engineering Society.

(implementation not available)

I would really prefer models whose implementations are publicly available or available upon request from the authors. Please let me know if you know of any such work.

Thanks and regards

Sohhom

Sohhom Bandyopadhyay

PhD Scholar | Center for Cognitive Science

Indian Institute of Technology Gandhinagar

http://cogs.iitgn.ac.in/member/sohhom-bandyopadhyay/