[AUDITORY] Special Issue, Hearing Research - Predicting Speech Intelligibility ("Carney, Laurel")


Subject: [AUDITORY] Special Issue, Hearing Research - Predicting Speech Intelligibility
From:    "Carney, Laurel"  <Laurel_Carney@xxxxxxxx>
Date:    Mon, 5 Dec 2022 02:24:41 +0000

Dear Colleagues,

We are happy to announce that Hearing Research has just published a Special Issue on Predicting Speech Intelligibility - December 2022, Vol. 426. The authors, titles, abstracts, and links to each paper are included below.

Guest Editors: Torsten Dau and Laurel Carney

Satyabrata Parida, Michael G. Heinz
Underlying neural mechanisms of degraded speech intelligibility following noise-induced hearing loss: The importance of distorted tonotopy
https://doi.org/10.1016/j.heares.2022.108586
Abstract: Listeners with sensorineural hearing loss (SNHL) have substantial perceptual deficits, especially in noisy environments. Unfortunately, speech-intelligibility models have limited success in predicting the performance of listeners with hearing loss. A better understanding of the various suprathreshold factors that contribute to neural-coding degradations of speech in noisy conditions will facilitate better modeling and clinical outcomes. Here, we highlight the importance of one physiological factor that has received minimal attention to date, termed distorted tonotopy, which refers to a disruption in the mapping between acoustic frequency and cochlear place that is a hallmark of normal hearing. More so than commonly assumed factors (e.g., threshold elevation, reduced frequency selectivity, diminished temporal coding), distorted tonotopy severely degrades the neural representations of speech (particularly in noise) in single- and across-fiber responses in the auditory nerve following noise-induced hearing loss. Key results include: 1) effects of distorted tonotopy depend on stimulus spectral bandwidth and timbre, 2) distorted tonotopy increases across-fiber correlation and thus reduces information capacity to the brain, and 3) its effects vary across etiologies, which may contribute to individual differences. These results motivate the development and testing of noninvasive measures that can assess the severity of distorted tonotopy in human listeners. The development of such noninvasive measures of distorted tonotopy would advance precision-audiological approaches to improving diagnostics and rehabilitation for listeners with SNHL.

Johannes Zaar, Laurel H. Carney
Predicting speech intelligibility in hearing-impaired listeners using a physiologically inspired auditory model
https://doi.org/10.1016/j.heares.2022.108553
Abstract: This study presents a major update and full evaluation of a speech intelligibility (SI) prediction model previously introduced by Scheidiger, Carney, Dau, and Zaar [(2018), Acta Acust. United Ac. 104, 914-917]. The model predicts SI in speech-in-noise conditions via comparison of the noisy speech and the noise-alone reference. The two signals are processed through a physiologically inspired nonlinear model of the auditory periphery, for a range of characteristic frequencies (CFs), followed by a modulation analysis in the range of the fundamental frequency of speech. The decision metric of the model is the mean of a series of short-term, across-CF correlations between population responses to noisy speech and noise alone, with a sensitivity-limitation process imposed. The decision metric is assumed to be inversely related to SI and is converted to a percent-correct score using a single data-based fitting function. The model performance was evaluated in conditions of stationary, fluctuating, and speech-like interferers using sentence-based speech-reception thresholds (SRTs) previously obtained in 5 normal-hearing (NH) and 13 hearing-impaired (HI) listeners. For the NH listener group, the model accurately predicted SRTs across the different acoustic conditions (apart from a slight overestimation of the masking release observed for fluctuating maskers), as well as plausible effects in response to changes in presentation level. For HI listeners, the model was adjusted to account for the individual audiograms using standard assumptions concerning the amount of HI attributed to inner-hair-cell (IHC) and outer-hair-cell (OHC) impairment. HI model results accounted remarkably well for elevated individual SRTs and reduced masking release. Furthermore, plausible predictions of worsened SI were obtained when the relative contribution of IHC impairment to HI was increased. Overall, the present model provides a useful tool to accurately predict speech-in-noise outcomes in NH and HI listeners, and may yield important insights into auditory processes that are crucial for speech understanding.
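
As a rough sketch of the decision metric described above (not the authors' code; the peripheral model, modulation analysis, and sensitivity-limitation stage are omitted, and the window length and array layout are placeholder assumptions), the short-term across-CF correlation could look like this in Python:

    import numpy as np

    def correlation_decision_metric(resp_noisy, resp_noise, win=64):
        """Mean of short-term correlations between two population responses.

        resp_noisy, resp_noise: arrays of shape (n_cf, n_time) holding model
        responses to noisy speech and to noise alone.  win is a placeholder
        window length in samples.
        """
        n_cf, n_time = resp_noisy.shape
        corrs = []
        for start in range(0, n_time - win + 1, win):
            a = resp_noisy[:, start:start + win].ravel()
            b = resp_noise[:, start:start + win].ravel()
            if a.std() > 0 and b.std() > 0:
                corrs.append(np.corrcoef(a, b)[0, 1])
        # Greater similarity between noisy speech and noise alone is taken to
        # indicate poorer intelligibility; a data-based fitting function (not
        # shown here) would convert this value to a percent-correct score.
        return float(np.mean(corrs))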

Helia Relaño-Iborra, Torsten Dau
Speech intelligibility prediction based on modulation frequency-selective processing
https://doi.org/10.1016/j.heares.2022.108610
Abstract: Speech intelligibility models can provide insights regarding the auditory processes involved in human speech perception and communication. One successful approach to modelling speech intelligibility has been based on the analysis of the amplitude modulations present in speech as well as competing interferers. This review covers speech intelligibility models that include a modulation-frequency selective processing stage, i.e., a modulation filterbank, as part of their front end. The speech-based envelope power spectrum model [sEPSM; Jørgensen and Dau (2011). J. Acoust. Soc. Am. 130(3), 1475-1487], several variants of the sEPSM including modifications with respect to temporal resolution, spectro-temporal processing and binaural processing, as well as the speech-based computational auditory signal processing and perception model [sCASP; Relaño-Iborra et al. (2019). J. Acoust. Soc. Am. 146(5), 3306-3317], which is based on an established auditory signal detection and masking model, are discussed. The key processing stages of these models for the prediction of speech intelligibility across a variety of acoustic conditions are addressed in relation to competing modeling approaches. The strengths and weaknesses of the modulation-based analysis are outlined and perspectives presented, particularly in connection with the challenge of predicting the consequences of individual hearing loss on speech intelligibility.

Amin Edraki, Wai-Yip Chan, Jesper Jensen, Daniel Fogerty
Spectro-temporal modulation glimpsing for speech intelligibility prediction
https://doi.org/10.1016/j.heares.2022.108620
Abstract: We compare two alternative speech intelligibility prediction algorithms: time-frequency glimpse proportion (GP) and spectro-temporal glimpsing index (STGI). Both algorithms hypothesize that listeners understand speech in challenging acoustic environments by "glimpsing" partially available information from degraded speech. GP defines glimpses as those time-frequency regions whose local signal-to-noise ratio is above a certain threshold and estimates intelligibility as the proportion of the time-frequency regions glimpsed. STGI, on the other hand, applies glimpsing to the spectro-temporal modulation (STM) domain and uses a similarity measure based on the normalized cross-correlation between the STM envelopes of the clean and degraded speech signals to estimate intelligibility as the proportion of the STM channels glimpsed. Our experimental results demonstrate that STGI extends the notion of glimpsing proportion to a wider range of distortions, including non-linear signal processing, and outperforms GP for the additive uncorrelated noise datasets we tested. Furthermore, the results show that spectro-temporal modulation analysis enables STGI to account for the effects of masker type on speech intelligibility, leading to superior performance over GP in modulated noise datasets.
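
For orientation only, the glimpse-proportion idea (a time-frequency tile counts as a glimpse when its local SNR clears a threshold) can be sketched as follows; the spectrogram front end is assumed to be given, and the -5 dB threshold is a placeholder rather than the value used in the paper:

    import numpy as np

    def glimpse_proportion(speech_power, noise_power, snr_threshold_db=-5.0):
        """Fraction of time-frequency tiles whose local SNR exceeds a threshold.

        speech_power, noise_power: power spectrograms of the separate speech
        and noise signals (shape: n_freq x n_frames).
        """
        local_snr_db = 10.0 * np.log10(speech_power / (noise_power + 1e-12))
        return float(np.mean(local_snr_db > snr_threshold_db))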

Luna Prud'homme, Mathieu Lavandier, Virginia Best
Investigating the role of harmonic cancellation in speech-on-speech masking
https://doi.org/10.1016/j.heares.2022.108562
Abstract: This study investigated the role of harmonic cancellation in the intelligibility of speech in "cocktail party" situations. While there is evidence that harmonic cancellation plays a role in the segregation of simple harmonic sounds based on fundamental frequency (F0), its utility for mixtures of speech containing non-stationary F0s and unvoiced segments is unclear. Here we focused on the energetic masking of speech targets caused by competing speech maskers. Speech reception thresholds were measured using seven maskers: speech-shaped noise, monotonized and intonated harmonic complexes, monotonized speech, noise-vocoded speech, reversed speech and natural speech. These maskers enabled an estimate of how the masking potential of speech is influenced by harmonic structure, amplitude modulation and variations in F0 over time. Measured speech reception thresholds were compared to the predictions of two computational models, with and without a harmonic cancellation component. Overall, the results suggest a minor role of harmonic cancellation in reducing energetic masking in speech mixtures.

Luna Prud'homme, Mathieu Lavandier, Virginia Best
A dynamic binaural harmonic-cancellation model to predict speech intelligibility against a harmonic masker varying in intonation, temporal envelope, and location
https://doi.org/10.1016/j.heares.2022.108535
Abstract: The aim of this study was to extend the harmonic-cancellation model proposed by Prud'homme et al. [J. Acoust. Soc. Am. 148 (2020) 3246-3254] to predict speech intelligibility against a harmonic masker, so that it takes into account binaural hearing, amplitude modulations in the masker, and variations in masker fundamental frequency (F0) over time. This was done by segmenting the masker signal into time frames and combining the previous long-term harmonic-cancellation model with the binaural model proposed by Vicente and Lavandier [Hear. Res. 390 (2020) 107937]. The new model was tested on the data from two experiments involving harmonic complex maskers that varied in spatial location, temporal envelope and F0 contour. The interactions between the associated effects were accounted for in the model by varying the time frame duration and excluding the binaural unmasking computation when harmonic cancellation is active. Across both experiments, the correlation between data and model predictions was over 0.96, and the mean and largest absolute prediction errors were lower than 0.6 and 1.5 dB, respectively.
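
The models above are frame-based and handle time-varying F0, temporal envelope, and binaural cues; purely to illustrate the core cancellation idea (an assumed textbook form, not the published model), a single static cancellation stage can be realized as a delay-and-subtract comb filter:

    import numpy as np

    def cancel_harmonics(x, f0_hz, fs_hz):
        """Delay-and-subtract comb filter: y[n] = x[n] - x[n - T0].

        With T0 = fs / f0 samples, the filter places spectral zeros at DC and
        at every multiple of f0, attenuating a harmonic masker with a (locally
        constant) fundamental frequency f0 while passing energy in between.
        """
        t0 = int(round(fs_hz / f0_hz))   # masker period in samples
        y = np.copy(x)
        y[t0:] = x[t0:] - x[:-t0]        # subtract the one-period-delayed signal
        return y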

David Hülsmeier, Birger Kollmeier
How much individualization is required to predict the individual effect of suprathreshold processing deficits? Assessing Plomp's distortion component with psychoacoustic detection thresholds and FADE
https://doi.org/10.1016/j.heares.2022.108609
Abstract: Plomp introduced an empirical separation of the increased speech recognition thresholds (SRTs) of listeners with sensorineural hearing loss into an Attenuation (A) component (which can be compensated by amplification) and a non-compensable Distortion (D) component. Our previous research backed up this notion with speech recognition models that derive their SRT prediction from the individual audiogram, with or without a psychoacoustic measure of suprathreshold processing deficits. To determine how precisely the A and D components can be separated for the individual listener with various individual measures and individualized models, SRTs were obtained from 40 listeners with a range of hearing impairments in quiet, stationary noise, and fluctuating noise (ICRA 5-250 and babble). Both the clinical audiogram and an adaptive, precise sweep audiogram were obtained, as well as tone-in-noise detection thresholds at four frequencies, to characterize the individual hearing impairment. For predicting the SRT, the FADE model (which is based on machine learning) was used with either of the two audiogram procedures and, optionally, the individual tone-in-noise detection thresholds. The results indicate that the precisely measured swept-tone audiogram allows for a more precise prediction of the individual SRT than the clinical audiogram (RMS errors of 4.3 dB vs. 6.4 dB, respectively). While estimation from the precise audiogram and FADE performed equally well in predicting the individual A and D components, the further refinement of including the tone-in-noise detection threshold with FADE led to a slight improvement in prediction accuracy (RMS errors of 3.3 dB, 4.6 dB and 1.4 dB for the SRT, A and D components, respectively). Hence, applying FADE is advantageous for scientific purposes where consistent modeling of different psychoacoustic effects in the same listener with a minimum number of assumptions is desirable. For clinical purposes, however, a precisely measured audiogram and an estimation of the expected D component using linear regression appear to be a satisfactory first step towards precision audiology.
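
As a small worked example of the A/D separation discussed above (using the common reading of Plomp's scheme and hypothetical numbers, not data from the paper):

    def plomp_components(srt_quiet_hi, srt_noise_hi, srt_quiet_nh, srt_noise_nh):
        """Estimate Plomp's A and D components (in dB) from SRTs measured in
        quiet and in noise, relative to a normal-hearing (NH) reference.

        Assumed reading: D equals the SRT elevation in (sufficiently intense)
        noise, which amplification cannot compensate; A is the remaining SRT
        elevation in quiet.
        """
        d = srt_noise_hi - srt_noise_nh
        a = (srt_quiet_hi - srt_quiet_nh) - d
        return a, d

    # Hypothetical listener: 30 dB SRT elevation in quiet and 6 dB elevation
    # in noise give A = 24 dB (compensable by gain) and D = 6 dB (distortion).
    print(plomp_components(55.0, 0.0, 25.0, -6.0))  # -> (24.0, 6.0)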

Jan Rennies, Saskia Röttges, Rainer Huber, Christopher F. Hauth, Thomas Brand
A joint framework for blind prediction of binaural speech intelligibility and perceived listening effort
https://doi.org/10.1016/j.heares.2022.108598
Abstract: Speech perception is strongly affected by noise and reverberation in the listening room, and binaural processing can substantially facilitate speech perception in conditions when target speech and maskers originate from different directions. Most studies and proposed models for predicting spatial unmasking have focused on speech intelligibility. The present study introduces a model framework that predicts both speech intelligibility and perceived listening effort from the same output measure. The framework is based on a combination of a blind binaural processing stage employing a blind equalization-cancelation (EC) mechanism and a blind backend based on phoneme probability classification. Neither frontend nor backend requires any additional information, such as the source directions, the signal-to-noise ratio (SNR), or the number of sources, allowing for a fully blind perceptual assessment of binaural input signals consisting of target speech mixed with noise. The model is validated against a recent data set in which speech intelligibility and perceived listening effort were measured for a range of acoustic conditions differing in reverberation and binaural cues [Rennies and Kidd (2018), J. Acoust. Soc. Am. 144, 2147-2159]. Predictions of the proposed model are compared with a non-blind binaural model consisting of a non-blind EC stage and a backend based on the speech intelligibility index. The analyses indicated that all main trends observed in the experiments were correctly predicted by the blind model. The overall proportion of variance explained by the blind model for speech intelligibility (R² = 0.94) was slightly lower than for the non-blind model (R² = 0.98). For listening-effort predictions, both models showed lower prediction accuracy, but still explained significant proportions of the observed variance (R² = 0.88 and R² = 0.71 for the non-blind and blind model, respectively). Closer inspection showed that the differences between data and predictions were largest for binaural conditions at high SNRs, where the perceived listening effort of human listeners tended to be underestimated by the models, specifically by the blind version.

James M. Kates, Kathryn H. Arehart
An overview of the HASPI and HASQI metrics for predicting speech intelligibility and speech quality for normal hearing, hearing loss, and hearing aids
https://doi.org/10.1016/j.heares.2022.108608
Abstract: Alterations of the speech signal, including additive noise and nonlinear distortion, can reduce speech intelligibility and quality. Hearing aids present an especially complicated situation, since these devices may implement nonlinear processing designed to compensate for the hearing loss. Hearing-aid processing is often realized as time-varying multichannel gain adjustments and may also include frequency reassignment. The challenge in designing metrics for hearing aids and hearing-impaired listeners is to accurately model the perceptual trade-offs between speech audibility and the nonlinear distortion introduced by hearing-aid processing. This paper focuses on the Hearing Aid Speech Perception Index (HASPI) and the Hearing Aid Speech Quality Index (HASQI) as representative metrics for predicting intelligibility and quality. These indices start with a model of the auditory periphery that can be adjusted to represent hearing loss. The peripheral model, the speech features computed from the model outputs, and the procedures used to fit the features to subject data are described. Examples are then presented for using the metrics to measure the effects of additive noise, evaluate noise-suppression processing, and measure the differences among commercial hearing aids. Open questions and considerations in using these and related metrics are then discussed.
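
Metrics like these are typically related to subject data through a fitted mapping from the model output to measured scores. Purely as a generic illustration (not the fitting procedure used for HASPI/HASQI, and with made-up numbers), a logistic mapping could be fitted as follows:

    import numpy as np
    from scipy.optimize import curve_fit

    def logistic(metric, midpoint, slope):
        """Map a model-derived metric value to a percent-correct score."""
        return 100.0 / (1.0 + np.exp(-(metric - midpoint) / slope))

    # Hypothetical calibration data: metric values for several conditions and
    # the intelligibility scores measured in those conditions.
    metric = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
    scores = np.array([5.0, 30.0, 55.0, 85.0, 97.0])

    params, _ = curve_fit(logistic, metric, scores, p0=[0.5, 0.1])
    print(logistic(0.6, *params))  # predicted score for an unseen condition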

Marlies Gillis, Jana Van Canneyt, Tom Francart, Jonas Vanthornhout
Neural tracking as a diagnostic tool to assess the auditory pathway
https://doi.org/10.1016/j.heares.2022.108607
Abstract: When a person listens to sound, the brain time-locks to specific aspects of the sound. This is called neural tracking, and it can be investigated by analysing neural responses (e.g., measured by electroencephalography) to continuous natural speech. Measures of neural tracking allow for an objective investigation of a range of auditory and linguistic processes in the brain during natural speech perception. This approach is more ecologically valid than traditional auditory evoked responses and has great potential for research and clinical applications. This article reviews the neural tracking framework and highlights three prominent examples of neural tracking analyses: neural tracking of the fundamental frequency of the voice (f0), the speech envelope, and linguistic features. Each of these analyses provides a unique point of view into the human brain's hierarchical stages of speech processing. F0-tracking assesses the encoding of fine temporal information in the early stages of the auditory pathway, i.e., from the auditory periphery up to early processing in the primary auditory cortex. Envelope tracking reflects bottom-up and top-down speech-related processes in the auditory cortex and is likely necessary, but not sufficient, for speech intelligibility. Linguistic feature tracking (e.g., word or phoneme surprisal) relates to neural processes more directly related to speech intelligibility. Together, these analyses form a multi-faceted objective assessment of an individual's auditory and linguistic processing.

Mahdie Karbasi, Dorothea Kolossa
ASR-based speech intelligibility prediction: A review
https://doi.org/10.1016/j.heares.2022.108606
Abstract: Various types of methods and approaches are available to predict the intelligibility of speech signals, but many of these still suffer from two major problems: first, the prior knowledge they require, which can limit their applicability and lower their objectivity, and second, a low generalization capacity, e.g., across noise types, degradation conditions, and speech material. Automatic speech recognition (ASR) has been suggested as a machine-learning-based component of speech intelligibility prediction (SIP), aiming to ameliorate the shortcomings of other SIP methods. Since their first introduction, ASR-based SIP approaches have developed at an increasingly rapid pace, have been deployed in a range of contexts, and have shown promising performance in many scenarios. Our article provides an overview of this body of research. The main differences between competing methods are highlighted, and their benefits are explained alongside their limitations. We conclude with an outlook on future work and new related directions.

Torsten Dau  tdau@xxxxxxxx
Laurel H. Carney  Laurel_Carney@xxxxxxxx

