Hi Sohhom,
You can look into the "family" of the Envelope Power Spectrum models (at DTU):
Energy-based SNRenv
Dau, T., & Jørgensen, S. (2011). Predicting speech intelligibility based on the envelope power signal‐to‐noise ratio after modulation‐frequency
selective processing. Journal of the Acoustical Society of America. Acoustical Society of America.
https://doi.org/10.1121/1.3587737
Jørgensen, S., Ewert, S. D., & Dau, T. (2013). A multi-resolution envelope-power based model for speech intelligibility. Journal of the Acoustical Society of America, 134(1), 436–446.
https://doi.org/10.1121/1.4807563
Also binaural
Chabot-Leclerc, A., MacDonald, E., & Dau, T. (2016). Predicting binaural speech intelligibility using the signal-to-noise ratio in the envelope
power spectrum domain. Journal of the Acoustical Society of America, 140(1), 192–205.
https://doi.org/10.1121/1.4954254
Or correlation-based predictions
Relaño-Iborra, H., Chabot-Leclerc, A., Scheidiger, C., Zaar, J., & Dau, T. (2017). The speech-based envelope power spectrum model (sEPSM) family:
Development, achievements, and current challenges. Journal of the Acoustical Society of America, 141(5), 3970–3970. https://doi.org/10.1121/1.4989047
For consonant recognition
Zaar, J., & Dau, T. (2018). Predicting consonant recognition and confusions using a microscopic speech perception model. Journal of the Acoustical
Society of America, 141(5), 3633–3633.
https://doi.org/10.1121/1.4987824
Or even a more complex front end to model hearing impairments
Relaño-Iborra, H., Zaar, J., & Dau, T. (2019). A speech-based computational auditory signal processing and perception model. Journal of the Acoustical
Society of America, 146(5), 3306–3317.
https://doi.org/10.1121/1.5129114
Also, see the work of Biberger and colleagues (Oldenburg), which also includes quality predictions with the "Generalized" power spectrum model
Biberger, T., & Ewert, S. D. (2016). Envelope and intensity based prediction of psychoacoustic masking and speech intelligibility. Journal of
the Acoustical Society of America, 140(2), 1023–1038.
https://doi.org/10.1121/1.4960574
Most recent: Biberger, T., Schepker, H., Denk, F., & Ewert, S. D. (2021). Instrumental Quality Predictions and Analysis of Auditory Cues for
Algorithms in Modern Headphone Technology. Trends in Hearing, 25, 23312165211001219.
https://doi.org/10.1177/23312165211001219
Some implementations are available here: http://amtoolbox.sourceforge.net/models.php, or you can contact the authors.
Best
RH