
Psychoacoustic model-driven spectral subtraction for monaural speech enhancement

Author: Navneet Upadhyay

Published in: International Journal of Speech Technology, Issue 4/2023 (18 November 2023)


Abstract

In this paper, we investigate a psychoacoustic model-driven spectral subtraction framework for enhancing noisy speech. In the proposed framework, the noisy speech spectrum is divided into six non-uniformly spaced frequency subbands according to a psychoacoustic model of the human auditory system, and spectral over-subtraction is applied independently in each subband. The noise in each subband is estimated with an adaptive noise estimator that does not require a speech-pause tracker: to compute and update the noise estimate, the noisy speech power is adaptively smoothed with a smoothing factor controlled by the a posteriori SNR. The performance of the proposed framework is evaluated using SNR, segmental SNR (SegSNR), and PESQ scores in a variety of stationary and non-stationary noise environments at several SNR levels. The experimental results show that the proposed framework outperforms several recent speech enhancement methods on these three widely used objective metrics and in comparisons of speech spectrograms.
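The abstract outlines three processing steps: splitting the noisy power spectrum into six non-uniform subbands, over-subtracting an estimated noise spectrum independently in each band, and updating that noise estimate with a smoothing factor driven by the a posteriori SNR. The pure-Python sketch below illustrates those steps on per-frame power spectra; the STFT analysis and overlap-add resynthesis are left out. It is not the author's implementation. The band edges (`BAND_EDGES_HZ`), the Berouti-style over-subtraction rule, the sigmoid smoothing with parameters `a` and `threshold`, the spectral floor, the 8 kHz sampling rate, and seeding the noise estimate from the first few frames are all assumptions chosen for illustration. A segmental SNR helper, using the common convention of clipping each frame to [-10, 35] dB, is included for evaluation.

```python
import math
from typing import List, Sequence

# Hypothetical subband edges in Hz for 8 kHz speech: the abstract specifies six
# non-uniformly spaced bands from a psychoacoustic model but not the edges.
BAND_EDGES_HZ = (0, 500, 1000, 1500, 2250, 3000, 4000)


def band_bins(n_bins: int, fs: float = 8000.0) -> List[range]:
    """Map BAND_EDGES_HZ onto bin ranges of a one-sided spectrum with n_bins bins."""
    hz_per_bin = (fs / 2) / (n_bins - 1)
    bands = []
    for i, (lo, hi) in enumerate(zip(BAND_EDGES_HZ[:-1], BAND_EDGES_HZ[1:])):
        start = int(round(lo / hz_per_bin))
        # The last band runs up to and including the Nyquist bin.
        stop = n_bins if i == len(BAND_EDGES_HZ) - 2 else int(round(hi / hz_per_bin))
        bands.append(range(start, stop))
    return bands


def over_subtraction_factor(snr_db: float) -> float:
    """Assumed Berouti-style rule: subtract more aggressively in low-SNR bands."""
    if snr_db < -5.0:
        return 4.75
    if snr_db > 20.0:
        return 1.0
    return 4.0 - 0.15 * snr_db


def update_noise(noisy: Sequence[float], noise: Sequence[float],
                 a: float = 5.0, threshold: float = 1.5) -> List[float]:
    """Recursive noise update; the sigmoid and its parameters are assumptions.

    A high a posteriori SNR (likely speech) pushes the smoothing factor toward 1,
    freezing the estimate; a low one lets the estimate follow the noisy power.
    """
    updated = []
    for p, n in zip(noisy, noise):
        gamma = p / max(n, 1e-12)  # a posteriori SNR of this bin
        alpha = 1.0 / (1.0 + math.exp(-a * (gamma - threshold)))
        updated.append(alpha * n + (1.0 - alpha) * p)
    return updated


def enhance_frame(noisy: Sequence[float], noise: Sequence[float],
                  bands: Sequence[range], floor: float = 0.002) -> List[float]:
    """Apply over-subtraction independently in each subband of one power spectrum."""
    out = list(noisy)
    for band in bands:
        p_band = sum(noisy[k] for k in band)
        n_band = sum(noise[k] for k in band)
        snr_db = 10.0 * math.log10(max(p_band, 1e-12) / max(n_band, 1e-12))
        alpha = over_subtraction_factor(snr_db)
        for k in band:
            diff = noisy[k] - alpha * noise[k]
            # Spectral floor keeps a small residual instead of negative power.
            out[k] = diff if diff > floor * noisy[k] else floor * noisy[k]
    return out


def enhance(frames: Sequence[Sequence[float]], fs: float = 8000.0,
            init_frames: int = 5) -> List[List[float]]:
    """Enhance a sequence of one-sided power spectra (one list per frame)."""
    n_bins = len(frames[0])
    bands = band_bins(n_bins, fs)
    # Seed the noise estimate from the first frames (assumed to be speech-free).
    n0 = min(init_frames, len(frames))
    noise = [sum(f[k] for f in frames[:n0]) / n0 for k in range(n_bins)]
    enhanced = []
    for frame in frames:
        noise = update_noise(frame, noise)
        enhanced.append(enhance_frame(frame, noise, bands))
    return enhanced


def segmental_snr(clean: Sequence[float], processed: Sequence[float],
                  frame_len: int = 256) -> float:
    """Frame-averaged SNR in dB, with each frame clipped to [-10, 35] dB."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = [x - y for x, y in zip(s, processed[start:start + frame_len])]
        num = sum(x * x for x in s)
        den = sum(x * x for x in e)
        snr = 10.0 * math.log10(max(num, 1e-12) / max(den, 1e-12))
        values.append(min(max(snr, -10.0), 35.0))
    return sum(values) / len(values)
```

Working on power spectra keeps the sketch dependency-free; in a complete system the enhanced power would typically be combined with the noisy phase and resynthesized by inverse FFT and overlap-add.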



Publisher: Springer US
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-023-10062-9
