Published in: International Journal of Speech Technology 4/2023

18 November 2023

Psychoacoustic model-driven spectral subtraction for monaural speech enhancement

Author: Navneet Upadhyay

Abstract

In this paper, we investigate a psychoacoustic model-driven spectral subtraction framework for the enhancement of noisy speech. In the proposed framework, the noisy speech spectrum is split into six non-uniformly spaced frequency subbands according to the psychoacoustic model of the human auditory system, and spectral over-subtraction is applied independently in each subband. The noise in each subband is estimated with an adaptive noise estimator that does not require a speech pause tracker. To compute and update the noise estimate, the noisy speech power is adaptively smoothed using a smoothing factor controlled by the a posteriori SNR. Performance is evaluated using SNR, segmental SNR (SegSNR), and PESQ scores for a variety of stationary and non-stationary noise environments at varying SNR levels. The experimental results show that the proposed framework outperforms several recent speech enhancement methods on these three widely used objective metrics and in speech spectrograms.
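
The two core steps named in the abstract, SNR-controlled noise smoothing and per-subband over-subtraction, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the sigmoid mapping from a posteriori SNR to smoothing factor, the function names, and all parameter values (`slope`, `threshold`, `over`, `floor`) are assumptions made for illustration, and each subband is reduced to a single power value per frame rather than a set of frequency bins.

```python
import math

def smoothing_factor(post_snr, slope=5.0, threshold=1.5):
    """Map a posteriori SNR (linear ratio) to a smoothing factor in (0, 1).
    The sigmoid shape and its constants are assumptions, not taken from the paper."""
    return 1.0 / (1.0 + math.exp(-slope * (post_snr - threshold)))

def update_noise(noise_prev, noisy_power, slope=5.0, threshold=1.5):
    """Recursively smooth the noisy power into the noise estimate.
    High a posteriori SNR (speech likely present) pushes alpha toward 1,
    which holds the previous estimate; low SNR lets the estimate follow
    the noisy power. No speech pause detector is involved."""
    post_snr = noisy_power / max(noise_prev, 1e-12)
    alpha = smoothing_factor(post_snr, slope, threshold)
    return alpha * noise_prev + (1.0 - alpha) * noisy_power

def over_subtract(noisy_power, noise_power, over=2.0, floor=0.01):
    """Power spectral over-subtraction with a spectral floor."""
    cleaned = noisy_power - over * noise_power
    return max(cleaned, floor * noisy_power)

def enhance_frame(band_powers, band_noise, over_factors):
    """Process one frame: update the noise in each subband, then subtract.
    Each subband gets its own (hypothetical) over-subtraction factor."""
    new_noise = [update_noise(n, p) for n, p in zip(band_noise, band_powers)]
    enhanced = [over_subtract(p, n, a)
                for p, n, a in zip(band_powers, new_noise, over_factors)]
    return enhanced, new_noise

# Six subbands, as in the abstract; the numbers are made-up example values.
powers = [4.0, 9.0, 1.2, 0.8, 0.5, 0.3]
noise = [1.0, 1.0, 1.0, 0.8, 0.5, 0.3]
factors = [3.0, 2.5, 2.0, 2.0, 1.5, 1.5]
enhanced, noise = enhance_frame(powers, noise, factors)
```

Calling `enhance_frame` once per frame and feeding the returned noise estimate back in gives a running estimate that adapts between frames, which is the property the abstract highlights for non-stationary noise.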

Metadata
Title
Psychoacoustic model-driven spectral subtraction for monaural speech enhancement
Author
Navneet Upadhyay
Publication date
18 November 2023
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 4/2023
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-023-10062-9
