International Journal of Speech Technology

Published: 1 December 2012

Enhancement of noisy speech based on a custom thresholding function with a statistically determined threshold

Authors: Tahsina Farah Sanam, Celia Shahnaz

Published in: International Journal of Speech Technology | Issue 4/2012

Abstract

The performance of thresholding-based speech enhancement methods depends largely on an accurate estimate of the threshold value and on the choice of thresholding function. This paper presents a speech enhancement method in which a custom thresholding function is proposed and applied to the Wavelet Packet (WP) coefficients of the noisy speech. The function switches between modified hard and semisoft thresholding functions depending on a parameter that characterizes the signal under consideration. The threshold is determined by statistical modeling of the Teager energy operated WP coefficients of the noisy speech. Extensive simulations indicate that this threshold, used together with the custom thresholding function, is very effective in reducing not only white noise but also colored noise, yielding enhanced speech with better quality and intelligibility. Several standard objective measures and subjective evaluations, including informal listening tests, show that the proposed method outperforms recent state-of-the-art thresholding-based speech enhancement approaches from high to low SNR levels.
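For readers unfamiliar with the building blocks named in the abstract, the following is a minimal background sketch in plain Python. It shows the discrete Teager energy operator and the textbook hard, soft, and semisoft (firm) thresholding rules. It is not the authors' custom thresholding function or their statistical threshold, neither of which is specified in the abstract. All function names, the zero-valued boundary handling in `teager_energy`, and the thresholds `lam`, `lam1`, and `lam2` are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def teager_energy(x: Sequence[float]) -> List[float]:
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The first and last samples have no two-sided neighbours, so this sketch
    sets them to 0.0 (an assumption, not taken from the paper).
    """
    psi = [0.0] * len(x)
    for n in range(1, len(x) - 1):
        psi[n] = x[n] * x[n] - x[n - 1] * x[n + 1]
    return psi


def hard_threshold(c: float, lam: float) -> float:
    """Textbook hard thresholding: keep c if |c| > lam, otherwise zero it."""
    return c if abs(c) > lam else 0.0


def soft_threshold(c: float, lam: float) -> float:
    """Textbook soft thresholding: zero small coefficients, shrink the rest by lam."""
    if abs(c) <= lam:
        return 0.0
    return math.copysign(abs(c) - lam, c)


def semisoft_threshold(c: float, lam1: float, lam2: float) -> float:
    """Textbook semisoft (firm) thresholding, assuming 0 < lam1 < lam2.

    Coefficients below lam1 are zeroed, coefficients above lam2 are kept
    unchanged, and those in between are scaled linearly so the rule is
    continuous at both thresholds.
    """
    a = abs(c)
    if a <= lam1:
        return 0.0
    if a > lam2:
        return c
    return math.copysign(lam2 * (a - lam1) / (lam2 - lam1), c)
```

In a wavelet packet setting, such rules would be applied coefficient by coefficient within each subband. How the method described above selects the threshold and switches between its modified hard and semisoft variants is left to the full paper.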

Title: Enhancement of noisy speech based on a custom thresholding function with a statistically determined threshold
Authors: Tahsina Farah Sanam, Celia Shahnaz
Publication date: 1 December 2012
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 4/2012
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-012-9144-6
