International Journal of Speech Technology

Published: 1 June 2014

A perceptually motivated stationary wavelet packet filterbank using improved spectral over-subtraction for enhancement of speech in various noise environments

Authors: Navneet Upadhyay, Abhijit Karmakar

Published in: International Journal of Speech Technology | Issue 2/2014

Abstract

In this paper, we propose a speech enhancement method in which the front-end decomposition of the input speech is performed in the time domain using a filterbank. The proposed method combines a perceptually motivated stationary wavelet packet filterbank (PM-SWPFB) with an improved spectral over-subtraction (I-SOS) algorithm for the enhancement of speech in various noise environments. The stationary wavelet packet transform (SWPT) is a shift-invariant transform. The PM-SWPFB is obtained by selecting the stationary wavelet packet tree so that it closely matches the non-linear resolution of the critical band structure of the psychoacoustic model. After the decomposition of the input speech, the I-SOS algorithm is applied separately in each subband to estimate the speech. The I-SOS uses a continuous noise estimation approach and estimates the noise power in each subband without the need for explicit speech silence detection. The subband noise power is estimated and updated by adaptively smoothing the noisy signal power. The smoothing parameter in each subband is controlled by a function of the estimated signal-to-noise ratio (SNR). The performance of the proposed speech enhancement method is tested on speech signals degraded by various real-world noises. Using objective speech quality measures (SNR, segmental SNR (SegSNR), and perceptual evaluation of speech quality (PESQ) score), together with spectrograms and informal listening tests, we show that the proposed speech enhancement method outperforms spectral subtractive-type algorithms and improves the quality and intelligibility of the enhanced speech.
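
The abstract describes the per-subband noise tracking only in words, so the following is a minimal Python sketch of the general idea rather than the authors' exact I-SOS formulation. It assumes a sigmoid mapping from the estimated SNR to the smoothing parameter and a fixed over-subtraction factor with a spectral floor; the function names, parameter values, and the sigmoid itself are illustrative assumptions that do not come from the paper.

```python
import numpy as np

def track_noise_power(frame_power, alpha_min=0.7, alpha_max=0.999,
                      slope=1.0, snr_mid=1.5):
    """Recursively estimate the noise power of one subband.

    frame_power: noisy signal power per frame in this subband.
    All default parameter values are hypothetical.
    """
    frame_power = np.asarray(frame_power, dtype=float)
    noise = np.empty_like(frame_power)
    noise[0] = frame_power[0]  # assumption: the first frame contains noise only
    for t in range(1, len(frame_power)):
        # Estimated SNR of the current frame against the previous noise estimate.
        snr = frame_power[t] / max(noise[t - 1], 1e-12)
        # Assumed sigmoid: high SNR (speech likely) pushes alpha toward 1,
        # which nearly freezes the noise estimate; low SNR lets it adapt.
        alpha = 1.0 / (1.0 + np.exp(-slope * (snr - snr_mid)))
        alpha = float(np.clip(alpha, alpha_min, alpha_max))
        noise[t] = alpha * noise[t - 1] + (1.0 - alpha) * frame_power[t]
    return noise

def over_subtract(frame_power, noise_power, over_factor=2.0, floor=0.01):
    """Power over-subtraction with a spectral floor (generic form)."""
    frame_power = np.asarray(frame_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    clean = frame_power - over_factor * noise_power
    # Never let the estimate fall below a small fraction of the noise power.
    return np.maximum(clean, floor * noise_power)

# Synthetic subband: steady noise, a short high-power burst, then noise again.
power = np.concatenate([np.ones(50), np.full(10, 100.0), np.ones(40)])
noise = track_noise_power(power)
clean = over_subtract(power, noise)
```

Because the smoothing parameter stays close to 1 while the estimated SNR is high, the noise estimate drifts only slightly during the burst and returns to the noise level afterwards, which is how such a scheme avoids an explicit speech/silence detector.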

Title: A perceptually motivated stationary wavelet packet filterbank using improved spectral over-subtraction for enhancement of speech in various noise environments
Authors: Navneet Upadhyay, Abhijit Karmakar
Publication date: 1 June 2014
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 2/2014
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-013-9213-5
