Published in: International Journal of Speech Technology 2/2014

1 June 2014

A perceptually motivated stationary wavelet packet filterbank using improved spectral over-subtraction for enhancement of speech in various noise environments

Written by: Navneet Upadhyay, Abhijit Karmakar


Abstract

In this paper, we propose a speech enhancement method in which the front-end decomposition of the input speech is performed by temporal processing with a filterbank. The proposed method combines a perceptually motivated stationary wavelet packet filterbank (PM-SWPFB) with an improved spectral over-subtraction (I-SOS) algorithm for the enhancement of speech in various noise environments. The stationary wavelet packet transform (SWPT) is a shift-invariant transform. The PM-SWPFB is obtained by selecting the stationary wavelet packet tree so that it closely matches the non-linear resolution of the critical band structure of the psychoacoustic model. After the input speech is decomposed, the I-SOS algorithm is applied separately in each subband to estimate the speech. The I-SOS algorithm uses a continuous noise estimation approach and estimates the noise power in each subband without the need for explicit speech silence detection. The subband noise power is estimated and updated by adaptively smoothing the noisy signal power. The smoothing parameter in each subband is controlled by a function of the estimated signal-to-noise ratio (SNR). The performance of the proposed speech enhancement method is tested on speech signals degraded by various real-world noises. Using objective speech quality measures (SNR, segmental SNR (SegSNR), and perceptual evaluation of speech quality (PESQ) score), together with spectrograms and informal listening tests, we show that the proposed speech enhancement method outperforms the spectral subtractive-type algorithms and improves the quality and intelligibility of the enhanced speech.
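The per-subband processing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract gives neither the function that maps SNR to the smoothing parameter nor the subtraction rule, so the sketch assumes a sigmoid of the a-posteriori SNR for the smoothing parameter and a piecewise-linear over-subtraction factor with a spectral floor in the style of Berouti et al. (1979). All function names, parameter values, and the frame-wise power input are hypothetical, and the filterbank decomposition itself is left out.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log of zero


def smoothing_factor(snr_post, slope=1.0, center=1.5):
    """Map an a-posteriori SNR (linear scale) to a smoothing factor in (0, 1).

    Assumption: a sigmoid shape. A high SNR (speech likely present) gives a
    factor near 1, which freezes the noise estimate; an SNR near 1 (noise
    only) gives a smaller factor, so the estimate follows the noisy power.
    """
    return 1.0 / (1.0 + np.exp(-slope * (snr_post - center)))


def track_noise_power(noisy_power, init_frames=5):
    """Continuously estimate the noise power of one subband, frame by frame,
    by adaptively smoothing the noisy power (no explicit silence detector)."""
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise = np.empty_like(noisy_power)
    # Assumption: the first few frames are used to initialise the estimate.
    estimate = float(np.mean(noisy_power[:init_frames]))
    for k, power in enumerate(noisy_power):
        alpha = smoothing_factor(power / max(estimate, EPS))
        estimate = alpha * estimate + (1.0 - alpha) * power
        noise[k] = estimate
    return noise


def over_subtraction_factor(snr_db):
    """Berouti-style rule: subtract more noise when the SNR is low."""
    snr_db = np.clip(snr_db, -5.0, 20.0)
    return 4.0 - 0.15 * snr_db


def over_subtract(noisy_power, noise_power, floor=0.01):
    """Power-domain over-subtraction with a spectral floor in one subband."""
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    snr_db = 10.0 * np.log10(
        np.maximum(noisy_power, EPS) / np.maximum(noise_power, EPS)
    )
    alpha = over_subtraction_factor(snr_db)
    clean = noisy_power - alpha * noise_power
    # The floor keeps the result positive and masks residual noise peaks.
    return np.maximum(clean, floor * noise_power)


if __name__ == "__main__":
    # Toy subband: unit noise power with a short high-energy speech burst.
    power = np.ones(50)
    power[20:30] = 100.0
    noise = track_noise_power(power)
    enhanced = over_subtract(power, noise)
    print(noise[25], enhanced[25], enhanced[5])
```

In this sketch the noise estimate is updated in every frame, so no speech/silence decision is required, which is the property the abstract attributes to the continuous noise estimation approach.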


Footnotes
1
A Noisy Speech Corpus for Assessment of Speech Enhancement Algorithms. http://www.utdallas.edu/~loizou/speech/noizeus/.
 
Metadata
Title
A perceptually motivated stationary wavelet packet filterbank using improved spectral over-subtraction for enhancement of speech in various noise environments
Written by
Navneet Upadhyay
Abhijit Karmakar
Publication date
1 June 2014
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2014
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-013-9213-5
