Published in: International Journal of Speech Technology 4/2016

8 October 2016

Auditory driven subband speech enhancement for automatic recognition of noisy speech

Authors: Navneet Upadhyay, Hamurabi Gamboa Rosales

Abstract

Speech recognizers achieve high recognition accuracy in quiet acoustic environments, but their performance degrades drastically when they are deployed in real environments, where the speech is corrupted by additive ambient noise. This paper advocates a two-phase approach for robust speech recognition in such environments. First, a front-end subband speech enhancement stage with adaptive noise estimation (ANE) filters the noisy speech. The whole noisy speech spectrum is partitioned into eighteen non-uniform subbands based on the Bark scale, and the noise power in each subband is estimated by the ANE approach, which does not require speech pause detection. Second, the filtered speech spectrum is processed by a non-parametric frequency-domain algorithm based on human perception, and the back end builds a robust classifier to recognize the utterance. A suite of experiments is conducted to evaluate the performance of the speech recognizer in a variety of real environments, with and without the front-end speech enhancement stage. Recognition accuracy is evaluated at the word level over a wide range of signal-to-noise ratios for real-world noises. Experimental evaluations show that the proposed algorithm attains good recognition performance when the signal-to-noise ratio is lower than 5 dB.
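
The Bark-scale partition mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no band edges, sampling rate, or FFT size. The edges below are Zwicker's published critical-band limits, while the 8 kHz sampling rate and 256-point FFT in the usage lines are assumptions, chosen because exactly eighteen critical bands then start below the Nyquist frequency. The function names `bark_subbands` and `subband_power` are hypothetical.

```python
import math

# Zwicker's critical-band edges in Hz; band k spans EDGES[k] .. EDGES[k + 1].
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
                 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400]


def bark_subbands(n_fft, sample_rate):
    """Group FFT bins 0 .. n_fft/2 into Bark subbands.

    Returns a list of half-open bin ranges (lo_bin, hi_bin), one per
    subband. Bin k is taken to sit at frequency k * sample_rate / n_fft.
    """
    nyquist = sample_rate / 2
    n_bins = n_fft // 2 + 1
    bands = []
    for lo_hz, hi_hz in zip(BARK_EDGES_HZ[:-1], BARK_EDGES_HZ[1:]):
        if lo_hz >= nyquist:
            break  # this band lies entirely above the usable spectrum
        lo_bin = math.ceil(lo_hz * n_fft / sample_rate)
        if hi_hz >= nyquist:
            hi_bin = n_bins  # truncate the top band at the Nyquist bin
        else:
            hi_bin = math.ceil(hi_hz * n_fft / sample_rate)
        bands.append((lo_bin, hi_bin))
    return bands


def subband_power(power_spectrum, bands):
    """Sum a one-sided power spectrum over each subband."""
    return [sum(power_spectrum[lo:hi]) for lo, hi in bands]


# Assumed setup: 8 kHz speech analysed with a 256-point FFT.
bands = bark_subbands(n_fft=256, sample_rate=8000)
flat_spectrum = [1.0] * (256 // 2 + 1)
powers = subband_power(flat_spectrum, bands)
```

With a flat spectrum, each entry of `powers` is simply the number of bins in that subband, which makes the widening of the bands toward high frequencies easy to see. A per-subband noise estimator would operate on these band powers frame by frame.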

Metadata
Title
Auditory driven subband speech enhancement for automatic recognition of noisy speech
Authors
Navneet Upadhyay
Hamurabi Gamboa Rosales
Publication date
8 October 2016
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 4/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-016-9370-4
