
2017 | OriginalPaper | Book chapter

Microphone Array Post-filter in Frequency Domain for Speech Recognition Using Short-Time Log-Spectral Amplitude Estimator and Spectral Harmonic/Noise Classifier

Authors: Sergey Salishev, Ilya Klotchkov, Andrey Barabanov

Published in: Speech and Computer

Publisher: Springer International Publishing


Abstract

We propose a novel, computationally efficient, real-time microphone array post-filter for speech enhancement. It has a small delay and takes into account the features of both the speech signal and speech recognition algorithms. The algorithm is efficient for small microphone arrays. The filter is based on applying a binary classification model to the Log Short-Term Spectral Amplitude (Log-STSA). Compared to the Wiener post-filter, the proposed algorithm substantially improves recognition accuracy at the cost of only a minor increase in complexity, and its complexity is lower than that of existing voice-model-based approaches. Objective tests using a dual-microphone array, the ETSI binaural noise database, the TIDIGITS database, and the CMU Sphinx 4 speech recognizer demonstrate an overall 41% error rate reduction for SNRs from 15 dB to 0 dB. Subjective evaluation also demonstrates substantial noise reduction and improved intelligibility, without the musical noise artifacts common to Wiener and spectral subtraction based methods. Testing with the SiSEC10 four-microphone linear equispaced array database shows that recognition accuracy improves as the array baseline and/or the number of microphones increases.
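The abstract does not specify the estimator equations or the harmonic/noise classifier, so the following is only a rough sketch under stated assumptions, not the authors' method. It applies the classical Ephraim-Malah log-spectral amplitude (LSA) gain to each STFT bin of a beamformer output, using a decision-directed a priori SNR estimate, and then makes a per-bin binary speech/noise decision on the resulting log-amplitude estimate. The threshold classifier, the gain floor `g_min`, the smoothing factor `alpha`, the SNR floor `xi_min`, and all function names (`exp1`, `lsa_gain`, `postfilter_frame`) are hypothetical stand-ins. The noise PSD is assumed to be supplied by a separate estimator.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp1(x: float) -> float:
    """Exponential integral E1(x) for x > 0 (series for x <= 1, continued fraction otherwise)."""
    if x <= 0.0:
        raise ValueError("E1 is only evaluated here for x > 0")
    if x <= 1.0:
        # Power series: E1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)
        term, total = 1.0, 0.0
        for k in range(1, 40):
            term *= -x / k  # term == (-x)^k / k!
            total += term / k
        return -EULER_GAMMA - math.log(x) - total
    # Continued fraction evaluated with the modified Lentz method (x > 1)
    b = x + 1.0
    c = 1.0 / 1e-300
    d = 1.0 / b
    h = d
    for i in range(1, 200):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x)


def lsa_gain(xi: float, gamma: float) -> float:
    """Ephraim-Malah LSA gain: xi/(1+xi) * exp(0.5 * E1(v)), v = xi*gamma/(1+xi)."""
    v = xi * gamma / (1.0 + xi)
    return xi / (1.0 + xi) * math.exp(0.5 * exp1(max(v, 1e-12)))


def postfilter_frame(noisy_power, noise_psd, prev_clean_amp=None,
                     alpha=0.98, xi_min=10 ** (-25 / 10),
                     threshold_db=3.0, g_min=0.1):
    """Process one STFT frame (hypothetical sketch: LSA gain plus binary bin mask).

    noisy_power: |Y_k|^2 per bin of the beamformer output.
    noise_psd:   noise power estimate lambda_k per bin (assumed given).
    Returns (gains, clean_amplitudes) for this frame.
    """
    if prev_clean_amp is None:
        prev_clean_amp = [0.0] * len(noisy_power)
    gains, clean_amp = [], []
    for p, lam, a_prev in zip(noisy_power, noise_psd, prev_clean_amp):
        lam = max(lam, 1e-12)
        gamma = p / lam  # a posteriori SNR
        # Decision-directed a priori SNR, floored at xi_min
        xi = alpha * a_prev ** 2 / lam + (1.0 - alpha) * max(gamma - 1.0, 0.0)
        xi = max(xi, xi_min)
        g = min(lsa_gain(xi, gamma), 1.0)  # clamp so the filter never amplifies
        # Hypothetical classifier: a bin counts as speech if its estimated
        # log amplitude exceeds the noise log amplitude by threshold_db.
        log_stsa = math.log(max(g * math.sqrt(p), 1e-12))
        log_noise = 0.5 * math.log(lam)
        level_db = 20.0 / math.log(10.0) * (log_stsa - log_noise)
        g = max(g, g_min) if level_db > threshold_db else g_min
        gains.append(g)
        clean_amp.append(g * math.sqrt(p))
    return gains, clean_amp
```

For example, `postfilter_frame([100.0, 1.0], [1.0, 1.0])` keeps an LSA gain of roughly 0.66 for the high-SNR bin and assigns the floor gain 0.1 to the bin at the noise level. Feeding the returned amplitudes back as `prev_clean_amp` on the next frame gives the recursive decision-directed update. A fixed gain floor on noise-classified bins is one simple way to avoid the isolated spectral peaks that produce musical noise, but the choice of floor and threshold here is illustrative only.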


Metadata
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-66429-3_52