2017 | OriginalPaper | Book chapter

Microphone Array Post-filter in Frequency Domain for Speech Recognition Using Short-Time Log-Spectral Amplitude Estimator and Spectral Harmonic/Noise Classifier

Authors: Sergey Salishev, Ilya Klotchkov, Andrey Barabanov

Published in: Speech and Computer

Publisher: Springer International Publishing

Abstract

We propose a novel, computationally efficient, real-time speech enhancement post-filter for microphone arrays. It has a small delay and takes into account features of the speech signal and of recognition algorithms. The algorithm is efficient for small microphone arrays. The filter is based on applying a binary classification model to the Log Short-Term Spectral Amplitude (Log-STSA). The proposed algorithm substantially improves recognition accuracy with a minor increase in complexity compared to the Wiener post-filter and with lower complexity than existing voice-model-based approaches. Objective tests using a dual-microphone array, the ETSI binaural noise database, the TIDIGITS database, and the CMU Sphinx 4 speech recognizer demonstrate an overall 41% error rate reduction for SNR from 15 dB down to 0 dB. Subjective evaluation also demonstrates substantial noise reduction and intelligibility improvement without the musical noise artifacts common in Wiener and spectral-subtraction-based methods. Testing with the SiSEC10 four-microphone linear equispaced array database shows that recognition accuracy improves with an increased array base and/or number of microphones.
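
The abstract names the Log-STSA estimator and the Wiener post-filter but gives no formulas. As a rough illustration only, the sketch below compares the classic per-bin Wiener gain with the log-spectral amplitude gain of Ephraim and Malah (1985). It is not the chapter's algorithm: the harmonic/noise classifier is not modelled, and the names `xi` (a priori SNR), `gamma` (a posteriori SNR), `exp1`, `wiener_gain`, and `log_stsa_gain` are assumptions introduced here.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp1(x: float) -> float:
    """Exponential integral E1(x) for x > 0 (hypothetical pure-Python helper)."""
    if x <= 0.0:
        raise ValueError("exp1 is only defined here for x > 0")
    if x <= 1.0:
        # Power series: E1(x) = -gamma - ln(x) + sum_{k>=1} (-1)^(k+1) x^k / (k * k!)
        total = -EULER_GAMMA - math.log(x)
        power_over_fact = 1.0  # running value of x^k / k!
        for k in range(1, 60):
            power_over_fact *= x / k
            term = power_over_fact / k
            total += term if k % 2 == 1 else -term
            if term < 1e-17:
                break
        return total
    # Continued fraction evaluated with the modified Lentz method.
    b = x + 1.0
    c = 1e300
    d = 1.0 / b
    h = d
    for i in range(1, 500):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x)


def wiener_gain(xi: float) -> float:
    """Wiener gain for a priori SNR xi (linear scale, not dB)."""
    return xi / (1.0 + xi)


def log_stsa_gain(xi: float, gamma: float) -> float:
    """Ephraim-Malah log-spectral amplitude gain for one time-frequency bin.

    xi is the a priori SNR and gamma the a posteriori SNR, both linear.
    """
    v = xi * gamma / (1.0 + xi)
    return wiener_gain(xi) * math.exp(0.5 * exp1(v))


if __name__ == "__main__":
    for snr_db in (15, 10, 5, 0):
        xi = 10.0 ** (snr_db / 10.0)
        gamma = xi + 1.0  # assumption: a posteriori SNR at its expected value
        print(snr_db, round(wiener_gain(xi), 4), round(log_stsa_gain(xi, gamma), 4))
```

Because the exponential factor is never below 1, this gain is always at least the Wiener gain and approaches it at high SNR. It grows without bound as `v` approaches 0, so practical implementations usually clip it; that clipping is omitted in this sketch.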

Title: Microphone Array Post-filter in Frequency Domain for Speech Recognition Using Short-Time Log-Spectral Amplitude Estimator and Spectral Harmonic/Noise Classifier
Authors: Sergey Salishev, Ilya Klotchkov, Andrey Barabanov
Publisher: Springer International Publishing
Book: Speech and Computer
Print ISBN: 978-3-319-66428-6
Electronic ISBN: 978-3-319-66429-3
Copyright year: 2017
DOI: https://doi.org/10.1007/978-3-319-66429-3_52
