2019 | OriginalPaper | Chapter

4. Neural Beamforming for Speech Enhancement: Preliminary Results

Authors: Stefano Tomassetti, Leonardo Gabrielli, Emanuele Principi, Daniele Ferretti, Stefano Squartini

Published in: Neural Advances in Processing Nonlinear Dynamic Signals

Publisher: Springer International Publishing

Abstract

In multi-channel speech enhancement, beamforming algorithms play a key role because they reduce noise and reverberation through spatial filtering. To that end, accurate knowledge of the Direction of Arrival (DOA) is crucial for beamforming to be effective. This paper reports markedly improved DOA estimates obtained with a recently introduced neural DOA estimation technique, compared with a reference algorithm, Multiple Signal Classification (MUSIC). These findings motivated an evaluation of beamforming with neural DOA estimation for speech enhancement. Using neural DOA estimation in conjunction with beamforming improves the quality of speech signals affected by reverberation and noise. These first findings are reported as a reference for further work on beamforming for speech enhancement.

Title: Neural Beamforming for Speech Enhancement: Preliminary Results
Authors: Stefano Tomassetti
Leonardo Gabrielli
Emanuele Principi
Daniele Ferretti
Stefano Squartini
Publisher: Springer International Publishing
Book: Neural Advances in Processing Nonlinear Dynamic Signals
Print ISBN: 978-3-319-95097-6

Electronic ISBN: 978-3-319-95098-3

Copyright Year: 2019
DOI: https://doi.org/10.1007/978-3-319-95098-3_4
