
2019 | OriginalPaper | Chapter

4. Neural Beamforming for Speech Enhancement: Preliminary Results

Authors : Stefano Tomassetti, Leonardo Gabrielli, Emanuele Principi, Daniele Ferretti, Stefano Squartini

Published in: Neural Advances in Processing Nonlinear Dynamic Signals

Publisher: Springer International Publishing


Abstract

In multi-channel speech enhancement, beamforming algorithms play a key role because they reduce noise and reverberation by spatial filtering. For beamforming to be effective, accurate knowledge of the Direction of Arrival (DOA) is crucial. This paper reports greatly improved DOA estimates obtained with a recently introduced neural DOA estimation technique, compared with a reference algorithm, Multiple Signal Classification (MUSIC). These findings motivated an evaluation of beamforming driven by neural DOA estimates for speech enhancement. When the neural DOA estimates are used in conjunction with beamforming, the quality of speech signals affected by reverberation and noise improves. These preliminary findings are reported as a reference for further work on beamforming for speech enhancement.
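The pipeline described above (estimate the DOA, then steer a beamformer toward it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's method: it assumes a narrowband far-field model, a uniform linear array and a single source. MUSIC stands in as the DOA front-end because the neural estimator is not reproduced here, and a delay-and-sum beamformer is an assumed choice, since the abstract does not say which beamformer is used. The array geometry, frequency, noise level and the function names `steering_vector`, `music_doa` and `delay_and_sum` are all hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed value for air)


def steering_vector(theta_deg, n_mics, spacing, freq):
    """Far-field steering vector of a uniform linear array; angle measured from broadside."""
    m = np.arange(n_mics)
    delay = m * spacing * np.sin(np.deg2rad(theta_deg)) / SPEED_OF_SOUND  # seconds
    return np.exp(-2j * np.pi * freq * delay)


def music_doa(X, n_sources, n_mics, spacing, freq, grid=np.arange(-90.0, 90.5, 0.5)):
    """Narrowband MUSIC: maximise 1 / ||E_n^H a(theta)||^2 over an angle grid.

    Picks only the single highest peak, so it is valid for one source.
    """
    R = X @ X.conj().T / X.shape[1]            # sample spatial covariance (Hermitian)
    _, eigvecs = np.linalg.eigh(R)             # eigenvalues returned in ascending order
    En = eigvecs[:, : n_mics - n_sources]      # noise subspace = smallest eigenvectors
    A = np.stack([steering_vector(t, n_mics, spacing, freq) for t in grid], axis=1)
    pseudo = 1.0 / np.sum(np.abs(En.conj().T @ A) ** 2, axis=0)
    return grid[np.argmax(pseudo)], pseudo


def delay_and_sum(X, theta_deg, n_mics, spacing, freq):
    """Delay-and-sum beamformer: y = w^H x with w = a(theta) / M (unit gain toward theta)."""
    w = steering_vector(theta_deg, n_mics, spacing, freq) / n_mics
    return w.conj() @ X


# Hypothetical scenario: 8 mics, 5 cm spacing, one 2 kHz bin, 400 snapshots.
rng = np.random.default_rng(0)
M, d, f, N = 8, 0.05, 2000.0, 400
true_doa = 25.0
s = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)  # unit-power source
noise = 0.5 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
signal = np.outer(steering_vector(true_doa, M, d, f), s)
X = signal + noise

# Step 1: DOA estimate (a neural estimator would replace this call).
est_doa, _ = music_doa(X, 1, M, d, f)

# Step 2: steer the beamformer with the estimate; signal and noise are
# beamformed separately only so that the output SNR can be measured.
y_sig = delay_and_sum(signal, est_doa, M, d, f)
y_noise = delay_and_sum(noise, est_doa, M, d, f)

snr_in = 10 * np.log10(np.mean(np.abs(signal[0]) ** 2) / np.mean(np.abs(noise[0]) ** 2))
snr_out = 10 * np.log10(np.mean(np.abs(y_sig) ** 2) / np.mean(np.abs(y_noise) ** 2))
print(f"estimated DOA: {est_doa:.1f} deg, SNR gain: {snr_out - snr_in:.1f} dB")
```

With spatially white noise, the ideal delay-and-sum gain is 10·log10(M), about 9 dB for eight microphones, provided the steering angle is close to the true DOA. A poor DOA estimate steers the main lobe away from the talker, which illustrates why the abstract stresses DOA accuracy. For broadband speech, the same steps would typically be applied in each STFT frequency bin.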


Metadata
DOI
https://doi.org/10.1007/978-3-319-95098-3_4
