07.08.2024 | Short Paper

A Deep Hybrid Model for Stereophonic Acoustic Echo Control

Authors: Yang Liu, Sichen Liu, Feiran Yang, Jun Yang

Published in: Circuits, Systems, and Signal Processing | Issue 12/2024

Abstract

This paper proposes a deep hybrid model for stereophonic acoustic echo cancellation (SAEC). The model comprises two stages, a deep-learning-based Kalman filter (DeepKalman) and a gated convolutional recurrent network (GCRN)-based postfilter, which are jointly trained in an end-to-end manner. The proposed DeepKalman filter differs from the conventional Kalman filter in two respects. First, its input is a combination of the two original far-end signals and a nonlinear reference signal estimated directly from the microphone signal. Second, a low-complexity recurrent neural network estimates the covariance of the process noise, which enhances the tracking capability of the DeepKalman filter. In the second stage, the GCRN suppresses residual echo and noise by estimating complex masks that are applied to the output of the first stage. Computer simulations confirm the performance advantage of the proposed method over existing SAEC algorithms.
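To make the two-stage data flow concrete, the sketch below is a minimal PyTorch-style stand-in, not the authors' implementation: a small GRU that maps per-frame features to a positive per-bin process-noise covariance for the Kalman stage, and a mask-based postfilter that predicts a complex ratio mask from the stage-1 output and the two far-end spectra. All class names, layer sizes, and the use of a plain GRU in place of the actual GCRN are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProcessNoiseRNN(nn.Module):
    """Illustrative low-complexity GRU that maps per-frame features
    (e.g. magnitudes of the prediction error) to a positive per-bin
    process-noise covariance estimate for the Kalman stage."""
    def __init__(self, num_bins: int, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(input_size=num_bins, hidden_size=hidden, batch_first=True)
        self.proj = nn.Linear(hidden, num_bins)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: real-valued features of shape (batch, frames, bins)
        h, _ = self.gru(feat)
        return F.softplus(self.proj(h))  # keep the covariance estimate positive


class MaskPostFilter(nn.Module):
    """Stand-in for the GCRN postfilter: predicts a complex ratio mask
    from the stage-1 output spectrum and the two far-end spectra, and
    applies it to the stage-1 output."""
    def __init__(self, num_bins: int, hidden: int = 128):
        super().__init__()
        # 3 complex spectra -> 3 * 2 * num_bins real-valued features per frame
        self.gru = nn.GRU(input_size=6 * num_bins, hidden_size=hidden, batch_first=True)
        self.proj = nn.Linear(hidden, 2 * num_bins)

    def forward(self, stage1, far_l, far_r):
        # all inputs: complex STFTs of shape (batch, frames, bins)
        x = torch.cat([torch.view_as_real(s).flatten(-2)
                       for s in (stage1, far_l, far_r)], dim=-1)
        h, _ = self.gru(x)
        m = self.proj(h).view(*stage1.shape, 2)
        mask = torch.view_as_complex(m.contiguous())
        return mask * stage1  # residual-echo- and noise-suppressed spectrum


if __name__ == "__main__":
    B, T, K = 2, 100, 257  # batch, frames, frequency bins
    spec = lambda: torch.randn(B, T, K, dtype=torch.cfloat)
    stage1, far_l, far_r = spec(), spec(), spec()

    q = ProcessNoiseRNN(K)(torch.abs(stage1))      # per-bin process-noise covariance
    out = MaskPostFilter(K)(stage1, far_l, far_r)  # postfiltered near-end estimate
    print(q.shape, out.shape)                      # (2, 100, 257) (2, 100, 257)
```

In the actual system, the first stage would run the frequency-domain Kalman recursion with the RNN-estimated covariance and the nonlinear reference signal as an additional input, and both stages would be trained jointly; the sketch only fixes the tensor shapes and data flow.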

Metadata
Title
A Deep Hybrid Model for Stereophonic Acoustic Echo Control
Authors
Yang Liu
Sichen Liu
Feiran Yang
Jun Yang
Publication date
07.08.2024
Publisher
Springer US
Published in
Circuits, Systems, and Signal Processing / Issue 12/2024
Print ISSN: 0278-081X
Electronic ISSN: 1531-5878
DOI
https://doi.org/10.1007/s00034-024-02807-x