16.06.2024

CST-UNet: Cross Swin Transformer Enhanced U-Net with Masked Bottleneck for Single-Channel Speech Enhancement

Authors: Zipeng Zhang, Wei Chen, Weiwei Guo, Yiming Liu, Jianhua Yang, Houguang Liu

Published in: Circuits, Systems, and Signal Processing | Issue 9/2024

Abstract

Speech enhancement performance has improved significantly with the introduction of deep learning models, especially methods based on the long short-term memory (LSTM) architecture. However, these methods face challenges such as high computational complexity and redundant input features. To address these issues, we propose a U-Net-based approach that uses an encoder/decoder structure to extract more concise features, thereby improving single-channel speech enhancement performance while reducing computational complexity. The proposed method incorporates a Cross-Swin-Transformer block and a masked bottleneck module: features are down-sampled, while skip connections and these purpose-built blocks preserve detailed representations. The bottleneck module extracts coarse representations of the hidden features to serve as masks. We evaluated our method against other U-Net-based approaches on the VCTK and DNS corpora using the CBAK, eSTOI, PESQ, STOI, and SI-SDR metrics. The results demonstrate that the proposed method achieves promising performance while significantly reducing computational complexity.
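Of the reported metrics, SI-SDR has a compact closed form that is easy to check by hand. The sketch below follows the common definition popularized by Le Roux et al. (2019): with zero-mean estimate ŝ and reference s, the optimal scale is α = ⟨ŝ, s⟩ / ‖s‖², and SI-SDR = 10 log10(‖αs‖² / ‖ŝ − αs‖²). It is a plain-Python illustration, not the authors' evaluation code; the function name `si_sdr` and the `eps` stabilizer are assumptions.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-12) -> float:
    """Scale-invariant signal-to-distortion ratio (SI-SDR) in dB."""
    if not reference or len(estimate) != len(reference):
        raise ValueError("signals must be non-empty and of equal length")
    # Remove the DC offset of both signals (zero-mean convention for SI-SDR).
    mean_est = sum(estimate) / len(estimate)
    mean_ref = sum(reference) / len(reference)
    est = [x - mean_est for x in estimate]
    ref = [x - mean_ref for x in reference]
    # Optimal scale alpha = <est, ref> / ||ref||^2 projects the estimate onto the reference.
    alpha = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [alpha * r for r in ref]
    # Everything the projection does not explain counts as distortion.
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(d * d for d in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


if __name__ == "__main__":
    ref = [1.0, -1.0, 1.0, -1.0]
    noise = [1.0, 1.0, -1.0, -1.0]  # zero-mean and orthogonal to ref
    est = [r + n for r, n in zip(ref, noise)]
    print(si_sdr(est, ref))  # approximately 0 dB: target and residual have equal energy
    print(si_sdr([3.0 * x for x in est], ref))  # approximately 0 dB again: rescaling is ignored
```

Because the reference is rescaled by the optimal α, multiplying the estimate by any nonzero constant leaves the score unchanged, which is why SI-SDR is often preferred over plain SNR when a network's output gain is unconstrained.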
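The abstract does not describe the internals of the Cross-Swin-Transformer block. For orientation only, the sketch below shows the window mechanism of the original Swin Transformer (Liu et al., ICCV 2021) on a toy feature grid: the grid is split into non-overlapping M×M windows within which self-attention would be computed, and alternate blocks cyclically shift the grid by ⌊M/2⌋ before partitioning so that information can cross window borders. Restricting attention to fixed-size windows makes its cost grow linearly, rather than quadratically, with the number of grid positions. Treating speech features as a (time, frequency) grid and the helper names `cyclic_shift`, `window_partition`, and `swin_windows` are assumptions; how the "cross" variant in CST-UNet modifies this scheme is not stated here.

```python
from typing import List

Grid = List[List[float]]  # rows = time frames, columns = frequency bins (assumed layout)


def cyclic_shift(grid: Grid, shift: int) -> Grid:
    """Roll the grid up and to the left by `shift` positions on both axes,
    analogous to torch.roll(x, shifts=(-shift, -shift), dims=(0, 1))."""
    rows, cols = len(grid), len(grid[0])
    return [
        [grid[(r + shift) % rows][(c + shift) % cols] for c in range(cols)]
        for r in range(rows)
    ]


def window_partition(grid: Grid, window: int) -> List[Grid]:
    """Split the grid into non-overlapping window x window tiles, in row-major order."""
    rows, cols = len(grid), len(grid[0])
    if rows % window or cols % window:
        raise ValueError("grid dimensions must be divisible by the window size")
    tiles = []
    for r0 in range(0, rows, window):
        for c0 in range(0, cols, window):
            tiles.append([row[c0:c0 + window] for row in grid[r0:r0 + window]])
    return tiles


def swin_windows(grid: Grid, window: int, shifted: bool) -> List[Grid]:
    """Regular windows for one block, shifted windows (shift = window // 2) for the next."""
    if shifted:
        grid = cyclic_shift(grid, window // 2)
    return window_partition(grid, window)


if __name__ == "__main__":
    # Toy 4x4 feature map whose values encode their (time, freq) position.
    feats = [[float(4 * t + f) for f in range(4)] for t in range(4)]
    print(swin_windows(feats, 2, shifted=False)[0])  # [[0.0, 1.0], [4.0, 5.0]]
    print(swin_windows(feats, 2, shifted=True)[0])   # [[5.0, 6.0], [9.0, 10.0]]
```

After the shift, the last window of the toy example holds the values 15, 12, 3, and 0, i.e. positions from opposite edges of the grid; the original Swin design adds an attention mask so that such wrapped-around tokens do not attend to each other.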

Metadata
Publisher
Springer US
Print ISSN: 0278-081X
Electronic ISSN: 1531-5878
DOI
https://doi.org/10.1007/s00034-024-02736-9