16-06-2024

CST-UNet: Cross Swin Transformer Enhanced U-Net with Masked Bottleneck for Single-Channel Speech Enhancement

Authors: Zipeng Zhang, Wei Chen, Weiwei Guo, Yiming Liu, Jianhua Yang, Houguang Liu

Published in: Circuits, Systems, and Signal Processing | Issue 9/2024

Abstract

Deep learning models, especially those based on the Long Short-Term Memory (LSTM) architecture, have significantly improved speech enhancement performance. However, these methods suffer from high computational complexity and redundant input features. To address these issues, we propose a U-Net-based approach that uses an encoder/decoder to extract more concise features, thereby improving single-channel speech enhancement performance while reducing computational complexity. The proposed method includes a Cross-Swin-Transformer block and a masked bottleneck module, and it down-samples features while preserving detailed representations through skip connections and carefully designed blocks. The bottleneck module extracts coarse representations of hidden features to serve as masks. We evaluated our method against other U-Net-based approaches on the VCTK and DNS corpora using the CBAK, eSTOI, PESQ, STOI, and SI-SDR metrics. The results show that the proposed method achieves promising performance while significantly reducing computational complexity.
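
Of the metrics listed in the abstract, SI-SDR (scale-invariant signal-to-distortion ratio) has a compact closed form, so a small illustration may help readers unfamiliar with it. The sketch below follows the commonly used definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. The function name `si_sdr`, the `eps` stabiliser, and the toy signals are assumptions made for illustration only. This is not the authors' evaluation code, and some toolkits also remove the signal mean first, which is omitted here.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB for two equal-length sample sequences.

    Illustrative sketch only: no mean removal, and eps is a hypothetical
    stabiliser that avoids division by zero and log of zero.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")

    # Optimal scaling of the reference: alpha = <estimate, reference> / ||reference||^2
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / (ref_energy + eps)

    # Split the estimate into a scaled-target part and a residual part.
    target = [alpha * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


if __name__ == "__main__":
    clean = [1.0, 0.0, -1.0, 0.0]        # toy reference signal
    enhanced = [1.0, 0.1, -1.0, 0.1]     # reference plus a small orthogonal error
    print(round(si_sdr(enhanced, clean), 3))
    # Rescaling the estimate should leave the score essentially unchanged.
    print(round(si_sdr([2.0 * x for x in enhanced], clean), 3))
```

Because the reference is rescaled before the comparison, multiplying the enhanced signal by a constant gain does not change the score, which is why this metric is often preferred over plain SDR when a model's output level is arbitrary.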

Metadata

Title: CST-UNet: Cross Swin Transformer Enhanced U-Net with Masked Bottleneck for Single-Channel Speech Enhancement
Authors: Zipeng Zhang, Wei Chen, Weiwei Guo, Yiming Liu, Jianhua Yang, Houguang Liu
Publication date: 16-06-2024
Publisher: Springer US
Published in: Circuits, Systems, and Signal Processing | Issue 9/2024
Print ISSN: 0278-081X
Electronic ISSN: 1531-5878
DOI: https://doi.org/10.1007/s00034-024-02736-9