2024 | OriginalPaper | Book chapter

Improving Low-Latency Mono-Channel Speech Enhancement by Compensation Windows in STFT Analysis

Authors: Minh N. Bui, Dung N. Tran, Kazuhito Koishida, Trac D. Tran, Peter Chin

Published in: Complex Networks & Their Applications XII

Publisher: Springer Nature Switzerland

Abstract

Speech enhancement is a key component of voice communication technology because it serves as an important pre-processing step for systems such as acoustic echo cancellation, speech separation, and speech conversion. A low-latency speech enhancement algorithm is desirable because long latency delays the entire system’s response. In STFT-based systems, reducing algorithmic latency by using smaller STFT window sizes leads to significant degradation in speech quality. By introducing a simple additional compensation window alongside the original short main window in the analysis step of the STFT, we preserve signal quality at a level comparable to that of the original high-latency system while reducing the algorithmic latency from 42 ms to 5 ms. Experiments on the full-band VCD dataset and a large full-band internal Microsoft dataset show the effectiveness of the proposed method.
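
To make the latency figures concrete, the following is a minimal sketch of a two-window STFT analysis step. It is not the authors' implementation. The abstract does not specify the window shapes, the hop size, the sample rate, or how the two spectra are combined, so everything here is an assumption: a 48 kHz sample rate, Hann windows, an algorithmic latency roughly equal to the main window length, and a compensation window that ends at the same sample as the main window so that it only reaches further into the past. The names `ms_to_samples` and `two_window_analysis` are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 48_000  # assumption: full-band audio sampled at 48 kHz


def ms_to_samples(duration_ms, sample_rate=SAMPLE_RATE):
    """Convert a window duration in milliseconds to a sample count."""
    return int(round(duration_ms * sample_rate / 1000))


def two_window_analysis(signal, main_len, comp_len, hop):
    """Return per-frame spectra from a short main window and a longer
    compensation window (hypothetical layout, see the lead-in).

    Both windows end at the newest sample of the frame, so the longer
    window adds past context but no extra look-ahead.
    """
    if comp_len < main_len:
        raise ValueError("comp_len must be at least main_len")
    main_win = np.hanning(main_len)
    comp_win = np.hanning(comp_len)
    # Zero-pad the past so the first frame has a full compensation window.
    padded = np.concatenate([np.zeros(comp_len - main_len), signal])
    n_frames = 1 + (len(signal) - main_len) // hop
    main_spec, comp_spec = [], []
    for i in range(n_frames):
        end = comp_len + i * hop  # one past the newest sample of frame i
        main_spec.append(np.fft.rfft(padded[end - main_len:end] * main_win))
        comp_spec.append(np.fft.rfft(padded[end - comp_len:end] * comp_win))
    return np.array(main_spec), np.array(comp_spec)


if __name__ == "__main__":
    main_len = ms_to_samples(5)    # short main window
    comp_len = ms_to_samples(42)   # longer window over past samples
    noisy = np.random.default_rng(0).standard_normal(SAMPLE_RATE)
    main_spec, comp_spec = two_window_analysis(noisy, main_len, comp_len, main_len // 2)
    print(main_spec.shape, comp_spec.shape)
```

Under these assumptions, a 42 ms window spans 2016 samples and a 5 ms window spans 240 samples at 48 kHz. The longer window yields more frequency bins per frame, which is one plausible way a compensation window could offset the resolution lost by shortening the main window.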

Title: Improving Low-Latency Mono-Channel Speech Enhancement by Compensation Windows in STFT Analysis
Authors: Minh N. Bui, Dung N. Tran, Kazuhito Koishida, Trac D. Tran, Peter Chin
Publisher: Springer Nature Switzerland
Book: Complex Networks & Their Applications XII
Print ISBN: 978-3-031-53467-6
Electronic ISBN: 978-3-031-53468-3
Copyright year: 2024
DOI: https://doi.org/10.1007/978-3-031-53468-3_31