Online Speech Enhancement by Retraining of LSTM Using SURE Loss and Policy Iteration

Written by: Sriharsha Koundinya, Abhijit Karmakar

Published in: Neural Processing Letters | Issue 5/2021

Published: 30 May 2021

Abstract

Speech enhancement improves quality and intelligibility in applications such as speech recognition, hearing aids, and personal assistant devices. Because acoustic environments vary, online enhancement is essential for practical use: the system must observe the environment and enhance the speech accordingly. Adaptive filters have previously been used for online enhancement, but a neural-network-based online enhancement method has not been proposed before. In this paper, we employ an architecture based on Long Short-Term Memory (LSTM) networks to enhance single-channel speech online. The LSTM network is trained online with a novel scheme that minimizes Stein’s unbiased risk estimate (SURE), which lets the network learn denoising without a clean sample or ground truth. To avoid retraining on every sample, we use policy iteration with a reward function based on ITU-T P.563, a widely used single-ended perceptual measure. With this LSTM retraining, the PESQ of the enhanced speech increases by 0.53 on average. The proposed method also improves intelligibility, with STOI increasing by 0.22.
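For intuition about training without a clean reference, the following is a minimal sketch of a Monte-Carlo SURE estimate, not the authors' implementation. It assumes additive white Gaussian noise with a known standard deviation `sigma`, which the abstract does not state. The names `mc_sure`, `true_mse`, and `shrink` are hypothetical, and a simple scaling function stands in for the LSTM.

```python
import numpy as np


def mc_sure(denoiser, y, sigma, rng, eps=1e-3):
    """Monte-Carlo estimate of SURE for `denoiser` on the noisy vector `y`.

    Assumption: y = x + n with n ~ N(0, sigma^2 I) and sigma known.
    """
    n = y.size
    fy = denoiser(y)
    # Fidelity term: measured against the noisy input, not a clean target.
    fidelity = np.sum((y - fy) ** 2) / n
    # Divergence of the denoiser, approximated with one random probe vector.
    b = rng.standard_normal(y.shape)
    divergence = np.sum(b * (denoiser(y + eps * b) - fy)) / eps
    return fidelity - sigma ** 2 + 2.0 * sigma ** 2 * divergence / n


def true_mse(denoiser, y, x):
    """Reference MSE; it needs the clean signal, so it is for checking only."""
    return np.mean((denoiser(y) - x) ** 2)


def shrink(y, a=0.8):
    """Hypothetical stand-in denoiser: scale every sample by `a`."""
    return a * y


# Synthetic example: a sine wave in white Gaussian noise.
rng = np.random.default_rng(0)
x = np.sin(np.linspace(0.0, 200.0, 100_000))
sigma = 0.5
y = x + sigma * rng.standard_normal(x.shape)
print("SURE estimate:", mc_sure(shrink, y, sigma, rng))
print("True MSE:     ", true_mse(shrink, y, x))
```

On long signals the two printed values should be close, which is why a SURE-type loss can replace a mean-squared-error loss when no clean target is available. In a trainable setting the same expression would be minimized with respect to the network weights.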

Title: Online Speech Enhancement by Retraining of LSTM Using SURE Loss and Policy Iteration
Authors: Sriharsha Koundinya, Abhijit Karmakar
Publication date: 30 May 2021
Publisher: Springer US
Published in: Neural Processing Letters | Issue 5/2021
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI: https://doi.org/10.1007/s11063-021-10535-5
