Published in: Neural Processing Letters 5/2021

30-05-2021

Online Speech Enhancement by Retraining of LSTM Using SURE Loss and Policy Iteration

Authors: Sriharsha Koundinya, Abhijit Karmakar


Abstract

Speech enhancement improves the quality and intelligibility of speech in applications such as speech recognition, hearing aids and personal assistant devices. Because acoustic environments vary, online enhancement is essential for practical use: the system must observe the environment and adapt the enhancement accordingly. Adaptive filters have previously been used for online enhancement, but a neural-network-based online method has not been proposed. In this paper, we employ a unique architecture based on Long Short-Term Memory (LSTM) networks to enhance single-channel speech online. The LSTM network is retrained online in a novel way by minimizing Stein's unbiased risk estimate (SURE). This retraining lets the network learn denoising without a clean sample or ground truth. To avoid retraining on every sample, we use policy iteration with a reward function based on ITU-T P.563, a widely used single-ended perceptual measure. With LSTM retraining, the PESQ of the enhanced speech increases by 0.53 on average. The proposed method also improves intelligibility, as shown by an improvement of 0.22 in the STOI metric.
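The idea of learning to denoise without a clean reference can be illustrated with a generic Monte-Carlo estimate of SURE. The sketch below is not the authors' LSTM loss, which the abstract does not specify. It assumes additive i.i.d. Gaussian noise with a known standard deviation `sigma`, and it uses a hypothetical fixed-gain denoiser (`shrink`) and a synthetic sine "signal" in place of speech. The function name `mc_sure` and all parameter values are illustrative assumptions.

```python
import numpy as np


def mc_sure(denoise, y, sigma, rng, eps=1e-3):
    """Monte-Carlo SURE for a denoiser applied to a noisy vector y.

    Assumes y = clean + Gaussian noise with known std `sigma`.
    The divergence of the denoiser is approximated with one random probe.
    """
    n = y.size
    out = denoise(y)
    probe = rng.standard_normal(n)
    # Finite-difference estimate of the divergence of `denoise` at y.
    div = probe @ (denoise(y + eps * probe) - out) / eps
    # Data-fit term, minus noise power, plus divergence penalty.
    return np.mean((y - out) ** 2) - sigma ** 2 + 2.0 * sigma ** 2 * div / n


rng = np.random.default_rng(0)
n = 100_000
t = np.linspace(0.0, 1.0, n, endpoint=False)
clean = np.sin(2 * np.pi * 5 * t)          # synthetic stand-in for clean speech
sigma = 0.5                                 # assumed known noise level
noisy = clean + sigma * rng.standard_normal(n)


def shrink(v):
    """Hypothetical denoiser: a fixed gain of 0.6 on every sample."""
    return 0.6 * v


sure = mc_sure(shrink, noisy, sigma, rng)          # uses only the noisy signal
mse = np.mean((shrink(noisy) - clean) ** 2)        # needs the clean signal
print(f"SURE estimate: {sure:.4f}  true MSE: {mse:.4f}")
```

In this toy setting the SURE value, computed from the noisy signal alone, tracks the true mean squared error closely. That is what makes it usable as a training loss when no ground truth is available.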
Metadata
Title
Online Speech Enhancement by Retraining of LSTM Using SURE Loss and Policy Iteration
Authors
Sriharsha Koundinya
Abhijit Karmakar
Publication date
30-05-2021
Publisher
Springer US
Published in
Neural Processing Letters / Issue 5/2021
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-021-10535-5
