Published in: Neural Processing Letters 5/2021

30 May 2021

Online Speech Enhancement by Retraining of LSTM Using SURE Loss and Policy Iteration

Authors: Sriharsha Koundinya, Abhijit Karmakar

Abstract

Speech enhancement improves quality and intelligibility in applications such as speech recognition, hearing aids, and personal assistant devices. Because acoustic environments vary, online enhancement is essential in practical scenarios: the system must observe the environment and adapt the enhancement accordingly. Adaptive filters have previously been used for online enhancement, but a neural-network-based online approach has not been proposed before. In this paper, we employ an architecture based on Long Short-Term Memory (LSTM) networks to enhance single-channel speech online. The LSTM network is retrained online in a novel way by minimizing Stein’s unbiased risk estimate (SURE), which lets the network learn denoising without a clean sample or ground truth. To avoid retraining on every sample, we use policy iteration with a reward function based on ITU-T P.563, a widely used single-ended perceptual quality measure. Retraining the LSTM increases the PESQ of the enhanced speech by 0.53 on average. The proposed method also improves intelligibility, raising the STOI metric by 0.22.
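
To make the idea of training without a clean reference concrete, the following is a minimal sketch of a Monte-Carlo SURE loss in Python. It is not the authors' implementation: the abstract does not give their exact formulation, so the additive white Gaussian noise model, the known noise level `sigma`, the probe step `eps`, and the function name `mc_sure` are all assumptions made for illustration. For a denoiser f applied to a noisy frame y of length N, the quantity computed is ||f(y) - y||^2 / N - sigma^2 + (2 sigma^2 / N) div f(y), with the divergence estimated from one random probe.

```python
import numpy as np


def mc_sure(denoiser, y, sigma, eps=1e-3, rng=None):
    """Monte-Carlo SURE: estimates the denoiser's MSE from the noisy input alone.

    Assumes y = x + n with n ~ N(0, sigma^2 I); sigma must be known or estimated.
    `denoiser` is any callable mapping an array to an array of the same shape.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = y.size
    fy = denoiser(y)
    # Random probe vector for the divergence estimate.
    b = rng.standard_normal(y.shape)
    # Finite-difference estimate of div f(y) along the probe direction.
    div = np.sum(b * (denoiser(y + eps * b) - fy)) / eps
    # Data-fidelity term, noise-power correction, and divergence penalty.
    return np.sum((fy - y) ** 2) / n - sigma ** 2 + 2.0 * sigma ** 2 * div / n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.sin(np.linspace(0.0, 200.0, 100_000))   # stand-in "clean" signal
    sigma = 0.5
    y = x + sigma * rng.standard_normal(x.shape)   # noisy observation

    def shrink(v):
        # Toy denoiser (plain scaling), used only to exercise the estimator.
        return 0.8 * v

    print("SURE estimate:", mc_sure(shrink, y, sigma, rng=rng))
    print("True MSE     :", np.mean((shrink(y) - x) ** 2))
```

In an online setting, a scalar like this could serve as the retraining loss in place of a mean-squared error against clean speech, since it needs only the noisy frame and a noise-level estimate. The clean signal appears in the demo solely to check the estimate against the true error.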

Metadata
Title
Online Speech Enhancement by Retraining of LSTM Using SURE Loss and Policy Iteration
Authors
Sriharsha Koundinya
Abhijit Karmakar
Publication date
30 May 2021
Publisher
Springer US
Published in
Neural Processing Letters / Issue 5/2021
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-021-10535-5
