International Journal of Machine Learning and Cybernetics

06-10-2020 | Original Article

A novel BNMF-DNN based speech reconstruction method for speech quality evaluation under complex environments

Authors: Weili Zhou, Zhen Zhu

Published in: International Journal of Machine Learning and Cybernetics | Issue 4/2021

Abstract

Speech quality evaluation (SQE) under complex noisy environments is important for audio processing systems and quality of service. Recently, non-intrusive SQE has been attracting increasing attention because of its efficiency and ease of use. However, non-intrusive SQE methods are expected to underperform intrusive ones, since they have no prior knowledge of the clean speech. In this paper, a novel quasi-clean speech reconstruction method for non-intrusive SQE is proposed. The method combines Bayesian NMF (BNMF) with a deep neural network (DNN), taking advantage of both NMF and DNN. BNMF is used to calculate the basic spectro-temporal matrices of the target speech, and the obtained matrices are integrated into the DNN model as an individual layer. The DNN is then trained to learn the complex mapping between the target source and the mixture signal, and to reconstruct the magnitude spectrograms of the quasi-clean speech. Finally, the reconstructed speech serves as the reference for the perceptual model, which estimates the mean opinion score (MOS) of the tested noisy sample. The experimental results show that the proposed method outperforms the comparative non-intrusive SQE algorithms under challenging conditions in terms of objective measurement.
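
The idea of learning spectral matrices from speech and reusing them as a fixed layer of a network can be illustrated with a rough sketch. This is not the authors' implementation: it assumes plain multiplicative-update NMF (KL divergence) in place of the Bayesian variant, a single untrained hidden layer with random weights, and synthetic random arrays standing in for magnitude spectrograms. All function and variable names (`nmf_kl`, `reconstruct`, `W_speech`, and so on) are hypothetical.

```python
import numpy as np

def nmf_kl(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Plain NMF with multiplicative KL updates, V ~ W @ H.
    Assumption: stands in for the Bayesian NMF named in the abstract."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps    # spectral basis vectors
    H = rng.random((rank, n_frames)) + eps  # temporal activations
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

def kl_divergence(V, WH, eps=1e-10):
    """Generalized KL divergence between V and its approximation WH."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))

def reconstruct(mixture, weights, bias, W_speech):
    """Hypothetical forward pass: a ReLU layer predicts non-negative
    activations from the mixture, then a fixed layer holding the NMF
    basis maps them back to a magnitude spectrogram."""
    activations = np.maximum(0.0, weights @ mixture + bias[:, None])
    return W_speech @ activations

rng = np.random.default_rng(1)
clean = rng.random((64, 40))                 # synthetic stand-in for clean speech
noisy = clean + 0.3 * rng.random((64, 40))   # synthetic stand-in for the mixture

W_speech, H_speech = nmf_kl(clean, rank=8)
weights = 0.1 * rng.standard_normal((8, 64))  # untrained, random
bias = np.zeros(8)
quasi_clean = reconstruct(noisy, weights, bias, W_speech)
```

In a real system the weights would be trained so that `quasi_clean` approximates the clean magnitude spectrogram, and the reconstructed signal would then be passed to a perceptual model as the reference. Keeping the basis as a fixed non-negative layer guarantees that the output is a non-negative combination of speech-like spectra.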

Title: A novel BNMF-DNN based speech reconstruction method for speech quality evaluation under complex environments
Authors: Weili Zhou, Zhen Zhu
Publication date: 06-10-2020
Publisher: Springer Berlin Heidelberg
Published in: International Journal of Machine Learning and Cybernetics / Issue 4/2021
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI: https://doi.org/10.1007/s13042-020-01214-3