Published in: International Journal of Machine Learning and Cybernetics 4/2021

06-10-2020 | Original Article

A novel BNMF-DNN based speech reconstruction method for speech quality evaluation under complex environments

Authors: Weili Zhou, Zhen Zhu

Abstract

Speech quality evaluation (SQE) in complex noisy environments is important for audio processing systems and quality of service. Non-intrusive SQE has recently attracted growing attention because it is efficient and easy to use. However, non-intrusive methods are expected to underperform intrusive ones, since they have no prior knowledge of the clean speech. This paper proposes a novel quasi-clean speech reconstruction method for non-intrusive SQE. The method combines Bayesian non-negative matrix factorization (BNMF) with a deep neural network (DNN) to take advantage of both. BNMF is used to compute the basis spectro-temporal matrices of the target speech, and the resulting matrices are integrated into the DNN model as an individual layer. The DNN is then trained to learn the complex mapping between the target source and the mixture signal and to reconstruct the magnitude spectrogram of the quasi-clean speech. Finally, the reconstructed speech serves as the reference for the perceptual model, which estimates the mean opinion score (MOS) of the noisy test sample. Experimental results show that the proposed method outperforms the compared non-intrusive SQE algorithms under challenging conditions in terms of objective measures.
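The pipeline described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: plain NMF with KL-divergence multiplicative updates stands in for Bayesian NMF, the hidden-layer weights are random rather than trained, and the spectrograms are synthetic. All function names, layer sizes, and parameters (`nmf_kl`, `reconstruct`, `rank`, `hidden_weights`) are assumptions made for illustration. The only point it shows is how a learned speech basis can act as a fixed final layer that maps non-negative activations to a magnitude spectrogram.

```python
import numpy as np

def nmf_kl(V, rank, n_iter=100, seed=0, eps=1e-10):
    """Plain NMF with KL-divergence multiplicative updates.

    Stand-in for the paper's Bayesian NMF, which is not reproduced here.
    V is a non-negative (n_freq, n_frames) magnitude spectrogram.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps    # basis matrix
    H = rng.random((rank, n_frames)) + eps  # activation matrix
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

def reconstruct(noisy_mag, hidden_weights, W_speech):
    """Forward pass: hidden layers predict non-negative activations,
    then a fixed basis layer maps them to a magnitude spectrogram."""
    h = noisy_mag
    for A, b in hidden_weights:
        h = np.maximum(A @ h + b, 0.0)  # ReLU keeps activations non-negative
    return W_speech @ h                 # basis layer, not updated here

# Synthetic data only: 64 frequency bins, 200 frames.
rng = np.random.default_rng(1)
clean_mag = rng.random((64, 200))
noisy_mag = clean_mag + 0.3 * rng.random((64, 200))

# Learn a speech basis from the (synthetic) clean spectrogram.
W_speech, _ = nmf_kl(clean_mag, rank=16)

# Hypothetical untrained hidden layers: 64 -> 32 -> 16 (the NMF rank).
hidden_weights = [
    (rng.normal(scale=0.1, size=(32, 64)), np.zeros((32, 1))),
    (rng.normal(scale=0.1, size=(16, 32)), np.zeros((16, 1))),
]
quasi_clean = reconstruct(noisy_mag, hidden_weights, W_speech)
```

In a full system the hidden layers would be trained on pairs of noisy and clean spectrograms, and the reconstructed output would then be passed to the perceptual model as the reference signal. Because the basis and the activations are both non-negative, the output is a valid magnitude spectrogram by construction.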

Metadata
Title
A novel BNMF-DNN based speech reconstruction method for speech quality evaluation under complex environments
Authors
Weili Zhou
Zhen Zhu
Publication date
06-10-2020
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 4/2021
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-020-01214-3
