
2016 | OriginalPaper | Chapter

Optimizing the Objective Measure of Speech Quality in Monaural Speech Separation

Authors: M. Dharmalingam, M. C. John Wiselin, R. Rajavel

Published in: Proceedings of 3rd International Conference on Advanced Computing, Networking and Informatics

Publisher: Springer India


Abstract

Monaural speech separation based on computational auditory scene analysis (CASA) is a challenging problem in signal processing. The ideal binary mask (IBM) proposed by DeLiang Wang and colleagues is considered the benchmark in CASA. However, the IBM introduces an objectionable distortion known as musical noise, and the perceived quality of IBM-processed speech is very poor under low-SNR conditions. The main cause of this degradation is binary masking itself, in which parts of the speech signal are discarded during synthesis. To address the musical noise problem of the IBM and improve speech quality, this work proposes a new soft mask as the computational goal of CASA. The performance of the proposed soft mask is evaluated using the perceptual evaluation of speech quality (PESQ) measure. The experiments use the IEEE speech corpus with noises from the NOISEX-92 database. The experimental results indicate that the proposed soft mask outperforms the traditional IBM in the context of monaural speech separation.
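The contrast between binary and soft masking can be made concrete with a minimal Python sketch. It assumes each time-frequency (T-F) unit is described by its premixed speech and noise energies, both strictly positive, and that the IBM keeps a unit when its local SNR exceeds a local criterion (0 dB is a common choice). For the soft mask it swaps in the ideal ratio mask (IRM) with exponent `beta = 0.5` purely for illustration; this is not the soft mask proposed in this chapter, whose definition the abstract does not give. The function names, the parameter `lc_db`, and the sample energies are all hypothetical.

```python
import math


def local_snr_db(speech_energy: float, noise_energy: float) -> float:
    """Local SNR of one time-frequency (T-F) unit in dB; both energies must be > 0."""
    return 10.0 * math.log10(speech_energy / noise_energy)


def ideal_binary_mask(speech: list[float], noise: list[float], lc_db: float = 0.0) -> list[float]:
    """IBM: 1 (keep) if the unit's local SNR exceeds the local criterion lc_db, else 0 (discard)."""
    return [1.0 if local_snr_db(s, n) > lc_db else 0.0 for s, n in zip(speech, noise)]


def ideal_ratio_mask(speech: list[float], noise: list[float], beta: float = 0.5) -> list[float]:
    """Illustrative soft mask (assumption: the ideal ratio mask, NOT the chapter's proposed mask).

    Each unit is weighted by (S / (S + N)) ** beta, so low-SNR units are attenuated, not removed.
    """
    return [(s / (s + n)) ** beta for s, n in zip(speech, noise)]


if __name__ == "__main__":
    # Hypothetical energies for four T-F units in one frequency channel.
    speech = [4.0, 2.0, 0.5, 0.1]
    noise = [1.0, 1.0, 1.0, 1.0]

    print(ideal_binary_mask(speech, noise))                        # [1.0, 1.0, 0.0, 0.0]
    print([round(m, 3) for m in ideal_ratio_mask(speech, noise)])  # [0.894, 0.816, 0.577, 0.302]
```

With a 0 dB local criterion, the binary mask zeroes the two low-SNR units outright, which is the discarding of speech content that the abstract identifies as the main cause of quality loss. The ratio mask instead keeps every unit and only attenuates the noisier ones. Raising `beta` lowers every weight below 1 and so makes the soft mask more aggressive.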


DOI: https://doi.org/10.1007/978-81-322-2538-6_56
