2016 | Original Paper | Book Chapter

Optimizing the Objective Measure of Speech Quality in Monaural Speech Separation

Authors: M. Dharmalingam, M. C. John Wiselin, R. Rajavel

Published in: Proceedings of 3rd International Conference on Advanced Computing, Networking and Informatics

Publisher: Springer India

Abstract

Monaural speech separation based on computational auditory scene analysis (CASA) is a challenging problem in signal processing. The ideal binary mask (IBM) proposed by DeLiang Wang and colleagues is considered the benchmark in CASA. However, the IBM introduces objectionable distortions known as musical noise, and the perceived speech quality is very poor at low signal-to-noise ratios (SNRs). The main cause of this degradation is binary masking itself: time-frequency units assigned a value of 0 are discarded during synthesis, along with whatever speech they contain. To address the musical noise problem of the IBM and improve speech quality, this work proposes a new soft mask as the goal of CASA. The performance of the proposed soft mask is evaluated using the perceptual evaluation of speech quality (PESQ) measure. The experiments use the IEEE speech corpus and noises from the NOISEX92 database. The experimental results indicate that the proposed soft mask outperforms the traditional IBM in monaural speech separation.
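The abstract does not define the proposed soft mask, so the sketch below does not reproduce it. As a hedged illustration of why binary masking discards speech, it contrasts the IBM (a time-frequency unit is kept when its local SNR exceeds a local criterion LC) with the ideal ratio mask (IRM), a standard soft mask from the CASA literature used here only as a stand-in. It assumes the per-unit speech and noise energies are known, as in the "ideal" oracle setting (for example, energies from a gammatone filterbank). The function names, LC = 0 dB and the exponent beta = 0.5 are illustrative assumptions.

```python
import math

# Hypothetical helpers for illustration only. Inputs are 2-D lists
# (time frames x frequency channels) of per-unit speech and noise energies,
# assumed known in advance (the oracle setting that defines IBM and IRM).


def local_snr_db(speech_energy, noise_energy, eps=1e-12):
    """Local SNR of one time-frequency (T-F) unit in dB; eps avoids log(0)."""
    return 10.0 * math.log10((speech_energy + eps) / (noise_energy + eps))


def ideal_binary_mask(speech, noise, lc_db=0.0):
    """IBM: 1 where the local SNR exceeds the local criterion LC, else 0."""
    return [[1.0 if local_snr_db(s, n) > lc_db else 0.0
             for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(speech, noise)]


def ideal_ratio_mask(speech, noise, beta=0.5):
    """IRM used as a stand-in soft mask: (S / (S + N)) ** beta, in [0, 1]."""
    return [[(s / (s + n)) ** beta if s + n > 0 else 0.0
             for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(speech, noise)]


def discarded_speech_units(mask, speech):
    """Count T-F units that contain speech energy but receive a gain of 0."""
    return sum(1
               for m_row, s_row in zip(mask, speech)
               for m, s in zip(m_row, s_row)
               if s > 0 and m == 0.0)


if __name__ == "__main__":
    # Toy 2 x 2 example: rows are time frames, columns are frequency channels.
    speech = [[4.0, 0.5], [1.0, 0.0]]
    noise = [[1.0, 1.0], [1.0, 2.0]]
    ibm = ideal_binary_mask(speech, noise)
    irm = ideal_ratio_mask(speech, noise)
    print("IBM:", ibm)
    print("IRM:", [[round(v, 3) for v in row] for row in irm])
    print("speech units discarded by IBM:", discarded_speech_units(ibm, speech))
    print("speech units discarded by IRM:", discarded_speech_units(irm, speech))
```

On this toy input the IBM zeroes two units that still carry speech energy, while the soft mask only attenuates them. That partial retention is the property a soft mask relies on to reduce the speech loss caused by hard 0/1 decisions. Lowering LC (for example `lc_db=-5.0`) keeps more units, but every decision remains binary.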

Metadata
Copyright year: 2016
DOI: https://doi.org/10.1007/978-81-322-2538-6_56