
2014 | OriginalPaper | Book Chapter

13. Monaural Speech Enhancement Based on Multi-threshold Masking

Authors: Masoud Geravanchizadeh, Reza Ahmadnia

Published in: Blind Source Separation

Publisher: Springer Berlin Heidelberg


Abstract

The ideal binary mask (IBM) has been proposed as a computational goal of computational auditory scene analysis (CASA) algorithms. In the binary mask, only those time–frequency (T-F) units whose local signal-to-noise ratio (SNR) exceeds a local criterion (LC) are assigned the value 1. However, there are two problems with employing the IBM in source separation applications. First, an LC that is optimal for one SNR may not be appropriate for other SNRs. Second, binary weighting may cause some parts or regions of the synthesized speech to be discarded at the output. If variable weights are used instead of the hard-limiting weights (i.e., 0 or 1) of the IBM, the above problems can be alleviated considerably. In this chapter, a novel auditory-based mask, called the ideal multi-threshold mask (IMM), is proposed for use in source separation applications. To show the potential of the new mask, a minimum mean-square error (MMSE)-based method is proposed to estimate the IMM within the framework of a monaural speech enhancement system. Various objective and subjective evaluation criteria show the superior performance of the new speech enhancement system compared to a recently introduced enhancement technique.
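The abstract states the IBM rule (a T-F unit is assigned 1 when its local SNR exceeds the local criterion LC) and motivates replacing the hard 0/1 weights with graded weights obtained from several thresholds. The Python sketch below illustrates both ideas on a toy T-F energy grid. It is a minimal illustration under stated assumptions, not the chapter's method: the exact IMM thresholds, weights, and MMSE estimator are not given in the abstract, so the function names, threshold values, and weight values here are hypothetical.

import numpy as np

def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    # IBM: a T-F unit is 1 when its local SNR exceeds the local criterion (LC), else 0.
    local_snr_db = 10.0 * np.log10(speech_power / (noise_power + 1e-12))
    return (local_snr_db > lc_db).astype(float)

def multi_threshold_mask(speech_power, noise_power,
                         thresholds_db=(-5.0, 0.0, 5.0),
                         weights=(0.0, 0.33, 0.67, 1.0)):
    # Hypothetical multi-threshold mask: several SNR thresholds map each T-F unit
    # to a graded weight between 0 and 1 instead of a single hard 0/1 decision.
    # Threshold and weight values are illustrative only.
    local_snr_db = 10.0 * np.log10(speech_power / (noise_power + 1e-12))
    idx = np.digitize(local_snr_db, thresholds_db)  # interval index per T-F unit
    return np.asarray(weights)[idx]

# Toy example: 4 frequency channels x 3 time frames of T-F energies.
rng = np.random.default_rng(0)
speech_power = rng.random((4, 3))
noise_power = rng.random((4, 3))
print(ideal_binary_mask(speech_power, noise_power, lc_db=0.0))
print(multi_threshold_mask(speech_power, noise_power))

With such a graded mask, T-F units near the decision boundary are attenuated rather than discarded, which is the behavior the abstract argues avoids losing parts of the synthesized speech.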


Metadata
Title
Monaural Speech Enhancement Based on Multi-threshold Masking
Authors
Masoud Geravanchizadeh
Reza Ahmadnia
Copyright Year
2014
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-642-55016-4_13
