
2014 | OriginalPaper | Book Chapter

13. Monaural Speech Enhancement Based on Multi-threshold Masking

Authors: Masoud Geravanchizadeh, Reza Ahmadnia

Published in: Blind Source Separation

Publisher: Springer Berlin Heidelberg


Abstract

The ideal binary mask (IBM) has been proposed as a computational goal of computational auditory scene analysis (CASA) algorithms. In the binary mask, only those time–frequency (T-F) units whose local signal-to-noise ratio (SNR) exceeds a local criterion (LC) are assigned the value 1. However, there are two problems with employing the IBM in source separation applications. First, an LC that is optimal for one SNR may not be appropriate for other SNRs. Second, binary weighting may cause some parts or regions of the synthesized speech to be discarded at the output. If variable weights are used instead of the hard-limiting weights (i.e., 0 or 1) of the IBM, the above problems can be alleviated considerably. In this chapter, a novel auditory-based mask, called the ideal multi-threshold mask (IMM), is proposed for use in source separation applications. To show the potential of the new mask, a minimum mean-square error (MMSE)-based method is proposed to estimate the IMM within the framework of a monaural speech enhancement system. Various objective and subjective evaluation criteria show the superior performance of the new speech enhancement system compared to a recently introduced enhancement technique.
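The abstract states the IBM rule (a T-F unit is assigned 1 when its local SNR exceeds the local criterion LC) and motivates replacing the hard 0/1 weights with graded weights obtained from several thresholds. The Python sketch below illustrates both ideas on a toy T-F energy grid. It is a minimal illustration under stated assumptions, not the chapter's method: the exact IMM thresholds, weights, and MMSE estimator are not given in the abstract, so the function names, threshold values, and weight values here are hypothetical.

import numpy as np

def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    # IBM: a T-F unit is 1 when its local SNR exceeds the local criterion (LC), else 0.
    local_snr_db = 10.0 * np.log10(speech_power / (noise_power + 1e-12))
    return (local_snr_db > lc_db).astype(float)

def multi_threshold_mask(speech_power, noise_power,
                         thresholds_db=(-5.0, 0.0, 5.0),
                         weights=(0.0, 0.33, 0.67, 1.0)):
    # Hypothetical multi-threshold mask: several SNR thresholds map each T-F unit
    # to a graded weight between 0 and 1 instead of a single hard 0/1 decision.
    # Threshold and weight values are illustrative only.
    local_snr_db = 10.0 * np.log10(speech_power / (noise_power + 1e-12))
    idx = np.digitize(local_snr_db, thresholds_db)  # interval index per T-F unit
    return np.asarray(weights)[idx]

# Toy example: 4 frequency channels x 3 time frames of T-F energies.
rng = np.random.default_rng(0)
speech_power = rng.random((4, 3))
noise_power = rng.random((4, 3))
print(ideal_binary_mask(speech_power, noise_power, lc_db=0.0))
print(multi_threshold_mask(speech_power, noise_power))

With such a graded mask, T-F units near the decision boundary are attenuated rather than discarded, which is the behavior the abstract argues avoids losing parts of the synthesized speech.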


Metadata
Title
Monaural Speech Enhancement Based on Multi-threshold Masking
Authors
Masoud Geravanchizadeh
Reza Ahmadnia
Copyright Year
2014
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-642-55016-4_13
