International Journal of Speech Technology

Published: 14 August 2015

Ideal binary masking for reducing convolutive noise

Authors: Nasir Saleem, Ehtasham Mustafa, Aamir Nawaz, Adnan Khan

Published in: International Journal of Speech Technology | Issue 4/2015

Abstract

It is important to know the degree to which convolutive noise disrupts the perceptual aspects of speech and its intelligibility. This paper presents an ideal binary masking criterion for reducing convolutive noise (reverberation) and improving the quality and intelligibility of speech. The noise is suppressed using ideal binary time–frequency (T–F) masking based on the signal-to-reverberation ratio (SRR) of individual T–F channels. All T–F channels with an SRR greater than a pre-selected threshold are retained, while the others are eliminated. The performance of the algorithm is evaluated using IEEE sentences corrupted with reverberation times (RT₆₀) ranging from 0.3 to 2.0 s. The results indicate that the intelligibility and perceptual aspects of speech decrease as the reverberation time increases. Additional analyses indicate that ideal binary masking reduces the temporal envelope spreading introduced by reverberation. The algorithm is evaluated with perceptual evaluation of speech quality (PESQ), SNR_LOSS, the log-likelihood ratio and the frequency-weighted segmental signal-to-noise ratio.
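The masking rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0 dB default threshold, the small `eps` guard and the toy power values are all assumptions made for illustration, and the abstract does not state which threshold the paper uses. The sketch assumes the per-channel power of the direct (target) speech and of the reverberant component is already known for every T–F unit, which is what makes the mask "ideal".

```python
import math


def srr_db(direct_power, reverb_power, eps=1e-12):
    """Signal-to-reverberation ratio of one T-F unit, in dB.

    eps guards against log(0) and division by zero (an assumption of this sketch).
    """
    return 10.0 * math.log10((direct_power + eps) / (reverb_power + eps))


def ideal_binary_mask(direct_power, reverb_power, threshold_db=0.0):
    """Return a 0/1 mask: 1 where SRR > threshold_db, otherwise 0.

    Both inputs are equally shaped lists of frames, each frame a list of
    per-channel power values. threshold_db = 0.0 is a hypothetical default.
    """
    return [
        [1 if srr_db(d, r) > threshold_db else 0 for d, r in zip(d_row, r_row)]
        for d_row, r_row in zip(direct_power, reverb_power)
    ]


def apply_mask(mixture, mask):
    """Keep the reverberant mixture's T-F units where mask is 1, zero the rest."""
    return [
        [x * m for x, m in zip(x_row, m_row)]
        for x_row, m_row in zip(mixture, mask)
    ]


# Toy data: 2 time frames x 3 frequency channels (made-up values).
direct = [[4.0, 1.0, 0.1], [0.5, 2.0, 0.0]]   # power of the direct speech
reverb = [[1.0, 1.0, 1.0], [1.0, 0.5, 0.3]]   # power of the reverberant part
mixture = [[2.2, 1.4, 1.0], [1.2, 1.6, 0.5]]  # magnitudes of reverberant speech

mask = ideal_binary_mask(direct, reverb, threshold_db=0.0)
masked = apply_mask(mixture, mask)
print(mask)    # [[1, 0, 0], [0, 1, 0]]
print(masked)  # [[2.2, 0.0, 0.0], [0.0, 1.6, 0.0]]
```

In a full system the T–F units would come from a filterbank or short-time Fourier analysis of the signals, and the masked units would be resynthesised into a waveform; both steps are omitted here. Raising `threshold_db` discards more reverberation-dominated units at the cost of removing more speech energy.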

Title: Ideal binary masking for reducing convolutive noise
Authors: Nasir Saleem, Ehtasham Mustafa, Aamir Nawaz, Adnan Khan
Publication date: 14 August 2015
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 4/2015
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-015-9298-0
