Performance enhancement of text-independent speaker recognition in noisy and reverberation conditions using Radon transform with deep learning

Authors: Samia Abd El-Moneim, Eman Abd El-Mordy, M. A. Nassar, Moawad I. Dessouky, Nabil A. Ismail, Adel S. El-Fishawy, Sami El-Dolil, Ibrahim M. El-Dokany, Fathi E. Abd El-Samie

Published in: International Journal of Speech Technology | Issue 3/2022

Published: 21 June 2022

Abstract

Automatic Speaker Recognition (ASR) in mismatched conditions is a challenging task because it requires robust feature extraction and classification techniques. The Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) is an efficient network that can learn to recognize speakers text-independently when the recording conditions are similar. When the recording conditions differ, however, its performance degrades. In this paper, features are extracted by applying the Radon projection to the spectrograms of speech signals and then computing the 2-D Discrete Cosine Transform (DCT) of the result, since the Radon Transform (RT) is less sensitive to noise and reverberation. This technique improves text-independent recognition accuracy and reduces sensitivity to noise and reverberation effects. The performance of the ASR system with the proposed features is compared with that of systems based on Mel Frequency Cepstral Coefficients (MFCCs) and spectrum features. For noisy utterances at 25 dB, the recognition rate reaches 80% with the proposed features, compared with 27% and 28% for MFCCs and spectrum features, respectively. For reverberant speech, the recognition rate reaches 80.67% with the proposed features, compared with 54% and 62.67% for MFCCs and spectrum features, respectively.
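
The feature pipeline named in the abstract (spectrogram, Radon projection, 2-D DCT) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the frame length, hop size, number of projection angles, nearest-neighbour line sampling, number of retained DCT coefficients, and all function names (`spectrogram`, `radon_projection`, `dct2`, `radon_dct_features`) are assumptions chosen for illustration, and the LSTM-RNN classifier is not shown.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram (frequency x time) from Hann-windowed frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def radon_projection(image, angles_deg):
    """Approximate Radon transform: line sums of `image` at each angle.

    Uses nearest-neighbour sampling on a grid centred on the image;
    samples that fall outside the image contribute zero.
    """
    rows, cols = image.shape
    size = int(np.ceil(np.hypot(rows, cols)))       # covers the image diagonal
    offsets = np.arange(size) - (size - 1) / 2.0
    # s: signed distance of a line from the centre, t: position along the line
    s, t = np.meshgrid(offsets, offsets, indexing="ij")
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    sinogram = np.zeros((size, len(angles_deg)))
    for k, angle in enumerate(np.deg2rad(angles_deg)):
        x = cx + s * np.cos(angle) - t * np.sin(angle)
        y = cy + s * np.sin(angle) + t * np.cos(angle)
        xi = np.rint(x).astype(int)
        yi = np.rint(y).astype(int)
        inside = (xi >= 0) & (xi < cols) & (yi >= 0) & (yi < rows)
        values = np.zeros_like(x)
        values[inside] = image[yi[inside], xi[inside]]
        sinogram[:, k] = values.sum(axis=1)          # integrate along each line
    return sinogram


def dct2(matrix):
    """Orthonormal 2-D DCT-II computed with explicit basis matrices."""
    def basis(n):
        k = np.arange(n)[:, None]
        i = np.arange(n)[None, :]
        b = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
        b[0, :] /= np.sqrt(2.0)
        return b
    return basis(matrix.shape[0]) @ matrix @ basis(matrix.shape[1]).T


def radon_dct_features(signal, n_angles=18, keep=8):
    """Spectrogram -> Radon projection -> 2-D DCT -> low-order coefficients."""
    angles = np.linspace(0.0, 180.0, n_angles, endpoint=False)
    coeffs = dct2(radon_projection(spectrogram(signal), angles))
    return coeffs[:keep, :keep].ravel()              # assumed truncation


# Usage on a synthetic signal (a stand-in for a real utterance).
rng = np.random.default_rng(0)
fs = 8000                                            # assumed sampling rate in Hz
time_axis = np.arange(4000) / fs
utterance = np.sin(2 * np.pi * 1000 * time_axis) + 0.05 * rng.standard_normal(time_axis.size)
features = radon_dct_features(utterance)
print(features.shape)  # (64,)
```

The sketch depends only on NumPy so that each step stays visible. In practice, library routines such as `skimage.transform.radon` and `scipy.fft.dctn` would be the more usual choice for the last two stages.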

Title: Performance enhancement of text-independent speaker recognition in noisy and reverberation conditions using Radon transform with deep learning
Authors: Samia Abd El-Moneim, Eman Abd El-Mordy, M. A. Nassar, Moawad I. Dessouky, Nabil A. Ismail, Adel S. El-Fishawy, Sami El-Dolil, Ibrahim M. El-Dokany, Fathi E. Abd El-Samie
Publication date: 21 June 2022
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 3/2022
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-021-09880-6
