Published in: International Journal of Speech Technology, Issue 3/2022

21 June 2022

Performance enhancement of text-independent speaker recognition in noisy and reverberation conditions using Radon transform with deep learning

Authors: Samia Abd El-Moneim, Eman Abd El-Mordy, M. A. Nassar, Moawad I. Dessouky, Nabil A. Ismail, Adel S. El-Fishawy, Sami El-Dolil, Ibrahim M. El-Dokany, Fathi E. Abd El-Samie



Abstract

Automatic Speaker Recognition (ASR) in mismatched conditions is a challenging task, since it requires robust feature extraction and classification techniques. The Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) can learn to recognize speakers text-independently when the training and test recordings are made under similar conditions; when the recording conditions differ, however, its performance degrades. In this paper, features are extracted by applying the Radon Transform (RT), which is less sensitive to noise and reverberation, to the spectrograms of speech signals and then computing the 2-D Discrete Cosine Transform (DCT) of the resulting Radon projections. This technique improves text-independent recognition accuracy while reducing sensitivity to noise and reverberation. The performance of the ASR system with the proposed features is compared with that of systems based on Mel Frequency Cepstral Coefficients (MFCCs) and spectrum features. For noisy utterances at a signal-to-noise ratio (SNR) of 25 dB, the recognition rate reaches 80% with the proposed features, compared with 27% with MFCCs and 28% with spectrum features. For reverberant speech, the recognition rate reaches 80.67% with the proposed features, compared with 54% with MFCCs and 62.67% with spectrum features.
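The feature pipeline described in the abstract (spectrogram, then Radon projection, then 2-D DCT) can be sketched in Python with NumPy as follows. This is not the authors' implementation: the frame length, hop size, projection angles, the simple pixel-binning approximation of the Radon transform, the truncation to a 20 × 20 block of low-order DCT coefficients, and all function names (`add_white_noise`, `spectrogram`, `radon`, `dct_matrix`, `dct2`, `radon_dct_features`) are illustrative assumptions. The abstract does not say how the coefficients are arranged into a sequence for the LSTM-RNN, so the sketch simply returns one flattened vector per utterance.

```python
# Hypothetical sketch: parameters and function names are assumptions,
# not the settings used in the paper.
import numpy as np


def add_white_noise(signal, snr_db, rng):
    """Add white Gaussian noise scaled so the result has the requested SNR in dB."""
    signal = np.asarray(signal, dtype=float)
    noise = rng.standard_normal(signal.shape)
    target_power = np.mean(signal ** 2) / (10.0 ** (snr_db / 10.0))
    noise *= np.sqrt(target_power / np.mean(noise ** 2))
    return signal + noise


def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude STFT spectrogram with shape (freq_bins, frames)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1))).T


def radon(image, angles_deg):
    """Pixel-driven discrete Radon transform: column j is the projection at angles_deg[j]."""
    rows, cols = image.shape
    y, x = np.mgrid[0:rows, 0:cols].astype(float)
    x -= (cols - 1) / 2.0  # centre the coordinate system on the image
    y -= (rows - 1) / 2.0
    n_bins = int(np.ceil(np.hypot(rows, cols)))
    sinogram = np.zeros((n_bins, len(angles_deg)))
    for j, theta in enumerate(np.deg2rad(angles_deg)):
        # Signed offset of every pixel along the projection direction.
        t = x * np.cos(theta) + y * np.sin(theta)
        bins = np.clip(np.floor(t + n_bins / 2.0).astype(int), 0, n_bins - 1)
        # Sum the intensities that share an offset bin (a discrete line integral).
        sinogram[:, j] = np.bincount(bins.ravel(), weights=image.ravel(),
                                     minlength=n_bins)
    return sinogram


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] *= np.sqrt(1.0 / n)
    basis[1:, :] *= np.sqrt(2.0 / n)
    return basis


def dct2(block):
    """Separable 2-D DCT-II: transform the columns, then the rows."""
    rows, cols = block.shape
    return dct_matrix(rows) @ block @ dct_matrix(cols).T


def radon_dct_features(signal, angles_deg=None, keep=(20, 20)):
    """Spectrogram -> Radon projections -> 2-D DCT -> low-order coefficients."""
    if angles_deg is None:
        angles_deg = np.arange(0, 180, 5)  # 36 projection angles (assumed)
    sinogram = radon(spectrogram(signal), angles_deg)
    coeffs = dct2(sinogram)
    return coeffs[:keep[0], :keep[1]].ravel()


# Usage: a 1 s synthetic tone at an assumed 8 kHz sampling rate, corrupted at 25 dB SNR.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)
noisy = add_white_noise(clean, snr_db=25, rng=rng)
features = radon_dct_features(noisy)
print(features.shape)  # (400,)
```

Each Radon projection integrates spectrogram values along lines, so it summarizes many time-frequency cells at once; that averaging is a plausible intuition for the reduced noise sensitivity reported above, but this sketch makes no claim about recognition accuracy.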


Metadata
Publisher
Springer US
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-021-09880-6
