Text-independent speaker recognition using LSTM-RNN and speech enhancement

El-Moneim, Samia Abd; Nassar, M. A.; Dessouky, Moawad I.; Ismail, Nabil A.; El-Fishawy, Adel S.; Abd El-Samie, Fathi E.

doi:10.1007/s11042-019-08293-7

Published: 17 June 2020

Volume 79, pages 24013–24028, (2020)

Multimedia Tools and Applications

Samia Abd El-Moneim¹,
M. A. Nassar²,
Moawad I. Dessouky²,
Nabil A. Ismail³,
Adel S. El-Fishawy² &
Fathi E. Abd El-Samie²,⁴

Abstract

The speaker recognition revolution has led to the inclusion of speaker recognition modules in several commercial products. Most published speaker recognition algorithms are text-dependent. Text-independent speaker recognition is more advantageous, as the client can talk freely to the system. This paper considers text-independent speaker recognition in the presence of degradation effects such as noise and reverberation. Mel-Frequency Cepstral Coefficients (MFCCs), the spectrum and the log-spectrum are used as features extracted from the speech signals. These features are fed to a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN), which acts as the classifier that completes the speaker recognition task. The network learns to recognize speakers efficiently in a text-independent manner when the training and testing recording conditions are the same. The recognition rate reaches 95.33% with MFCCs and increases to 98.7% with the spectrum or log-spectrum. However, the system has difficulty recognizing speakers recorded in different environments. Hence, speech enhancement techniques such as spectral subtraction and wavelet denoising are used to improve the recognition performance to some extent. The proposed approach outperforms the algorithm of R. Togneri and D. Pullella (2011).
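The spectrum and log-spectrum features and the spectral subtraction step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the frame length, hop size, Hamming window, noise estimate (the average of the first few frames, assumed to be speech-free) and spectral floor are all assumptions chosen for illustration, and the function names are hypothetical. A naive DFT from the standard library is used only to keep the example self-contained; a practical system would use an FFT library.

```python
import cmath
import math
import random


def frame_signal(signal, frame_len, hop):
    """Split a list of samples into overlapping frames (trailing partial frame dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """One-sided magnitude spectrum of a Hamming-windowed frame (naive O(N^2) DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]


def log_spectrum(mag, eps=1e-10):
    """Natural log of the magnitude spectrum; eps avoids log(0)."""
    return [math.log(m + eps) for m in mag]


def spectral_subtraction(mag_frames, noise_frames=5, floor=0.01):
    """Subtract the mean magnitude of the first `noise_frames` frames from every frame.

    Assumption: the leading frames contain noise only. Each bin is kept at or
    above `floor` times its original magnitude so values never go negative.
    """
    n_bins = len(mag_frames[0])
    lead = mag_frames[:noise_frames]
    noise = [sum(f[k] for f in lead) / len(lead) for k in range(n_bins)]
    return [[max(f[k] - noise[k], floor * f[k]) for k in range(n_bins)]
            for f in mag_frames]


# Synthetic example (assumed values): 192 noise-only samples, then a 1 kHz tone in noise.
rng = random.Random(0)
sample_rate, frame_len, hop = 8000, 64, 32
signal = [0.05 * rng.uniform(-1, 1) for _ in range(192)]
signal += [math.sin(2 * math.pi * 1000 * t / sample_rate) + 0.05 * rng.uniform(-1, 1)
           for t in range(192, 704)]

frames = frame_signal(signal, frame_len, hop)
mags = [magnitude_spectrum(f) for f in frames]      # "spectrum" features
log_feats = [log_spectrum(m) for m in mags]         # "log-spectrum" features
enhanced = spectral_subtraction(mags)               # enhanced magnitudes
peak_bin = max(range(len(enhanced[-1])), key=lambda k: enhanced[-1][k])
```

In this sketch each row of `mags` or `log_feats` is one time step of the kind of sequence an LSTM-RNN classifier would consume, and `enhanced` shows how a noise estimate can be removed before feature extraction when recording conditions differ.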

References

Abd El-Samie FE (2011) Information security for automatic speaker identification. Springer briefs in electrical and computer engineering. Springer, New York
Baccouche M et al (2010) Action classification in soccer videos with long short-term memory recurrent neural networks. Springer-Verlag Berlin Heidelberg, pp 154–159
Bhattacharya G, Alam J, Stafylakis T, Kenny P (2016) Deep neural network based text-dependent speaker recognition: preliminary results. Odyssey 2016, pp 9–15
Campbell JP (1997) Speaker recognition: a tutorial. In: Proceedings of the IEEE, vol 85, no 9
Das A, Jena MR, Barik KK (2014) Mel-frequency cepstral coefficient (MFCC) a novel method for speaker recognition. Digital Technologies 1:1–3
Dennis J, Dat T, Li H (2011) Spectrogram image feature for sound event classification in mismatched conditions. IEEE Signal Processing Letters 18(2):130–133
Dominguez JG et al (2014) Automatic language identification using long short-term memory recurrent neural networks. Interspeech 2014:2155–2159
Evans NWD, Mason JS, Liu WM, Fauve B (2005) On the fundamental limitations of spectral subtraction: an assessment by automatic speech recognition. IEEE, European Signal Processing Conference, 2005
Gish H, Schmidt M (1994) Text-independent speaker identification. IEEE Signal Process Mag 11:18–32
Kaladharan N (2014) Speech enhancement by spectral subtraction method. International Journal of Computer Applications 96(13):45–48
Karam M et al (2014) Noise removal in speech processing using spectral subtraction. Journal of Signal and Information Processing 5:32–41
Kinnunen T, Li H (2010) An overview of text-independent speaker recognition: from features to supervectors. Speech Comm 52:12–40
Kumari VSR, Devarakonda DK (2013) A wavelet based denoising of speech signal. Int J Eng Trends Technol 5(2):107–115
Larsson J (2014) Optimizing text-independent speaker recognition using an LSTM neural network. Master Thesis in Robotics
Li KP, Wrench KH (1983) An approach to text-independent speaker recognition with short utterances. IEEE, pp 555–558
Mihov SG (2009) Denoising speech signals by wavelet transform. Annual Journal of Electronics
Nilufar S, Ray N, Islam Molla MK, Hirose K (2012) Spectrogram based features selection using multiple kernel learning for speech/music discrimination. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 501–504
Parada PP et al (2014) Reverberant speech recognition: a phoneme analysis. In: Proc. IEEE Global Conf. Signal Inf. Process., pp 567–571
Sant’Ana R et al (2006) Text-independent speaker recognition based on the Hurst parameter and the multidimensional fractional Brownian motion model. IEEE Trans Audio Speech Lang Process 14(3):931–940
Seo Y, Huh J (2019) Automatic emotion-based music classification for supporting intelligent IoT applications. Electronics 8:164
Sharma A, Singh SP, Kumar V (2005) Text-independent speaker identification using Back propagation MLP network classifier for a closed set of speaker. IEEE International Symposium on Signal Processing and Information Technology, 2005
Togneri R, Pullella D (2011) An overview of speaker identification: accuracy and robustness issues. IEEE Circuits and Systems Magazine 11:23–61
Yegnanarayana B, Murthy PS (2000) Enhancement of reverberant speech using LP residual signal. IEEE Trans Speech Audio Processing 8:267–281
Zazo R (2016) Language identification in short utterances using long short-term memory (LSTM) recurrent neural networks. PLoS One 11:e0146917

Author information

Authors and Affiliations

Department of Electrical Communications, Tanta High Institute of Engineering and Technology, Tanta, Egypt
Samia Abd El-Moneim
Department of Electronics and Electrical Communications Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf, 32952, Egypt
M. A. Nassar, Moawad I. Dessouky, Adel S. El-Fishawy & Fathi E. Abd El-Samie
Department of Computer Science and Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf, 32952, Egypt
Nabil A. Ismail
Department of Information Technology, College of Computer and Information sciences, Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia
Fathi E. Abd El-Samie

Corresponding author

Correspondence to Samia Abd El-Moneim.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

El-Moneim, S.A., Nassar, M.A., Dessouky, M.I. et al. Text-independent speaker recognition using LSTM-RNN and speech enhancement. Multimed Tools Appl 79, 24013–24028 (2020). https://doi.org/10.1007/s11042-019-08293-7

Received: 22 November 2018
Revised: 18 August 2019
Accepted: 30 September 2019
Published: 17 June 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s11042-019-08293-7
