
2022 | OriginalPaper | Book chapter

Deep Learning Approaches for Speech Analysis: A Critical Insight

Written by: Alisha Goyal, Advikaa Kapil, Sparsh Sharma, Garima Jaiswal, Arun Sharma

Published in: Artificial Intelligence and Speech Technology

Publisher: Springer International Publishing

Abstract

The main objective of speaker recognition is to identify an authenticated and authorized individual by extracting features from their voice. Most published speaker recognition techniques are text-dependent. Text-independent speaker recognition, by contrast, appears more advantageous because the user can interact freely with the system. Several scholars have suggested strategies for identifying speakers, but the resulting systems were complex and inaccurate. This research proposes a text-independent speaker identification algorithm based on the Whale Optimization Algorithm (WOA) and a bidirectional long short-term memory (Bi-LSTM) network. Sample signals are taken from an available dataset in the presence of various degradations and voice effects. Mel-frequency cepstral coefficient (MFCC) features are then extracted from these signals, and WOA selects only the most important of them to build a single feature set. The Bi-LSTM network is trained and tested on this feature set. The proposed model's performance is assessed in the MATLAB simulation software and compared with that of the standard model. Several performance metrics, namely accuracy, sensitivity, specificity, precision, recall, and F-score, are computed from the simulated outputs. The findings show that the proposed model recognizes speaker voices more efficiently and precisely.
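The evaluation metrics named in the abstract can all be derived from the four counts of a confusion matrix. The following is a minimal sketch in Python, not the authors' MATLAB code; the function name `binary_metrics` and the toy labels are hypothetical, and the example treats one speaker as the positive class in a one-vs-rest fashion. Under the standard definitions, sensitivity and recall are the same quantity, so the sketch reports them under both names.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Compute standard metrics for one class treated as 'positive'.

    Hypothetical helper for illustration; not taken from the chapter.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # identical to recall
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "recall": sensitivity,
        "f_score": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Toy labels (assumed data): 1 = target speaker, 0 = any other speaker.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

For a system with many speakers, the same function could be called once per speaker and the results averaged; whether the chapter averages this way is not stated in the abstract.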

Metadata
Title
Deep Learning Approaches for Speech Analysis: A Critical Insight
Written by
Alisha Goyal
Advikaa Kapil
Sparsh Sharma
Garima Jaiswal
Arun Sharma
Copyright year
2022
DOI
https://doi.org/10.1007/978-3-030-95711-7_7