
2024 | OriginalPaper | Book chapter

An Investigational Analysis of Automatic Speech Recognition on Deep Neural Networks and Gated Recurrent Unit Model

Authors: M. Soundarya, S. Anusuya

Published in: Advances in Data-Driven Computing and Intelligent Systems

Publisher: Springer Nature Singapore


Abstract

For thousands of years, communication has played a crucial role in human existence, development, and globalization. Speech recognition has many applications, including biometric analysis, education, security, health care, and smart cities. Researchers have long studied how machine learning can be applied to speech processing, particularly voice recognition, and in recent years attention has shifted to applying deep learning to problems involving human speech. In this chapter, we describe our work using deep neural networks, namely a convolutional recurrent neural network (CRNN) and a gated recurrent unit (GRU) model, to recognize audio samples. Audio samples from seven classes of the Free Sound Datasets were used: walk and footsteps, kids speaking, filling with water, bass drum, scissors, clock, and cough. The feature parameters used for recognition include Mel-spectral coefficients together with other spectral and intensity-related features. Adding white noise and retuning the voice were used for data augmentation. According to the results, the GRU model achieved an average recognition accuracy of 93.25% and a word error rate (WER) of 7.84%.
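The abstract reports accuracy and word error rate (WER) and names white-noise addition as an augmentation, but it gives neither the WER formula nor the noise settings. The following is a minimal standard-library Python sketch, assuming the conventional definitions: WER = (S + D + I) / N from a word-level Levenshtein alignment, and Gaussian white noise scaled to a chosen signal-to-noise ratio (SNR) in dB. The function names `word_error_rate` and `add_white_noise`, the SNR parameter, the seed, and the 16 kHz example tone are hypothetical illustrations, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Standard WER = (S + D + I) / N via word-level Levenshtein distance.

    Hypothetical helper (an assumption): the chapter does not say how it
    computed WER. N is the number of reference words; the minimum edit
    distance equals the total substitutions S, deletions D and
    insertions I of the best alignment.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (row i-1 of the DP table).
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[m] / n


def add_white_noise(signal: List[float], snr_db: float, seed: int = 0) -> List[float]:
    """Add Gaussian white noise scaled to a target SNR in dB.

    Hypothetical helper (an assumption): the chapter names white noise as an
    augmentation but gives no SNR or noise distribution.
    """
    if not signal:
        return []
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    rng = random.Random(seed)  # seeded so augmented copies are reproducible
    std = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, std) for x in signal]


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the cat sit on mat".split()
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(round(word_error_rate(ref, hyp), 4))  # 0.3333

    # One second of a 440 Hz tone at a hypothetical 16 kHz sample rate.
    tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noisy = add_white_noise(tone, snr_db=10.0, seed=1)
    print(len(noisy))  # 16000
```

Unlike accuracy, WER is not bounded above by 1: a hypothesis with many insertions can exceed 100%. Scaling the noise to a target SNR, rather than to a fixed amplitude, keeps the augmentation strength comparable across loud and quiet clips.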

Metadata
Title
An Investigational Analysis of Automatic Speech Recognition on Deep Neural Networks and Gated Recurrent Unit Model
Authors
M. Soundarya
S. Anusuya
Copyright year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-99-9521-9_4