
2022 | Original Paper | Book Chapter

Investigation of CNN-Based Acoustic Modeling for Continuous Hindi Speech Recognition

Authors: Tripti Choudhary, Atul Bansal, Vishal Goyal

Published in: IoT and Analytics for Sensor Networks

Publisher: Springer Singapore


Abstract

Recently, Convolutional Neural Networks (CNNs) have gained popularity over hybrid acoustic models based on Deep Neural Networks (DNNs) and Hidden Markov Models (HMMs). CNNs are well suited to speech signals, which makes them an appropriate choice for Automatic Speech Recognition (ASR) systems. Sparse connectivity, weight sharing, and pooling allow a CNN to tolerate slight positional shifts in the frequency domain, a property that helps manage speaker and environment variations. Although CNNs work well for speech recognition, they have not been thoroughly examined for Hindi speech recognition. Activation functions and optimization techniques play a vital role in enabling a CNN to achieve high accuracy. In this work, we investigate the impact of various activation functions and optimization techniques on a Hindi ASR system. All experiments were performed on the Hindi speech dataset developed by TIFR, using the Kaldi and PyTorch-Kaldi toolkits. The results show that the ELU activation function combined with RMSprop optimization yields the best Word Error Rate (WER), 14.56%.
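The two quantities at the heart of the abstract, the ELU activation and the Word Error Rate, have standard textbook definitions. The sketch below is not taken from the chapter itself; the function names and the toy word sequences are illustrative only, and the WER routine is the usual word-level Levenshtein-distance formulation.

```python
import math

def elu(x, alpha=1.0):
    """ELU activation: identity for positive inputs,
    smooth exponential saturation toward -alpha below zero."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def wer(reference, hypothesis):
    """Word Error Rate = (substitutions + deletions + insertions) / reference length,
    computed as a word-level Levenshtein distance via dynamic programming."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i          # delete all remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j          # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[-1][-1] / len(ref)

# One substitution and one deletion against a 4-word reference: WER = 2/4 = 0.5
print(wer("speech recognition for hindi", "speech recognitions for"))
```

The reported 14.56% WER corresponds to this ratio computed over the full TIFR test transcripts rather than a single utterance.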


References
1. Dahl, G. E., Yu, D., Deng, L., & Acero, A. (2011). Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 20(1), 30–42.
2. Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A. R., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T. N., & Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6), 82–97.
3. Mohamed, A. R., Dahl, G., & Hinton, G. (2009). Deep belief networks for phone recognition. In Proceedings of the NIPS Workshop on Deep Learning for Speech Recognition and Related Applications, 1(9), 39.
4. Passricha, V., & Aggarwal, R. K. (2019). A hybrid of deep CNN and bidirectional LSTM for automatic speech recognition. Journal of Intelligent Systems, 29(1), 1261–1274.
5. Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (pp. 249–256).
6. Rumelhart, D. E. (1986). Learning internal representations by error propagation. Parallel Distributed Processing, 1, 318–362.
7. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
8. Samudravijaya, K., Rao, P. V., & Agrawal, S. S. (2000). Hindi speech database. In Sixth International Conference on Spoken Language Processing.
9. Sahu, P., Dua, M., & Kumar, A. (2018). Challenges and issues in adopting speech recognition. In Speech and Language Processing for Human-Machine Communications (pp. 209–215). Singapore: Springer.
10. Kuamr, A., Dua, M., & Choudhary, A. (2014). Implementation and performance evaluation of continuous Hindi speech recognition. In 2014 International Conference on Electronics and Communication Systems (ICECS) (pp. 1–5). IEEE.
11. Kumar, A., & Aggarwal, R. K. (2020). A time delay neural network acoustic modeling for Hindi speech recognition. In Advances in Data and Information Sciences (pp. 425–432). Singapore: Springer.
12. Dua, M., Aggarwal, R. K., & Biswas, M. (2019). Discriminatively trained continuous Hindi speech recognition system using interpolated recurrent neural network language modeling. Neural Computing and Applications, 31(10), 6747–6755.
13. Dua, M., Aggarwal, R. K., & Biswas, M. (2019). GFCC based discriminatively trained noise robust continuous ASR system for Hindi language. Journal of Ambient Intelligence and Humanized Computing, 10(6), 2301–2314.
14. Kumar, A., & Aggarwal, R. K. (2020). A hybrid CNN-LiGRU acoustic modeling using raw waveform SincNet for Hindi ASR. Computer Science, 21(4).
15. Kumar, A., & Aggarwal, R. K. (2020). Hindi speech recognition using time delay neural network acoustic modeling with i-vector adaptation. International Journal of Speech Technology, 1–12.
16. Kumar, A., & Aggarwal, R. K. (2020). Discriminatively trained continuous Hindi speech recognition using integrated acoustic features and recurrent neural network language modeling. Journal of Intelligent Systems, 30(1), 165–179.
17. Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., & Silovsky, J. (2011). The Kaldi speech recognition toolkit. In IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society.
18. Ravanelli, M., Parcollet, T., & Bengio, Y. (2019). The PyTorch-Kaldi speech recognition toolkit. In ICASSP 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6465–6469). IEEE.
19. Stolcke, A. (2002). SRILM: An extensible language modeling toolkit. In Seventh International Conference on Spoken Language Processing.
Metadata
Title
Investigation of CNN-Based Acoustic Modeling for Continuous Hindi Speech Recognition
Authors
Tripti Choudhary
Atul Bansal
Vishal Goyal
Copyright year
2022
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-16-2919-8_38
