
2022 | OriginalPaper | Chapter

Investigation of CNN-Based Acoustic Modeling for Continuous Hindi Speech Recognition

Authors: Tripti Choudhary, Atul Bansal, Vishal Goyal

Published in: IoT and Analytics for Sensor Networks

Publisher: Springer Singapore


Abstract

Recently, the Convolutional Neural Network (CNN) has gained popularity over hybrid Deep Neural Network (DNN) and Hidden Markov Model (HMM) based acoustic models. A CNN is well suited to processing speech signals, which makes it an appropriate choice for the Automatic Speech Recognition (ASR) system. Sparse connectivity, weight sharing, and pooling allow a CNN to handle slight positional shifts in the frequency domain, a property that helps manage speaker and environment variations. Although CNNs work well for speech recognition, they have not been thoroughly examined for Hindi speech recognition. Activation functions and optimization techniques play a vital role in achieving high accuracy with a CNN. In this work, we investigate the impact of various activation functions and optimization techniques on the Hindi ASR system. All experiments were performed on the Hindi speech dataset developed by TIFR, using the Kaldi and PyTorch-Kaldi toolkits. The experimental results show that the ELU activation function with the RMSprop optimizer gives the best Word Error Rate (WER) of 14.56%.
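To make the reported setup concrete, the sketch below shows a minimal CNN acoustic model in PyTorch combining the ELU activation with the RMSprop optimizer, the pairing the abstract reports as best-performing. This is an illustrative assumption, not the authors' exact architecture: the filterbank input shape, context window, layer sizes, and senone target count are hypothetical placeholders.

```python
# Minimal sketch of a CNN acoustic model with ELU + RMSprop.
# Assumptions (not from the paper): 40 mel filterbank channels,
# an 11-frame context window, and 2000 senone (HMM-state) targets.
import torch
import torch.nn as nn

class CNNAcousticModel(nn.Module):
    def __init__(self, n_filters=40, context=11, n_senones=2000):
        super().__init__()
        # Input shape: (batch, 1, context frames, filterbank channels)
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ELU(),
            nn.MaxPool2d(kernel_size=(1, 2)),   # pool along frequency only
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ELU(),
            nn.MaxPool2d(kernel_size=(1, 2)),
        )
        conv_out = 64 * context * (n_filters // 4)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(conv_out, 1024),
            nn.ELU(),
            nn.Linear(1024, n_senones),          # senone posteriors
        )

    def forward(self, x):
        return self.classifier(self.conv(x))

model = CNNAcousticModel()
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One hypothetical training step on random stand-in data.
feats = torch.randn(8, 1, 11, 40)               # batch of feature windows
targets = torch.randint(0, 2000, (8,))          # stand-in senone labels
loss = criterion(model(feats), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In the pipeline described in the abstract, PyTorch-Kaldi would supply the acoustic features and the HMM-state targets from Kaldi alignments, and decoding against the language model would produce the WER; only the ELU/RMSprop combination is taken from the reported results.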


Metadata
Title
Investigation of CNN-Based Acoustic Modeling for Continuous Hindi Speech Recognition
Authors
Tripti Choudhary
Atul Bansal
Vishal Goyal
Copyright Year
2022
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-16-2919-8_38