
2022 | OriginalPaper | Chapter

Investigation of CNN-Based Acoustic Modeling for Continuous Hindi Speech Recognition

Authors: Tripti Choudhary, Atul Bansal, Vishal Goyal

Published in: IoT and Analytics for Sensor Networks

Publisher: Springer Singapore


Abstract

Recently, the Convolutional Neural Network (CNN) has gained popularity over hybrid Deep Neural Network (DNN) and Hidden Markov Model (HMM) based acoustic models. A CNN is well suited to processing speech signals, which makes it an appropriate choice for the Automatic Speech Recognition (ASR) system. Sparse connectivity, weight sharing, and pooling allow a CNN to handle slight positional shifts in the frequency domain, a property that helps manage speaker and environment variations. Although CNNs work well for speech recognition, they have not been thoroughly examined for Hindi speech recognition. Activation functions and optimization techniques play a vital role in achieving high accuracy with a CNN. In this work, we investigate the impact of various activation functions and optimization techniques on the Hindi ASR system. All experiments were performed on the Hindi speech dataset developed by TIFR, using the Kaldi and PyTorch-Kaldi toolkits. The experimental results show that the ELU activation function with the RMSprop optimizer gives the best Word Error Rate (WER) of 14.56%.
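To make the reported setup concrete, the sketch below shows a minimal CNN acoustic model in PyTorch combining the ELU activation with the RMSprop optimizer, the pairing the abstract reports as best-performing. This is an illustrative assumption, not the authors' exact architecture: the filterbank input shape, context window, layer sizes, and senone target count are hypothetical placeholders.

```python
# Minimal sketch of a CNN acoustic model with ELU + RMSprop.
# Assumptions (not from the paper): 40 mel filterbank channels,
# an 11-frame context window, and 2000 senone (HMM-state) targets.
import torch
import torch.nn as nn

class CNNAcousticModel(nn.Module):
    def __init__(self, n_filters=40, context=11, n_senones=2000):
        super().__init__()
        # Input shape: (batch, 1, context frames, filterbank channels)
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ELU(),
            nn.MaxPool2d(kernel_size=(1, 2)),   # pool along frequency only
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ELU(),
            nn.MaxPool2d(kernel_size=(1, 2)),
        )
        conv_out = 64 * context * (n_filters // 4)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(conv_out, 1024),
            nn.ELU(),
            nn.Linear(1024, n_senones),          # senone posteriors
        )

    def forward(self, x):
        return self.classifier(self.conv(x))

model = CNNAcousticModel()
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One hypothetical training step on random stand-in data.
feats = torch.randn(8, 1, 11, 40)               # batch of feature windows
targets = torch.randint(0, 2000, (8,))          # stand-in senone labels
loss = criterion(model(feats), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In the pipeline described in the abstract, PyTorch-Kaldi would supply the acoustic features and the HMM-state targets from Kaldi alignments, and decoding against the language model would produce the WER; only the ELU/RMSprop combination is taken from the reported results.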


Metadata
Title
Investigation of CNN-Based Acoustic Modeling for Continuous Hindi Speech Recognition
Authors
Tripti Choudhary
Atul Bansal
Vishal Goyal
Copyright Year
2022
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-16-2919-8_38