Published in: International Journal of Speech Technology 2/2022

25.05.2022

Soft-computation based speech recognition system for Sylheti language

By: Gautam Chakraborty, Mridusmita Sharma, Navajit Saikia, Kandarpa Kumar Sarma


Abstract

The growing use of human-machine interfaces in diverse areas has driven the evolution of Automatic Speech Recognition (ASR) systems over the last two decades. Lately, machine learning techniques have increasingly been applied to under-resourced human languages, primarily to design voice-activated digital tools for the sizable portion of speakers who are not computer literate. The vast majority of works in this field have employed shallow models, such as the conventional Artificial Neural Network and the Hidden Markov Model, in combination with Mel Frequency Cepstral Coefficients and other relevant features for speech recognition applications. Although these shallow models are effective, recent research has turned to deep learning models for ASR, especially for under-resourced languages, both to minimize human intervention in the approach and to improve system performance. Sylheti, a member of the Indo-Aryan language group, is an under-resourced language with more than 10 million speakers living across the world, mostly in India and Bangladesh. Addressing the need for an ASR model for Sylheti, this work aims to design a robust ASR system for the language by employing a state-of-the-art deep learning technique, the Convolutional Neural Network (CNN). To identify the most suitable ASR model for Sylheti, several ASR approaches are formulated and trained on Sylheti isolated and connected words. The specially configured CNN-based ASR model is trained with both clean and noisy speech data, which is necessary for training and for making the system robust. A comparative analysis is then presented by configuring the ASR model with shallow models such as the Feed-forward Neural Network, Recurrent Neural Network, Hidden Markov Model, and Time Delay Neural Network.
Experimental results indicate that the proposed CNN-based ASR system works well for Sylheti, and the recognition accuracy obtained is satisfactory, despite the system exhibiting some training latency.
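The pipeline the abstract describes, spectral features (e.g. MFCCs) fed to a convolutional acoustic model, can be illustrated with a minimal NumPy sketch of one convolution-ReLU-pooling stage. The shapes, kernel, and synthetic "MFCC" input below are purely illustrative assumptions, not the authors' actual configuration:

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive 2D valid cross-correlation of feature map x with kernel k."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling, truncating edges that don't fit."""
    H2, W2 = x.shape[0] // size, x.shape[1] // size
    return x[:H2 * size, :W2 * size].reshape(H2, size, W2, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
mfcc = rng.standard_normal((13, 40))   # synthetic stand-in: 13 MFCCs x 40 frames
kernel = rng.standard_normal((3, 3))   # one learned filter (random here)

# conv -> ReLU -> max-pool: the basic feature-extraction stage of a CNN acoustic model
feat = max_pool(np.maximum(conv2d_valid(mfcc, kernel), 0.0))
print(feat.shape)  # (5, 19)
```

A full word classifier would stack several such stages and end with a dense softmax layer over the vocabulary; training on both clean and noise-corrupted utterances, as the paper does, is a common way to make such a model robust.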


Metadata
Title
Soft-computation based speech recognition system for Sylheti language
Authors
Gautam Chakraborty
Mridusmita Sharma
Navajit Saikia
Kandarpa Kumar Sarma
Publication date
25.05.2022
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2022
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-022-09976-7
