
2022 | Original Paper | Book Chapter

Survey on Automatic Speech Recognition Systems for Indic Languages

Authors: Nandini Sethi, Amita Dev

Published in: Artificial Intelligence and Speech Technology

Publisher: Springer International Publishing


Abstract

Over the past few decades, Automatic Speech Recognition (ASR) has attracted wide interest among researchers. The field has a long history of improvements and experiments, from identifying digits spoken by a single speaker to authenticating the speaker. Human speech recognition has been a fascinating problem for speech and natural language processing researchers. Speech is the most vital and indispensable way of transferring information between human beings. Numerous research works have been carried out in speech processing and recognition in the last few decades. Accordingly, this survey reviews various speech recognition approaches and techniques suitable for identifying text from speech. The main motivation of this review is to examine the prevailing speech recognition approaches and techniques so that researchers in this field can incorporate all the essential parameters into their speech recognition systems, which helps in overcoming the limitations of existing systems. The review also discusses the challenges involved in the speech recognition process and possible future directions for researchers in this field. Typical speech recognition experiments were considered to determine which metrics should be included in a system and which can be disregarded.


Metadata
Title
Survey on Automatic Speech Recognition Systems for Indic Languages
Authors
Nandini Sethi
Amita Dev
Copyright year
2022
DOI
https://doi.org/10.1007/978-3-030-95711-7_8
