Published in: International Journal of Speech Technology 1/2019

12-01-2019

Development and analysis of multilingual phone recognition systems using Indian languages

Authors: K. E. Manjunath, Dinesh Babu Jayagopi, K. Sreenivasa Rao, V. Ramasubramanian



Abstract

In this paper, we describe the development of a Multilingual Phone Recognition System (Multi-PRS) using four Indian languages: Kannada, Telugu, Bengali, and Odia. A Multi-PRS is a universal Phone Recognition System (PRS) that performs phone recognition independently of the language being spoken. Transcriptions based on the International Phonetic Alphabet (IPA) are used to group acoustically similar phonetic units across languages. Multilingual phone recognisers are studied for two broad groups of Indian languages, namely Dravidian (Kannada and Telugu) and Indo-Aryan (Bengali and Odia), and each group is used separately to develop a Bilingual PRS. We explore both HMMs and DNNs for developing PRSs under context-dependent and context-independent setups, and the DNNs outperform the HMMs. The performance of the Multi-PRSs is analysed and compared with that of monolingual PRSs, and the advantages of Multi-PRSs over monolingual PRSs are discussed. Further, we develop tandem Multi-PRSs that use phone posteriors as tandem features to improve on the baseline Multi-PRSs. The tandem Multi-PRSs outperform the baseline Multi-PRSs in all cases.
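
The IPA-based grouping can be pictured as a many-to-one mapping: each language's phone labels are mapped to IPA symbols, and labels that receive the same symbol are merged into one unit of a common phone set. The minimal Python sketch below assumes such a label-to-IPA table is already available. The table `PHONE_TO_IPA`, its tiny inventories and label spellings, and the helper names `build_common_phone_set` and `to_common_transcription` are all hypothetical and do not reproduce the paper's actual phone sets.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Toy, hypothetical label inventories: each language's own phone labels mapped
# to IPA symbols. The label spellings and the tiny inventories are made up for
# illustration and are NOT the phone sets used in the paper.
PHONE_TO_IPA: Dict[str, Dict[str, str]] = {
    "kannada": {"k": "k", "kh": "kʰ", "T": "ʈ"},
    "telugu": {"k": "k", "kh": "kʰ", "T": "ʈ"},
    "bengali": {"k": "k", "K": "kʰ", "tt": "ʈ"},
    "odia": {"k": "k", "K": "kʰ", "tt": "ʈ"},
}


def build_common_phone_set(
    phone_to_ipa: Dict[str, Dict[str, str]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Merge language-specific labels that share an IPA symbol.

    Returns a dict from each IPA symbol (one unit of the common phone set)
    to the sorted (language, label) pairs that were merged into it.
    """
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for language, mapping in phone_to_ipa.items():
        for label, ipa in mapping.items():
            groups[ipa].append((language, label))
    return {ipa: sorted(members) for ipa, members in sorted(groups.items())}


def to_common_transcription(
    language: str, labels: List[str], phone_to_ipa: Dict[str, Dict[str, str]]
) -> List[str]:
    """Rewrite a language-specific phone sequence using common IPA units."""
    mapping = phone_to_ipa[language]
    return [mapping[label] for label in labels]


if __name__ == "__main__":
    common = build_common_phone_set(PHONE_TO_IPA)
    n_labels = sum(len(m) for m in PHONE_TO_IPA.values())
    print(f"{n_labels} language-specific labels -> {len(common)} common units")
    print(to_common_transcription("bengali", ["K", "tt"], PHONE_TO_IPA))
```

With these toy tables, 12 language-specific labels collapse onto 3 common units, so every shared unit is trained on data pooled from all four languages rather than from one.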

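The abstract does not spell out how the phone posteriors are turned into tandem features. A common tandem recipe takes per-frame phone posteriors from a neural network, applies a log, typically decorrelates them (for example with PCA), and appends them to conventional acoustic features for the downstream recogniser. The minimal Python sketch below assumes that recipe: it floors and logs each frame's posteriors and concatenates them with that frame's acoustic features. Decorrelation is omitted, and the function name `tandem_features`, the `floor` parameter, and the toy inputs are hypothetical.

```python
import math
from typing import List, Sequence


def tandem_features(
    acoustic: Sequence[Sequence[float]],
    posteriors: Sequence[Sequence[float]],
    floor: float = 1e-10,
) -> List[List[float]]:
    """Append log phone posteriors to each frame's acoustic feature vector.

    acoustic[t] holds the acoustic features (e.g. MFCCs) of frame t and
    posteriors[t] the phone posterior distribution estimated for frame t.
    Posteriors are floored before the log so that zeros stay finite.
    """
    if len(acoustic) != len(posteriors):
        raise ValueError("acoustic features and posteriors differ in frame count")
    frames = []
    for feats, post in zip(acoustic, posteriors):
        log_post = [math.log(max(p, floor)) for p in post]
        frames.append(list(feats) + log_post)
    return frames


if __name__ == "__main__":
    mfcc = [[1.0, 2.0], [0.5, -0.5]]           # 2 frames, 2 toy coefficients
    post = [[0.7, 0.2, 0.1], [0.0, 0.9, 0.1]]  # 2 frames, 3 toy phone classes
    for frame in tandem_features(mfcc, post):
        print(len(frame), [round(x, 3) for x in frame])
```

Taking the log makes the sharply peaked posterior distributions closer to Gaussian, which suits Gaussian-based HMM back-ends; a real system would usually also decorrelate and possibly reduce the dimension of the log posteriors before concatenation.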

Metadata
Title
Development and analysis of multilingual phone recognition systems using Indian languages
Authors
K. E. Manjunath
Dinesh Babu Jayagopi
K. Sreenivasa Rao
V. Ramasubramanian
Publication date
12-01-2019
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2019
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-018-09589-z
