Published in: International Journal of Speech Technology 1/2019

28-01-2019

Development and analysis of Punjabi ASR system for mobile phones under different acoustic models

Authors: Puneet Mittal, Navdeep Singh


Abstract

Speech technology is gaining importance in daily life. Speech-based mobile phone applications are becoming widely popular because of their usability and ease of access. Speech technology helps people with disabilities, such as blindness and physical impairments, to access and control mobile phone applications by voice, without using a keypad or touchpad. Punjabi is one of the widely spoken languages in various parts of the world. In this paper, an automatic speech recognition (ASR) system for mobile phone applications in Punjabi is proposed and implemented for four different acoustic models: context independent, context dependent untied, context dependent tied, and context dependent deleted interpolation models. The proposed ASR system is evaluated at 4, 16, 32 and 64 GMMs, and its performance is analysed in terms of accuracy, word error rate and required storage space. Context dependent untied models outperform the others, with better accuracy and lower word error rate, while context independent models require less storage space than the others. The choice of a suitable acoustic model therefore depends on the available storage space as well as the desired recognition accuracy. Mobile phones with limited resources may use context independent models, while context dependent untied models can be used to develop ASR systems for high-end mobile phones.
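
The abstract compares the models by word error rate without defining it. As a rough illustration, the sketch below computes the commonly used definition: the minimum number of word substitutions, deletions and insertions needed to turn the recognized text into the reference, divided by the number of reference words. This is an assumption about the metric, not the authors' scoring procedure; the function name `word_error_rate` and the example commands are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumed (standard) definition: WER = (S + D + I) / N, where N is the
    number of words in the reference transcript.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical voice commands, used only to exercise the function.
    print(word_error_rate("open the camera", "open camera"))
```

Because insertions count as errors, this ratio can exceed 1.0 when the recognizer outputs many extra words. Some toolkits report word accuracy as one minus this value, but the abstract does not say how accuracy was computed here.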


Metadata
Title
Development and analysis of Punjabi ASR system for mobile phones under different acoustic models
Authors
Puneet Mittal
Navdeep Singh
Publication date
28-01-2019
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2019
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-019-09593-x
