Speech technology is gaining importance in daily life. Speech-based mobile phone applications are becoming popular with the general public because of their usability and ease of access. Speech technology helps people with disabilities, such as blindness or physical impairments, to access and control mobile phone applications by voice, without using a keypad or touchpad. Punjabi is widely spoken in various parts of the world. In this paper, an automatic speech recognition (ASR) system for mobile phone applications in Punjabi is proposed and implemented for four different acoustic models: context independent, context dependent untied, context dependent tied, and context dependent deleted interpolation models. The proposed ASR system is evaluated at 4, 16, 32 and 64 Gaussian mixture models (GMMs), and its performance is analysed in terms of accuracy, word error rate and required storage space. Context dependent untied models outperform the others, with higher accuracy and a lower word error rate, while context independent models require less storage space than the others. The choice of a suitable acoustic model therefore depends on the available storage space as well as the desired recognition accuracy. Mobile phones with limited resources may use context independent models, while context dependent untied models can be used to develop ASR systems for high-end mobile phones.
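The word error rate mentioned above is conventionally computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used in the paper; the function name `word_error_rate` and the example phrases are hypothetical. Word accuracy is often reported as 1 − WER, although some toolkits leave insertions out of the accuracy figure, and the abstract does not say which convention is used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference length.

    Hypothetical helper illustrating the standard WER definition;
    it is not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up voice command: one of three reference words is missing.
    print(word_error_rate("open the camera", "open camera"))
```

Because insertions count as errors, this value can exceed 1.0 when the hypothesis is much longer than the reference.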
- Development and analysis of Punjabi ASR system for mobile phones under different acoustic models
- Springer US
International Journal of Speech Technology
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110