ABSTRACT
A speaker's accent can strongly affect the performance of a speech recognition system. In a multilingual country such as South Africa, where English is not the first language of most citizens, addressing this issue is critical when building speech-based systems. In this project we trained two sets of hidden Markov models for isolated-word English speech. The first set was trained on speech from native English speakers and the second on speech from non-native speakers drawn from a representative sample of major South African accent groups. We compared the recognition accuracies of the two sets and found that the models trained on accented English performed better. This preliminary research indicates that there is merit in committing resources to training recognizers on accented speech.
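The comparison described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes one discrete-observation hidden Markov model per vocabulary word, scores an utterance with the forward algorithm, and recognizes the word whose model gives the highest likelihood. The function names (`log_forward`, `recognise`, `accuracy`) and the data layout are hypothetical; a real system would use acoustic feature vectors and a dedicated toolkit.

```python
import math

NEG_INF = float("-inf")


def _log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def _logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_forward(obs, start, trans, emit):
    """Log-likelihood of a symbol sequence under one discrete HMM.

    obs   : list of observation symbol indices (non-empty)
    start : start[i]    = P(first state is i)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[i][k]  = P(symbol k | state i)
    """
    n = len(start)
    # alpha[i] = log P(observations so far, current state = i)
    alpha = [_log(start[i]) + _log(emit[i][obs[0]]) for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            _logsumexp([alpha[i] + _log(trans[i][j]) for i in range(n)])
            + _log(emit[j][symbol])
            for j in range(n)
        ]
    return _logsumexp(alpha)


def recognise(obs, models):
    """Return the word whose HMM assigns obs the highest likelihood.

    models maps word -> (start, trans, emit).
    """
    return max(models, key=lambda word: log_forward(obs, *models[word]))


def accuracy(test_set, models):
    """Fraction of (obs, true_word) pairs that are recognised correctly."""
    correct = sum(1 for obs, word in test_set if recognise(obs, models) == word)
    return correct / len(test_set)
```

Under these assumptions, comparing two model sets amounts to calling `accuracy(test_set, native_models)` and `accuracy(test_set, accented_models)` on the same held-out utterances, where both dictionaries are placeholders for separately trained models.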
Index Terms
- The impact of accents on automatic recognition of South African English speech: a preliminary investigation