Automatic speech recognition (ASR) is a computerized interface that allows humans to communicate with machines through natural conversation. ASR has a wide range of applications in fields such as language development in young children, telecommunications, and assistive devices for the hearing impaired. The performance of an ASR system is greatly influenced by the database used for its implementation. In this paper, we discuss building a speech corpus for Chhattisgarhi, a rare but important Indian dialect. The corpus consists of 100 unique isolated words and four speech scripts totaling 67 sentences, recorded from a total of 478 native speakers. The words were selected from the English to Chhattisgarhi dictionary published by the Chhattisgarh Rajbhasha Aayog, and the scripts from Chhattisgarhi literature and newspaper articles. The dataset was collected by travelling across 60% of the geographical area of Chhattisgarh state. As a result, a speech corpus has been prepared for Chhattisgarhi for the first time, with the aim of advancing speech research. Speech recognition for both isolated and continuous speech samples has been successfully demonstrated on the prepared database.
- Chhattisgarhi speech corpus for research and development in automatic speech recognition
Narendra D. Londhe
Ghanahshyam B. Kshirsagar
- Springer US