Abstract
This article describes a postprocessing strategy for online, handwritten, isolated Tamil words. It contributes to two issues rarely addressed in the online Indic word recognition literature, namely, the use of (1) language models that exploit the idiosyncrasies of Indic scripts and (2) expert classifiers for the disambiguation of confused symbols.
The input word is first segmented into its individual symbols, which are recognized using a primary support vector machine (SVM) classifier. Thereafter, we enhance the recognition accuracy by utilizing (i) a bigram language model at the symbol or character level and (ii) expert classifiers for reevaluating and disambiguating the different sets of confused symbols. The symbol-level bigram model is used in a traditional Viterbi framework. The concept of a character comprising multiple symbols is unique to Dravidian languages such as Tamil. This multi-symbol feature of Tamil characters is exploited in a novel, prefix-tree-based character-level bigram model that does not use Viterbi search; instead, it reduces the search space for each input symbol based on its left context.
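As an illustration of the symbol-level case, the following is a minimal sketch of Viterbi decoding over per-symbol classifier scores combined with a bigram model. It is not the authors' implementation: the function name `viterbi_bigram`, the dictionary-based inputs, the `floor` probability for unseen bigrams, and the placeholder symbol labels are all assumptions made for this example.

```python
import math


def viterbi_bigram(emissions, bigram, start, floor=1e-6):
    """Return the most probable symbol sequence.

    emissions: list (one entry per segmented symbol) of dicts mapping a
               candidate label to its classifier probability (must be > 0).
    bigram:    dict mapping (previous, current) to P(current | previous).
    start:     dict mapping a label to its probability of starting a word.
    floor:     assumed fallback probability for unseen entries.
    """
    if not emissions:
        return []

    # best[t][s] = (log-score of the best path ending in s at position t,
    #               back-pointer to the previous label on that path)
    best = [{
        s: (math.log(p) + math.log(start.get(s, floor)), None)
        for s, p in emissions[0].items()
    }]

    for t in range(1, len(emissions)):
        column = {}
        for s, p in emissions[t].items():
            # Pick the predecessor that maximizes path score plus transition.
            prev, score = max(
                ((q, best[t - 1][q][0] + math.log(bigram.get((q, s), floor)))
                 for q in best[t - 1]),
                key=lambda item: item[1],
            )
            column[s] = (score + math.log(p), prev)
        best.append(column)

    # Backtrack from the best final label.
    label = max(best[-1], key=lambda s: best[-1][s][0])
    path = [label]
    for t in range(len(emissions) - 1, 0, -1):
        label = best[t][label][1]
        path.append(label)
    return path[::-1]


# Hypothetical two-symbol word with placeholder labels "a" and "b".
emissions = [{"a": 0.6, "b": 0.4}, {"a": 0.55, "b": 0.45}]
bigram = {("a", "a"): 0.1, ("a", "b"): 0.9, ("b", "a"): 0.5, ("b", "b"): 0.5}
start = {"a": 0.5, "b": 0.5}
decoded = viterbi_bigram(emissions, bigram, start)
```

In this toy input the classifier alone would prefer "a" at both positions, while the bigram model makes the sequence "a", "b" more probable overall.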
For disambiguating confused symbols, a dynamic time-warping approach is proposed to automatically identify the parts of the online trace that discriminate between the confused classes. Fine classification of these regions by dedicated expert SVMs reduces the confusion between such symbols. The integration of segmentation, the prefix-tree-based language model, and the disambiguation of confused symbols is evaluated on a set of 15,000 handwritten isolated online Tamil words. Our results show recognition accuracies of 93.0% and 81.6% at the symbol and word level, respectively, as compared to the baseline classifier performance of 88.4% and 65.1%, respectively.
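The idea of using dynamic time warping to locate discriminative parts of a trace can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the procedure of the article: it aligns two point sequences with textbook DTW and flags the points of the first trace whose aligned counterpart lies farther away than a threshold. The function names `dtw_path` and `discriminative_indices`, the Euclidean local distance, and the `threshold` parameter are hypothetical choices for this example.

```python
import math


def dtw_path(a, b):
    """Align point sequences a and b; return (total cost, warping path)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean local distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1],
                                 cost[i - 1][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    return cost[n][m], path[::-1]


def discriminative_indices(a, b, threshold):
    """Indices of points in a whose aligned point in b is farther than threshold."""
    _, path = dtw_path(a, b)
    return sorted({i for i, j in path if math.dist(a[i], b[j]) > threshold})


# Two hypothetical traces that differ only at their final point.
trace_a = [(0, 0), (1, 0), (2, 0), (3, 0)]
trace_b = [(0, 0), (1, 0), (2, 0), (3, 2)]
total, path = dtw_path(trace_a, trace_b)
region = discriminative_indices(trace_a, trace_b, threshold=1.0)
```

In a setting like the one described above, the flagged region would be the part of the trace passed to a dedicated classifier; here it simply isolates the final point, where the two toy traces diverge.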
Index Terms
- Bigram Language Models and Reevaluation Strategy for Improved Recognition of Online Handwritten Tamil Words