Bigram Language Models and Reevaluation Strategy for Improved Recognition of Online Handwritten Tamil Words

Published: 20 April 2015

Abstract

This article describes a postprocessing strategy for online, handwritten, isolated Tamil words. Contributions are made on two issues rarely addressed in the online Indic word recognition literature, namely, the use of (1) language models that exploit the idiosyncrasies of Indic scripts and (2) expert classifiers for disambiguating confused symbols.

The input word is first segmented into its individual symbols, which are recognized using a primary support vector machine (SVM) classifier. Recognition accuracy is then improved using (i) a bigram language model at the symbol or character level and (ii) expert classifiers that reevaluate and disambiguate the different sets of confused symbols. The symbol-level bigram model is used in a traditional Viterbi framework. The concept of a character comprising multiple symbols is unique to Dravidian languages such as Tamil. This multi-symbol feature of Tamil characters is exploited in a novel, prefix-tree-based character-level bigram model that does not use Viterbi search; rather, it reduces the search space for each input symbol based on its left context.
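
The abstract does not give the decoding equations for the symbol-level model. The following is a minimal sketch of bigram decoding in a Viterbi framework, assuming that the primary classifier emits a probability for each candidate symbol of every segment and that classifier and language-model log-scores are added with a weight. The names `viterbi_decode` and `lm_weight`, the flooring of unseen bigrams, and the toy labels and probabilities are all illustrative assumptions, not the authors' formulation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def viterbi_decode(
    scores: Sequence[Dict[str, float]],
    bigram: Dict[Tuple[str, str], float],
    lm_weight: float = 1.0,
    start: str = "<s>",
    floor: float = 1e-6,
) -> List[str]:
    """Best symbol sequence under classifier scores plus a symbol bigram.

    Maximises sum_t [log P(s_t | segment t) + lm_weight * log P(s_t | s_{t-1})].
    scores[t] maps each candidate symbol to its (assumed) classifier probability;
    bigram[(prev, cur)] is P(cur | prev). Unseen bigrams are floored.
    """
    def lm(prev: str, cur: str) -> float:
        return lm_weight * math.log(bigram.get((prev, cur), floor))

    # delta[s]: best log-score of any partial path ending in symbol s.
    delta = {s: math.log(p) + lm(start, s) for s, p in scores[0].items()}
    backptrs: List[Dict[str, str]] = []
    for seg in scores[1:]:
        new_delta: Dict[str, float] = {}
        bp: Dict[str, str] = {}
        for cur, p in seg.items():
            best_prev = max(delta, key=lambda prev: delta[prev] + lm(prev, cur))
            new_delta[cur] = delta[best_prev] + lm(best_prev, cur) + math.log(p)
            bp[cur] = best_prev
        delta = new_delta
        backptrs.append(bp)
    # Trace back from the best final symbol.
    path = [max(delta, key=delta.get)]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    return path[::-1]


# Toy example with placeholder labels (not real Tamil statistics).
scores = [{"A": 0.6, "B": 0.4}, {"C": 0.45, "D": 0.55}]
bigram = {("<s>", "A"): 0.5, ("<s>", "B"): 0.5,
          ("A", "C"): 0.9, ("A", "D"): 0.1,
          ("B", "C"): 0.5, ("B", "D"): 0.5}
print(viterbi_decode(scores, bigram))       # ['A', 'C']
print(viterbi_decode(scores, bigram, 0.0))  # ['A', 'D'] (classifier alone)
```

In the toy run, the classifier alone prefers D for the second segment, but the strong bigram P(C | A) pulls the joint decision to C. How the two scores are actually weighted in the paper is not stated in the abstract.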

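The prefix-tree character-level model is only outlined in the abstract. One plausible reading is sketched below, under stated assumptions: each character is spelled by a fixed sequence of written symbols stored in a prefix tree, a character bigram is reduced to the set of characters with nonzero probability after a given character, and each symbol is decided greedily (no Viterbi search) by taking the highest-ranked classifier candidate that the left context still permits. `TrieNode`, `build_trie`, `prefix_tree_decode`, the `follows` sets and the ASCII symbol labels are hypothetical; the toy spellings mirror the Tamil case where கொ is written as the symbols ெ, க, ா.

```python
from typing import Dict, List, Optional, Set, Tuple


class TrieNode:
    """Node of a prefix tree over the symbol sequences that spell characters."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.chars: Set[str] = set()     # characters whose spelling passes through here
        self.char: Optional[str] = None  # character spelled exactly by this prefix


def build_trie(spellings: Dict[str, Tuple[str, ...]]) -> TrieNode:
    root = TrieNode()
    for char, symbols in spellings.items():
        node = root
        for sym in symbols:
            node = node.children.setdefault(sym, TrieNode())
            node.chars.add(char)
        node.char = char
    return root


def prefix_tree_decode(ranked: List[List[str]], root: TrieNode,
                       follows: Dict[str, Set[str]]) -> List[str]:
    """Greedy left-to-right decoding, no Viterbi search (hypothetical sketch).

    ranked[t] is the classifier's ranked candidate list for symbol t;
    follows[c] is the set of characters with a nonzero bigram after c.
    """
    out: List[str] = []
    prev, node = "<s>", root  # left context and position inside the current character
    for candidates in ranked:
        allowed = follows.get(prev, set())
        # Symbols that continue the current character towards an allowed character.
        extend = {s for s, c in node.children.items() if c.chars & allowed}
        # If the prefix already spells an allowed character, a symbol may start the next one.
        restart: Set[str] = set()
        if node.char is not None and node.char in allowed:
            nxt = follows.get(node.char, set())
            restart = {s for s, c in root.children.items() if c.chars & nxt}
        sym = next((s for s in candidates if s in extend or s in restart), None)
        if sym is None:
            raise ValueError("no candidate is consistent with the left context")
        if sym in extend:
            node = node.children[sym]
        else:
            out.append(node.char)
            prev, node = node.char, root.children[sym]
    if node.char is None or node.char not in follows.get(prev, set()):
        raise ValueError("word does not end on an allowed character")
    out.append(node.char)
    return out


# Toy spellings in writing order: KO (கொ) is written as e_sign (ெ), ka (க), aa_sign (ா).
spellings = {"KA": ("ka",), "KAA": ("ka", "aa_sign"),
             "KE": ("e_sign", "ka"), "KO": ("e_sign", "ka", "aa_sign")}
root = build_trie(spellings)
follows = {"<s>": {"KA", "KAA", "KE", "KO"}}  # toy bigram support
# aa_sign is ranked first for the first symbol, but no character starts with it.
print(prefix_tree_decode([["aa_sign", "e_sign"], ["ka"], ["aa_sign"]], root, follows))  # ['KO']
```

The raised `ValueError` stands in for a back-off to the unconstrained classifier output, which a practical system would need but the abstract does not describe.
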
For disambiguating confused symbols, a dynamic time-warping approach is proposed to automatically identify the parts of the online trace that discriminate between the confused classes. Fine classification of these regions by dedicated expert SVMs reduces the extent of confusion between such symbols. The integration of segmentation, the prefix-tree-based language model and the disambiguation of confused symbols is evaluated on a set of 15,000 handwritten isolated online Tamil words. The results show recognition accuracies of 93.0% and 81.6% at the symbol and word level, respectively, compared with baseline classifier accuracies of 88.4% and 65.1%, respectively.
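
The abstract does not say how the warping path is turned into discriminative regions. A minimal sketch, assuming each confused class is represented by one template trace of (x, y) points and that a point counts as discriminative when even its closest DTW-aligned counterpart in the other template is farther away than a threshold. `dtw_path`, `discriminative_points` and `threshold` are hypothetical names; training the expert SVMs on features from the selected points is not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dtw_path(a: Sequence[Point], b: Sequence[Point]) -> List[Tuple[int, int]]:
    """Classic DTW with Euclidean point cost; returns the optimal warping path."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j]: cost of the best alignment of a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the end, preferring the diagonal step on ties.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda ij: D[ij[0]][ij[1]])
    return path[::-1]


def discriminative_points(a: Sequence[Point], b: Sequence[Point],
                          threshold: float) -> List[int]:
    """Indices of points in template `a` that stay far from `b` after alignment."""
    closest = [math.inf] * len(a)
    for i, j in dtw_path(a, b):
        closest[i] = min(closest[i], math.dist(a[i], b[j]))
    return [i for i, d in enumerate(closest) if d > threshold]


# Toy templates of two confusable symbols that differ only at the end of the stroke.
a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
b = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 2.0)]
print(dtw_path(a, b))                    # [(0, 0), (1, 1), (2, 2), (3, 3)]
print(discriminative_points(a, b, 1.0))  # [3]
```

Only the tail of the trace survives the threshold here, which is the kind of region a dedicated expert classifier would then examine in detail.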

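Restated as error rates, the accuracies reported in the abstract correspond to the following relative error reductions (simple arithmetic on those figures, not numbers stated by the authors):

```latex
\begin{aligned}
\text{symbol level:}\quad & e_{\text{base}} = 100 - 88.4 = 11.6\%, \qquad e_{\text{new}} = 100 - 93.0 = 7.0\%, \\
& \frac{e_{\text{base}} - e_{\text{new}}}{e_{\text{base}}} = \frac{4.6}{11.6} \approx 39.7\% \\
\text{word level:}\quad & e_{\text{base}} = 100 - 65.1 = 34.9\%, \qquad e_{\text{new}} = 100 - 81.6 = 18.4\%, \\
& \frac{e_{\text{base}} - e_{\text{new}}}{e_{\text{base}}} = \frac{16.5}{34.9} \approx 47.3\%
\end{aligned}
```

The larger relative gain at the word level is expected, since a word is correct only if all of its symbols are.
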

      • Published in

        ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 14, Issue 2
        March 2015
        96 pages
        ISSN:2375-4699
        EISSN:2375-4702
        DOI:10.1145/2764912

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 20 April 2015
        • Accepted: 1 September 2014
        • Revised: 1 July 2014
        • Received: 1 October 2012
        Qualifiers

        • note
        • Research
        • Refereed