Bigram Language Models and Reevaluation Strategy for Improved Recognition of Online Handwritten Tamil Words

Authors:
Suresh Sundaram, Indian Institute of Technology, Guwahati
A. G. Ramakrishnan, Indian Institute of Science

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 14, Issue 2, Article No. 8, pp. 1–28
https://doi.org/10.1145/2671014

Published: 20 April 2015

Abstract

This article describes a postprocessing strategy for online, handwritten, isolated Tamil words. Contributions have been made with regard to two issues hardly addressed in the online Indic word recognition literature, namely, use of (1) language models exploiting the idiosyncrasies of Indic scripts and (2) expert classifiers for the disambiguation of confused symbols.

The input word is first segmented into its individual symbols, which are recognized using a primary support vector machine (SVM) classifier. Thereafter, we enhance the recognition accuracy by utilizing (i) a bigram language model at the symbol or character level and (ii) expert classifiers for reevaluating and disambiguating the different sets of confused symbols. The symbol-level bigram model is used in a traditional Viterbi framework. The concept of a character comprising multiple symbols is unique to Dravidian languages such as Tamil. This multi-symbol feature of Tamil characters has been exploited in proposing a novel, prefix-tree-based character-level bigram model that does not use Viterbi search; rather it reduces the search space for each input symbol based on its left context.
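As a rough illustration of the symbol-level step, the sketch below runs a Viterbi search over per-symbol candidate lists, combining classifier log-scores with bigram log-probabilities. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `viterbi_bigram`, the `lm_weight` and `floor` parameters, the placeholder labels, and all scores are hypothetical.

```python
import math

def viterbi_bigram(candidates, bigram_logp, lm_weight=1.0, floor=-10.0):
    """Pick the best label sequence for a segmented word.

    candidates:  one dict per segmented symbol, {label: classifier log-score}
    bigram_logp: {(prev_label, label): log P(label | prev_label)}
    lm_weight:   hypothetical weight on the language model term
    floor:       hypothetical log-probability for unseen bigrams
    """
    # best[label] = (score of the best path ending in label, that path)
    best = {lab: (score, [lab]) for lab, score in candidates[0].items()}
    for cand in candidates[1:]:
        new_best = {}
        for lab, score in cand.items():
            # Choose the predecessor that maximizes path score + transition.
            prev_lab, (prev_score, prev_path) = max(
                best.items(),
                key=lambda kv: kv[1][0]
                + lm_weight * bigram_logp.get((kv[0], lab), floor),
            )
            trans = lm_weight * bigram_logp.get((prev_lab, lab), floor)
            new_best[lab] = (prev_score + trans + score, prev_path + [lab])
        best = new_best
    return max(best.values(), key=lambda sp: sp[0])[1]

# Toy example with placeholder labels (not real Tamil symbol classes).
cands = [
    {"a": math.log(0.6), "b": math.log(0.4)},
    {"c": math.log(0.55), "d": math.log(0.45)},
]
lm = {
    ("a", "c"): math.log(0.1), ("a", "d"): math.log(0.9),
    ("b", "c"): math.log(0.5), ("b", "d"): math.log(0.5),
}
decoded = viterbi_bigram(cands, lm)
print(decoded)  # ['a', 'd']
```

In this toy case the classifier alone would pick "c" for the second symbol, but the bigram term makes "d" the better choice after "a". Setting `lm_weight=0.0` recovers the classifier-only decision.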

For disambiguating confused symbols, a dynamic time-warping approach is proposed to automatically identify the parts of the online trace that discriminate between the confused classes. Fine classification of these regions by dedicated expert SVMs reduces the extent of confusion between such symbols. The integration of segmentation, the prefix-tree-based language model, and the disambiguation of confused symbols is evaluated on a set of 15,000 handwritten isolated online Tamil words. Our results show recognition accuracies of 93.0% and 81.6% at the symbol and word level, respectively, compared with the baseline classifier performance of 88.4% and 65.1%.
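One way to picture the dynamic time-warping step is sketched below: two traces are aligned with textbook DTW, and the points whose aligned distance is large are flagged as candidate discriminative regions. This is only an illustrative sketch. The thresholding rule, the `ratio` parameter, and the function names are assumptions made here, and the abstract does not specify how the authors select the regions.

```python
import math

def dtw_path(a, b):
    """Align two traces (lists of (x, y) points) with standard DTW.

    Returns (total alignment cost, warping path as (i, j) index pairs).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    # Backtrack from the end, always moving to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path

def discriminative_points(a, b, ratio=0.5):
    """Indices of trace `a` whose aligned distance to `b` is large.

    Hypothetical rule: keep points at or above ratio * (max aligned distance).
    """
    _, path = dtw_path(a, b)
    dists = [math.dist(a[i], b[j]) for i, j in path]
    threshold = ratio * max(dists)
    return sorted(
        {i for (i, _), d in zip(path, dists) if d > 0 and d >= threshold}
    )

# Two toy traces that agree at the start and diverge at the end.
trace_a = [(0, 0), (1, 0), (2, 0), (3, 0)]
trace_b = [(0, 0), (1, 0), (2, 1), (3, 2)]
region = discriminative_points(trace_a, trace_b)
print(region)  # [2, 3]
```

In a pipeline like the one described, features from the flagged region could then be passed to an expert classifier dedicated to that confused pair.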

Index Terms

1. Applied computing
  1. Document management and text processing
    1. Document capture
2. Computing methodologies
  1. Machine learning

Published in
ACM Transactions on Asian and Low-Resource Language Information Processing Volume 14, Issue 2
March 2015
96 pages
ISSN:2375-4699
EISSN:2375-4702
DOI:10.1145/2764912
Editor:
Richard Sproat
Google, Inc., USA
Copyright © 2015 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 20 April 2015
- Accepted: 1 September 2014
- Revised: 1 July 2014
- Received: 1 October 2012
Author Tags
Online Tamil words
expert classifiers
language models
reevaluation
support vector machines (SVM)
Qualifiers
- note
- Research
- Refereed