A font and size-independent OCR system for printed Kannada documents using support vector machines

Published in Sadhana 27, 35–58 (2002)

Abstract

This paper describes an OCR system for printed text documents in Kannada, a South Indian language. The input to the system is a scanned image of a page of text, and the output is a machine-editable file compatible with most typesetting software. The system first extracts words from the document image and then segments each word into sub-character-level pieces, using a segmentation algorithm motivated by the structure of the script. We propose a novel set of features for the recognition problem that are computationally simple to extract. Final recognition is achieved by a number of 2-class classifiers based on the Support Vector Machine (SVM) method. Recognition is independent of the font and size of the printed text, and the system is seen to deliver reasonable performance.
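As an illustration of how a number of 2-class classifiers can yield a decision over many symbol classes, the sketch below combines pairwise decision functions by majority vote. The abstract does not say how the 2-class outputs are combined, so the voting scheme is an assumption, as are the labels, the 2-D feature vectors, and the hand-set weights, which stand in for trained linear SVM decision functions of the form f(x) = w.x + b. All function and variable names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

Vector = Sequence[float]
# A 2-class decision function: a positive score favours the first label of the pair.
BinaryClassifier = Callable[[Vector], float]


def linear_decision(weights: Vector, bias: float) -> BinaryClassifier:
    """Return f(x) = w.x + b, the form of a linear SVM decision function."""
    def decide(x: Vector) -> float:
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return decide


def predict_by_voting(
    x: Vector,
    labels: Sequence[str],
    pairwise: Dict[Tuple[str, str], BinaryClassifier],
) -> str:
    """Run every pairwise 2-class classifier on x and return the label with most votes."""
    votes = {label: 0 for label in labels}
    for first, second in combinations(labels, 2):
        score = pairwise[(first, second)](x)
        votes[first if score > 0 else second] += 1
    # Ties are broken in favour of the label listed first (an arbitrary choice).
    return max(labels, key=lambda label: votes[label])


# Hypothetical classes and hand-set weights (NOT taken from the paper).
# Assumed cluster centres: "ka" near (0, 0), "ga" near (5, 0), "ma" near (0, 5).
LABELS = ["ka", "ga", "ma"]
PAIRWISE = {
    ("ka", "ga"): linear_decision([-1.0, 0.0], 2.5),   # positive left of x0 = 2.5
    ("ka", "ma"): linear_decision([0.0, -1.0], 2.5),   # positive below x1 = 2.5
    ("ga", "ma"): linear_decision([1.0, -1.0], 0.0),   # positive where x0 > x1
}

if __name__ == "__main__":
    for sample in [(0.5, 0.5), (5.0, 1.0), (1.0, 6.0)]:
        print(sample, predict_by_voting(sample, LABELS, PAIRWISE))
```

With k classes this scheme needs k(k-1)/2 pairwise classifiers. In practice each decision function would be obtained by training an SVM on feature vectors extracted from the segmented sub-character pieces rather than set by hand.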

Cite this article

Ashwin, T.V., Sastry, P.S. A font and size-independent OCR system for printed Kannada documents using support vector machines. Sadhana 27, 35–58 (2002). https://doi.org/10.1007/BF02703311

  • DOI: https://doi.org/10.1007/BF02703311
