
2017 | OriginalPaper | Chapter

Effective Printed Tamil Text Segmentation and Recognition Using Bayesian Classifier

Authors: S. Manisha, T. Sree Sharmila

Published in: Computational Intelligence in Data Mining

Publisher: Springer Singapore

Abstract

Text segmentation and recognition for Indian languages have attracted considerable research interest in recent years. The large number of symbols in these languages, and their varying characteristics, make text segmentation and extraction a challenging task. Tamil has a rich body of literature, and printed Tamil text is available in many forms, such as newspapers, books, and magazines. In this paper, printed Tamil text is extracted from an image regardless of text characteristics such as font style, color, and size. The proposed method takes a scanned image of printed Tamil text as input. The input image is binarized, since the text always lies in the foreground, and histograms are then used to segment it into lines and words. The morphological operator dilation is used to remove outliers, such as dots and commas, present in an underlying object and to segment the printed text into words, which facilitates text detection. Each character is then identified using a bounding-box technique. Tamil characters are classified using features such as gradient information and curvature-based information obtained from the grayscale and binary images. A Bayesian classifier is trained on these features and used to classify the characters. The recognized characters are written out as text in Unicode format. The performance of the approach is evaluated using precision, recall, and F-measure.
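
The abstract says only that histograms split the binarized image into lines and words and that bounding boxes isolate characters. The sketch below assumes the usual projection-profile reading of this: count the ink pixels in each row, treat runs of non-empty rows as text lines, then repeat on the columns of each line to find words and their bounding boxes. It is not the authors' code. It uses plain Python lists, a fixed threshold of 128 in place of whatever binarization the paper actually uses (Otsu's method is a common automatic alternative), and hypothetical helper names (`binarize`, `runs`, `segment`). Merging column runs separated by fewer than `min_gap` empty columns only approximates the effect of a horizontal dilation; it is not the paper's morphological operator.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]          # row-major pixel grid
Box = Tuple[int, int, int, int]  # (top, bottom, left, right), inclusive


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Map dark (ink) pixels to 1 and light (paper) pixels to 0.

    Assumption: dark text on a light background and a fixed global threshold.
    """
    return [[1 if p < threshold else 0 for p in row] for row in gray]


def runs(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) spans where a projection profile is non-zero.

    Spans separated by fewer than `min_gap` empty positions are merged.
    """
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(profile) - 1))

    merged: List[Tuple[int, int]] = []
    for s, e in spans:
        if merged and s - merged[-1][1] - 1 < min_gap:
            merged[-1] = (merged[-1][0], e)  # gap too small: treat as one unit
        else:
            merged.append((s, e))
    return merged


def segment(binary: Image, word_gap: int = 2) -> List[Box]:
    """Find text lines from the row profile, then words from each line's column profile."""
    boxes: List[Box] = []
    row_profile = [sum(row) for row in binary]
    for top, bottom in runs(row_profile):
        line = binary[top:bottom + 1]
        col_profile = [sum(col) for col in zip(*line)]
        for left, right in runs(col_profile, min_gap=word_gap):
            # Tighten the box vertically to the rows that contain ink inside this word.
            word = [row[left:right + 1] for row in line]
            ink_rows = [i for i, row in enumerate(word) if any(row)]
            boxes.append((top + ink_rows[0], top + ink_rows[-1], left, right))
    return boxes


# Tiny 5x8 grayscale "page": 0 = ink, 255 = paper.
page = [
    [255, 255, 255, 255, 255, 255, 255, 255],
    [255,   0,   0, 255, 255, 255,   0, 255],
    [255,   0, 255,   0, 255, 255,   0, 255],
    [255, 255, 255, 255, 255, 255, 255, 255],
    [255, 255,   0,   0, 255, 255, 255, 255],
]
print(segment(binarize(page)))  # [(1, 2, 1, 3), (1, 2, 6, 6), (4, 4, 2, 3)]
```

Setting `word_gap=1` disables merging, so every ink-separated column run gets its own box, which is closer to character-level segmentation; in practice the gap would need to scale with the font size.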

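The abstract does not say which Bayesian model is used or how the gradient and curvature features are computed. As an assumption, the sketch below uses a Gaussian naive Bayes classifier (a per-class mean and variance for each feature, with features treated as independent) on generic feature vectors, then computes the standard per-class precision = TP/(TP+FP), recall = TP/(TP+FN), and F-measure (their harmonic mean). The two-dimensional feature values are made up for illustration and stand in for real gradient and curvature features. Class labels are the Unicode characters அ (U+0B85) and க (U+0B95), in keeping with the Unicode text output described in the abstract. The names `GaussianNaiveBayes` and `precision_recall_f1` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


class GaussianNaiveBayes:
    """Gaussian naive Bayes: one normal distribution per (class, feature) pair."""

    def fit(self, X: List[Vector], y: List[str], var_floor: float = 1e-6) -> "GaussianNaiveBayes":
        groups: Dict[str, List[Vector]] = defaultdict(list)
        for features, label in zip(X, y):
            groups[label].append(features)
        # label -> (log prior, per-feature means, per-feature variances)
        self.stats: Dict[str, Tuple[float, List[float], List[float]]] = {}
        for label, rows in groups.items():
            n = len(rows)
            means = [sum(col) / n for col in zip(*rows)]
            # Maximum-likelihood variance, floored so a constant feature cannot divide by zero.
            variances = [
                max(sum((v - m) ** 2 for v in col) / n, var_floor)
                for col, m in zip(zip(*rows), means)
            ]
            self.stats[label] = (math.log(n / len(X)), means, variances)
        return self

    def predict(self, x: Vector) -> str:
        def log_posterior(label: str) -> float:
            # Unnormalized: log prior plus the sum of per-feature Gaussian log-likelihoods.
            log_prior, means, variances = self.stats[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
                for xi, m, var in zip(x, means, variances)
            )

        return max(self.stats, key=log_posterior)


def precision_recall_f1(true: List[str], pred: List[str], label: str) -> Tuple[float, float, float]:
    """Per-class precision, recall, and F-measure (harmonic mean of the two)."""
    tp = sum(t == label and p == label for t, p in zip(true, pred))
    fp = sum(t != label and p == label for t, p in zip(true, pred))
    fn = sum(t == label and p != label for t, p in zip(true, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


TAMIL_A, TAMIL_KA = "\u0b85", "\u0b95"  # அ, க

# Made-up 2-D feature vectors standing in for gradient/curvature features.
X_train = [[0.9, 0.1], [0.8, 0.2], [1.0, 0.15], [0.1, 0.9], [0.2, 0.8], [0.15, 1.0]]
y_train = [TAMIL_A] * 3 + [TAMIL_KA] * 3
model = GaussianNaiveBayes().fit(X_train, y_train)

X_test = [[0.85, 0.2], [0.2, 0.95], [0.5, 0.45]]
y_test = [TAMIL_A, TAMIL_KA, TAMIL_KA]
pred = [model.predict(x) for x in X_test]  # third sample is predicted as TAMIL_A

for label in (TAMIL_A, TAMIL_KA):
    print(f"U+{ord(label):04X}", precision_recall_f1(y_test, pred, label))
```

With equal priors, the ambiguous third sample lies closer to the அ class once distances are scaled by the per-feature variances, so it is misclassified. That lowers precision for அ and recall for க, and both F-measures come out at 2/3.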

DOI: https://doi.org/10.1007/978-981-10-3874-7_69