2017 | OriginalPaper | Book chapter

Effective Printed Tamil Text Segmentation and Recognition Using Bayesian Classifier

Authors: S. Manisha, T. Sree Sharmila

Published in: Computational Intelligence in Data Mining

Publisher: Springer Singapore

Abstract

Text segmentation and recognition for Indian languages have attracted considerable research interest in recent years. The large number of symbols and their varying characteristics make segmentation and extraction of text in these languages a challenging task. Tamil has a wide variety of literature, and printed text is available in many forms, such as newspapers, books, and magazines. In this paper, printed Tamil text is extracted from an image irrespective of text characteristics such as font style, color, and size. The proposed work takes a scanned image of printed Tamil text as input. The input image is binarized, since the text is always in the foreground, so histograms can be used to segment it into lines and words. The morphological dilation operator is used to remove outliers such as dots and commas present in an underlying object and to segment the printed text into words, which facilitates text detection. Each character is then identified using the bounding box technique. Tamil letters are classified using features such as gradient and curvature-based information extracted from grayscale and binary images. A Bayesian classifier is trained on these features and used to classify the characters. The recognized characters are written out as text in Unicode format. The performance of the approach is evaluated using precision, recall, and F-measure.
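The histogram-based segmentation and the evaluation measures mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`find_runs`, `segment_lines`, `segment_words`, `precision_recall_f`), the list-of-lists image format with 1 marking a text pixel, and the `min_gap` parameter are all assumptions made for illustration. In particular, `min_gap` merges closely spaced characters into a word and stands in for the dilation step described above, which this sketch does not perform.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed format: binarized image, 1 = text pixel, 0 = background


def find_runs(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, of non-zero runs in a profile.

    Runs separated by fewer than `min_gap` zero entries are merged.
    """
    runs = []
    start = None
    gap = 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The run ended where the current gap began.
                runs.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        runs.append((start, len(profile) - gap))
    return runs


def segment_lines(image: Image) -> List[Tuple[int, int]]:
    """Split the page into text lines using the horizontal projection histogram."""
    row_profile = [sum(row) for row in image]
    return find_runs(row_profile)


def segment_words(image: Image, line: Tuple[int, int], min_gap: int = 2) -> List[Tuple[int, int]]:
    """Split one text line into words using the vertical projection histogram.

    Column gaps narrower than `min_gap` are treated as spacing between
    characters of the same word (hypothetical threshold).
    """
    top, bottom = line
    width = len(image[0])
    col_profile = [sum(image[r][c] for r in range(top, bottom)) for c in range(width)]
    return find_runs(col_profile, min_gap=min_gap)


def precision_recall_f(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F-measure from character-level counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Toy page: two text lines; the first line holds two words.
    page = [
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 0, 1, 0, 0, 0, 1, 1],
        [1, 0, 0, 1, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0, 0, 0, 0, 0],
    ]
    for line in segment_lines(page):
        print(line, segment_words(page, line))
```

On a real scan the gap threshold would depend on font size and resolution, so a fixed value such as the one used here is only a placeholder.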

Metadata
Title
Effective Printed Tamil Text Segmentation and Recognition Using Bayesian Classifier
Authors
S. Manisha
T. Sree Sharmila
Copyright year
2017
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-10-3874-7_69
