Skip to main content

2015 | OriginalPaper | Buchkapitel

Line-Level Script Identification for Six Handwritten Scripts Using Texture Based Features

verfasst von : Pawan Kumar Singh, Ram Sarkar, Mita Nasipuri

Erschienen in: Information Systems Design and Intelligent Applications

Verlag: Springer India

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Script identification from a given document image has some important applicability in many computer applications such as automatic archiving of multilingual documents, searching online archives of document images and for the selection of script specific Optical Character Recognition (OCR) engine in any multilingual environment. In this paper, we propose a texture based approach for text line-level script identification of six handwritten scripts namely, Bangla, Devnagari, Malayalam, Tamil, Telugu and Roman. A set of 80 features based on Gray Level Co-occurrence Matrix (GLCM) has been designed for the present work. Multi Layer Perceptron (MLP) is found to be the best classifier among a set of popular multiple classifiers which is then extensively tested by tuning different parameters. Finally, an accuracy of 95.67 % has been achieved on a dataset of 600 text lines using 3-fold cross validation with epoch size 1,500 of MLP classifier.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Pal, U., Chaudhuri, B.B.: Script line separation from indian multi-script documents. In: Proceedings of 5th International Conference on Document Analysis and Recognition (ICDAR), pp. 406–409. (1999) Pal, U., Chaudhuri, B.B.: Script line separation from indian multi-script documents. In: Proceedings of 5th International Conference on Document Analysis and Recognition (ICDAR), pp. 406–409. (1999)
2.
Zurück zum Zitat Pal, U., Chaudhuri, B.B.: Identification of different script lines from multi-script documents. Image Vis. Comput. 20(13–14), 945–954 (2002)CrossRef Pal, U., Chaudhuri, B.B.: Identification of different script lines from multi-script documents. Image Vis. Comput. 20(13–14), 945–954 (2002)CrossRef
3.
Zurück zum Zitat Pal, U., Sinha, S., Chaudhuri, B.B.: Multi-script line identification from indian documents. In: Proceedings of 7th International Conference on Document Analysis and Recognition (ICDAR), pp. 880–884. (2003) Pal, U., Sinha, S., Chaudhuri, B.B.: Multi-script line identification from indian documents. In: Proceedings of 7th International Conference on Document Analysis and Recognition (ICDAR), pp. 880–884. (2003)
4.
Zurück zum Zitat Joshi, G.D., Garg, S., Sivaswamy, J.: Script identification from Indian documents. In: International Workshop Document Analysis Systems, Nelson. Lecture Notes in Computer Science, vol. 3872, pp. 255–267. (2006) Joshi, G.D., Garg, S., Sivaswamy, J.: Script identification from Indian documents. In: International Workshop Document Analysis Systems, Nelson. Lecture Notes in Computer Science, vol. 3872, pp. 255–267. (2006)
5.
Zurück zum Zitat Padma, M.C., Vijaya, P.A.: Identification of Telugu, Devnagari and English scripts using discriminating features. Int. J. Comput. Sci. Inf. Technol. (IJCSIT) 1(2), 64–78 (2009) Padma, M.C., Vijaya, P.A.: Identification of Telugu, Devnagari and English scripts using discriminating features. Int. J. Comput. Sci. Inf. Technol. (IJCSIT) 1(2), 64–78 (2009)
6.
Zurück zum Zitat Gopakumar, R., SubbaReddy, N.V., Makkithaya, K., Dinesh Acharya, U.: Script identification from multilingual indian documents using structural features. J. Comput. 2(7), 106–111 (2010) Gopakumar, R., SubbaReddy, N.V., Makkithaya, K., Dinesh Acharya, U.: Script identification from multilingual indian documents using structural features. J. Comput. 2(7), 106–111 (2010)
7.
Zurück zum Zitat Chaudhuri, B.B., Bera, S.: Handwritten text line identification in Indian scripts. In: Proceedings of 10th International Conference on Document Analysis and Recognition, pp. 636–640. (2009) Chaudhuri, B.B., Bera, S.: Handwritten text line identification in Indian scripts. In: Proceedings of 10th International Conference on Document Analysis and Recognition, pp. 636–640. (2009)
8.
Zurück zum Zitat Hangarge, M., Dhandra, B.V.: Offline handwritten script identification in document images. Int. J. Comput. Appl. (IJCA) 4(6), 6–10 (2010) Hangarge, M., Dhandra, B.V.: Offline handwritten script identification in document images. Int. J. Comput. Appl. (IJCA) 4(6), 6–10 (2010)
9.
Zurück zum Zitat Haralick, R.M., Shanmungam, K., Dinstein, I.: Textural features of image classification. IEEE Trans. Syst. Man, Cybern. 3, 610–621 (1973)CrossRef Haralick, R.M., Shanmungam, K., Dinstein, I.: Textural features of image classification. IEEE Trans. Syst. Man, Cybern. 3, 610–621 (1973)CrossRef
10.
Zurück zum Zitat Haralick, R.M., Watson, L.: A facet model for image data. Comput. Vision Graph. Image Process. 15, 113–129 (1981)CrossRef Haralick, R.M., Watson, L.: A facet model for image data. Comput. Vision Graph. Image Process. 15, 113–129 (1981)CrossRef
11.
Zurück zum Zitat Busch, A., Boles, W.W., Sridharan, S.: Texture for script identification. IEEE Trans. Pattern Anal. Mach. Intell. 27, 1720–1732 (2005)CrossRef Busch, A., Boles, W.W., Sridharan, S.: Texture for script identification. IEEE Trans. Pattern Anal. Mach. Intell. 27, 1720–1732 (2005)CrossRef
12.
Zurück zum Zitat Gonzalez, R.C., Woods, R.E.: Digital Image Processing, vol. I. PHI, New Delhi (1992) Gonzalez, R.C., Woods, R.E.: Digital Image Processing, vol. I. PHI, New Delhi (1992)
13.
Zurück zum Zitat Sarkar, R., Das, N., Basu, S., Kundu, M., Nasipuri, M.: Extraction of text lines from handwritten documents using piecewise water flow technique. J. Intell. Syst. 23(3), 245–260 (2014) Sarkar, R., Das, N., Basu, S., Kundu, M., Nasipuri, M.: Extraction of text lines from handwritten documents using piecewise water flow technique. J. Intell. Syst. 23(3), 245–260 (2014)
14.
Zurück zum Zitat Ostu, N.: A thresholding selection method from gray-level histogram. IEEE Trans. Syst. Man Cybern. SMC-8, 62–66 (1978) Ostu, N.: A thresholding selection method from gray-level histogram. IEEE Trans. Syst. Man Cybern. SMC-8, 62–66 (1978)
Metadaten
Titel
Line-Level Script Identification for Six Handwritten Scripts Using Texture Based Features
verfasst von
Pawan Kumar Singh
Ram Sarkar
Mita Nasipuri
Copyright-Jahr
2015
Verlag
Springer India
DOI
https://doi.org/10.1007/978-81-322-2247-7_30

Premium Partner