2016 | OriginalPaper | Chapter

Identification of Devanagari Script from Bilingual Printed Text Documents

Authors: Ranjana S. Zinjore, R. J. Ramteke

Published in: Proceedings of the International Conference on Recent Cognizance in Wireless Communication & Image Processing

Publisher: Springer India

Abstract

Bilingual script identification is one of the challenging steps in the development of optical character recognition (OCR). India is a multilingual, multiscript country, and under its constitution each state conducts its official work in its own state language and script. In Maharashtra, the official state language is Marathi, written in the Devanagari script, while English serves as the language of communication. An OCR system for such documents therefore has to identify and differentiate both scripts. This paper presents research on identifying Devanagari (Marathi) script in printed bilingual text documents. We developed a methodology that applies projection profiles for line segmentation, followed by twofold word segmentation. We examined two structural features, header-line pixel count and intercharacter gap, as tools for determining which words are Devanagari, and used a heuristic rule approach for classification. The proposed method was applied to ten printed bilingual text images containing 77 lines and 474 words in varying font sizes. In our experiments, the method identified Marathi words with an accuracy of 87.25%.
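The pipeline named in the abstract (projection-profile line segmentation, word segmentation, two structural features, a heuristic rule) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give their thresholds, their rule, or the details of the twofold word segmentation, so the single-pass gap-based word splitting, the function names, and the parameters `min_gap` and `header_ratio` below are all assumptions chosen for illustration. The image is assumed to be already binarized (1 = ink, 0 = background).

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background (assumed input)


def segment_runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(img: Image) -> List[Tuple[int, int]]:
    """Horizontal projection profile: rows containing ink form text lines."""
    return segment_runs([sum(row) for row in img])


def vertical_profile(img: Image, top: int, bottom: int) -> List[int]:
    """Ink count per column, restricted to rows top..bottom-1."""
    width = len(img[0]) if img else 0
    return [sum(img[r][c] for r in range(top, bottom)) for c in range(width)]


def segment_words(img: Image, top: int, bottom: int,
                  min_gap: int = 3) -> List[Tuple[int, int]]:
    """Split one line into words; blank gaps narrower than min_gap columns
    are treated as intercharacter gaps and do not split a word.
    (Single pass only; min_gap is a hypothetical threshold.)"""
    words: List[Tuple[int, int]] = []
    for start, end in segment_runs(vertical_profile(img, top, bottom)):
        if words and start - words[-1][1] < min_gap:
            words[-1] = (words[-1][0], end)  # merge with previous run
        else:
            words.append((start, end))
    return words


def word_features(img: Image, top: int, bottom: int,
                  left: int, right: int) -> Tuple[float, int]:
    """Return (header ratio, intercharacter gap count) for one word box.
    Header ratio: densest row in the upper half of the word, divided by
    the word width. Gap count: blank column runs inside the word."""
    upper_rows = range(top, top + max(1, (bottom - top) // 2))
    densest = max(sum(img[r][left:right]) for r in upper_rows)
    columns = vertical_profile(img, top, bottom)[left:right]
    gaps = len(segment_runs(columns)) - 1
    return densest / (right - left), gaps


def looks_devanagari(img: Image, top: int, bottom: int, left: int, right: int,
                     header_ratio: float = 0.8) -> bool:
    """Hypothetical heuristic rule: a near full-width header line that joins
    the characters, so no blank columns remain inside the word."""
    ratio, gaps = word_features(img, top, bottom, left, right)
    return ratio >= header_ratio and gaps == 0
```

The sketch relies on the property that a printed Devanagari word is joined along its top by a continuous header line, whereas Roman-script characters in a word are usually separated by blank columns. Real scans would need binarization, skew correction, and thresholds tuned to the font sizes in use.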

Metadata
Title
Identification of Devanagari Script from Bilingual Printed Text Documents
Authors
Ranjana S. Zinjore
R. J. Ramteke
Copyright Year
2016
Publisher
Springer India
DOI
https://doi.org/10.1007/978-81-322-2638-3_52