Published in: Arabian Journal for Science and Engineering 2/2022

20-09-2021 | Research Article-Computer Engineering and Computer Science

A Novel Approach to Printed Arabic Optical Character Recognition

Author: Mansoor A. Al Ghamdi

Abstract

Optical character recognition (OCR) is widely used in real-world applications such as digitizing learning resources, assisting visually impaired people, and transforming printed resources into electronic media. For Arabic, the need to expand digital Arabic content on the Internet has recently motivated researchers to focus on Arabic text recognition. Despite the considerable number of studies on Arabic OCR, the task still faces numerous challenges owing to the special characteristics of the Arabic script. This work describes the implementation of an effective printed Arabic OCR system. The system is divided into four stages: pre-processing, feature extraction, character segmentation, and classification. Unlike typical Arabic OCR systems, the developed system performs feature extraction before character segmentation. In the pre-processing stage, a novel thinning algorithm is applied to produce skeletons of the Arabic text images. In the second stage, a new chain code representation technique, which uses an agent-based model, is proposed for extracting features from non-dotted Arabic text images. Relying on the extracted features, a character segmentation technique is introduced that segments connected Arabic words into characters. In the classification stage, the prediction by partial matching (PPM) compression-based method is applied as a classifier to recognize the Arabic text. An experimental evaluation on a public dataset shows that the system achieves an accuracy of 77.3% on paragraph-based text images.
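
To make the chain code idea in the feature extraction stage more concrete, the following is a minimal sketch of the classic Freeman 8-direction chain code applied to an ordered skeleton path. It is not the agent-based representation proposed in the article, whose details are not given in the abstract. The function name `chain_code`, the direction numbering, and the sample stroke are assumptions chosen for illustration only.

```python
from typing import List, Tuple

# Freeman 8-direction codes keyed by (d_row, d_col).
# Assumed convention: 0 = east, codes increase counter-clockwise.
# Image rows grow downward, so "north" corresponds to d_row = -1.
DIRECTIONS = {
    (0, 1): 0,    # east
    (-1, 1): 1,   # north-east
    (-1, 0): 2,   # north
    (-1, -1): 3,  # north-west
    (0, -1): 4,   # west
    (1, -1): 5,   # south-west
    (1, 0): 6,    # south
    (1, 1): 7,    # south-east
}


def chain_code(path: List[Tuple[int, int]]) -> List[int]:
    """Encode an ordered list of 8-connected (row, col) skeleton pixels."""
    codes = []
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in DIRECTIONS:
            raise ValueError(
                f"pixels {(r0, c0)} and {(r1, c1)} are not 8-neighbours"
            )
        codes.append(DIRECTIONS[step])
    return codes


# Hypothetical stroke: two steps to the west, then two steps down.
stroke = [(0, 3), (0, 2), (0, 1), (1, 1), (2, 1)]
codes = chain_code(stroke)
print(codes)  # [4, 4, 6, 6]
```

A sequence of direction codes like this turns a thinned stroke into a string of symbols, which is the kind of input a compression-based classifier such as PPM can model. How the article's system actually traverses the skeleton and handles branches is not specified here.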

Metadata
Title
A Novel Approach to Printed Arabic Optical Character Recognition
Author
Mansoor A. Al Ghamdi
Publication date
20-09-2021
Publisher
Springer Berlin Heidelberg
Published in
Arabian Journal for Science and Engineering / Issue 2/2022
Print ISSN: 2193-567X
Electronic ISSN: 2191-4281
DOI
https://doi.org/10.1007/s13369-021-06163-9
