Published in: Neural Computing and Applications 22/2021

07 June 2021 | Original Article

UrduDeepNet: offline handwritten Urdu character recognition using deep neural network

Authors: Faisel Mushtaq, Muzafar Mehraj Misgar, Munish Kumar, Surinder Singh Khurana

Abstract

Handwritten Urdu character recognition systems face several challenges, including writer-dependent variation and the lack of benchmark databases for cursive scripts. In this study, we propose a handwritten Urdu character dataset for the Nasta’liq writing style covering isolated and positional characters as well as numerals. We also propose a convolutional neural network (CNN) architecture for recognizing handwritten Urdu characters and numerals. The CNN is an image recognition technique that needs no explicit feature engineering or extraction and yields efficient results compared with standard handcrafted feature extraction approaches. The proposed system was trained on 74,285 samples and evaluated on a test set of 21,223 samples; it achieved a recognition rate of 98.82% for 133 classes, outperforming the results of all state-of-the-art systems for the Urdu language.
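
To put the reported figures in proportion, the short sketch below works through the arithmetic of the train/test split. It assumes that the recognition rate is the share of test samples classified correctly; the abstract does not spell out the metric, and the function name `split_summary` and the derived counts are illustrative, not values reported by the authors.

```python
def split_summary(train_samples, test_samples, num_classes, rate_percent):
    """Summarize a train/test split and the counts implied by a recognition rate."""
    total = train_samples + test_samples
    # Assumption: the rate is the fraction of test samples classified correctly.
    implied_correct = round(test_samples * rate_percent / 100)
    return {
        "total": total,
        "test_share": test_samples / total,
        "implied_correct": implied_correct,
        "implied_errors": test_samples - implied_correct,
        "avg_test_per_class": test_samples / num_classes,
    }


# Figures taken from the abstract: 74,285 training samples,
# 21,223 test samples, 133 classes, 98.82% recognition rate.
summary = split_summary(74_285, 21_223, 133, 98.82)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Under that assumption, the test set is roughly 22% of the 95,508 samples, and a 98.82% rate would correspond to about 250 misclassified test samples, or fewer than two per class on average. These are derived estimates only; the per-class distribution is not given in the abstract.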

Metadata
Title
UrduDeepNet: offline handwritten Urdu character recognition using deep neural network
Authors
Faisel Mushtaq
Muzafar Mehraj Misgar
Munish Kumar
Surinder Singh Khurana
Publication date
07 June 2021
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 22/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-021-06144-x
