Published in: Neural Computing and Applications 34/2023

22-09-2023 | Original Article

Computationally efficient recognition of unconstrained handwritten Urdu script using BERT with vision transformers

Authors: Aejaz Farooq Ganai, Farida Khursheed

Abstract

Handwritten Urdu text recognition is a challenging area of pattern recognition. It has gained importance with the rapid emergence of camera-based applications on portable devices, which process large numbers of images every day. The main challenges in handwritten Urdu recognition are writer-dependent variations among Urdu writers, the irregular positioning of the diacritics associated with a character, the context sensitivity of characters, and the cursive nature of the Urdu script. These challenges also make it difficult to build a large, generalized handwritten Urdu dataset. State-of-the-art methods for recognizing handwritten Urdu text mostly take an implicit approach; such methods are error prone and do not yield significant recognition rates. The holistic approach to handwritten Urdu recognition has been the least explored to date, and the existing holistic methods are complex and time consuming because they mostly rely on convolutional or recurrent neural networks or on statistical methods. Hence, this research proposes a novel and efficient vision transformer-based methodology using the BERT architecture for the recognition of handwritten Urdu text. The proposed approach uses convolutional feature maps as the word embeddings of the transformer, which lets it make full use of the vision transformer's attention mechanism to focus on a particular connected component (ligature) in handwritten Urdu text. To cover the entire Urdu corpus, we pre-trained the model on several benchmark handwritten Urdu datasets, such as UNHD and NUST-UHWR, and tested it on unconstrained handwritten Urdu text. Compared with state-of-the-art techniques, the experimental evaluation of the proposed approach reports better results on performance measures such as Ligature Error Rate (LER), precision, sensitivity, specificity, F1-score, and accuracy.
The success of the proposed approach lies in (i) a significant reduction in the time needed to train on a large handwritten Urdu dataset, (ii) minimal computational complexity, because there is no overhead of diacritic separation and re-association as in most state-of-the-art techniques, and (iii) a new state-of-the-art LER of only up to 3% on unconstrained handwritten Urdu text.
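
The abstract names its evaluation measures without defining them. As a minimal sketch, assuming the standard confusion-matrix definitions and assuming that LER is computed like a character error rate (edit distance between the predicted and reference ligature sequences, divided by the number of reference ligatures), the measures could be computed as follows. The function names and the placeholder ligature labels are hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference label
                            curr[j - 1] + 1,      # insert a hypothesis label
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def ligature_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Assumed LER: edit distance normalised by the reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard measures from true/false positive/negative counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Placeholder labels standing in for ligature classes.
    reference = ["lig_a", "lig_b", "lig_c"]
    predicted = ["lig_a", "lig_x", "lig_c"]
    print(ligature_error_rate(reference, predicted))
    print(binary_metrics(tp=8, fp=2, tn=9, fn=1))
```

Under this assumed definition, one substituted ligature in a three-ligature reference gives an LER of one third. For a multi-class ligature recognizer, the binary measures would typically be computed per class and then averaged; the abstract does not say which averaging the authors use.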

Metadata
Title
Computationally efficient recognition of unconstrained handwritten Urdu script using BERT with vision transformers
Authors
Aejaz Farooq Ganai
Farida Khursheed
Publication date
22-09-2023
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 34/2023
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-023-08976-1
