Published in: International Journal on Document Analysis and Recognition (IJDAR) 4/2022

23-09-2022 | Special Issue Paper

Conv-transformer architecture for unconstrained off-line Urdu handwriting recognition

Authors: Nauman Riaz, Haziq Arbab, Arooba Maqsood, Khuzaeymah Nasir, Adnan Ul-Hasan, Faisal Shafait

Abstract

Unconstrained off-line handwritten text recognition, in general and for Arabic-like scripts in particular, is a challenging task and remains an active research area. Transformer-based models have recently shown promising results for English handwriting recognition. In this paper, we explore the use of the transformer architecture for Urdu handwriting recognition. The highlights of the proposed work are the use of a convolutional neural network before a vanilla full transformer and the use of printed Urdu text lines alongside handwritten text lines during training. The convolution layers reduce the spatial resolution and thereby compensate for the \(n^{2}\) complexity of the transformer's multi-head attention layers. Moreover, the printed text images in the training phase help the model learn a greater number of ligatures (a prominent feature of Arabic-like scripts) and a better language model. Our model achieved state-of-the-art accuracy (CER of \(5.31\%\)) on the publicly available NUST-UHWR dataset (Zia et al. in Neural Comput Appl 34:1–14, 2021).
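To illustrate why shrinking the spatial resolution matters for the \(n^{2}\) attention cost, the following minimal sketch counts the positions a self-attention layer would attend over before and after a stack of strided convolutions. The image size (64 x 1024) and the three 3 x 3, stride-2 layers are hypothetical values chosen only for illustration; they are not taken from the paper, and treating every pixel as one token in the baseline is likewise a simplifying assumption.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Output length along one axis of a standard convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def token_count(height: int, width: int, layers: list[tuple[int, int, int]]) -> int:
    """Number of spatial positions left after applying each (kernel, stride, padding) layer."""
    for kernel, stride, padding in layers:
        height = conv_output_size(height, kernel, stride, padding)
        width = conv_output_size(width, kernel, stride, padding)
    return height * width


# Hypothetical text-line image size and conv stack (assumptions, not the paper's settings).
HEIGHT, WIDTH = 64, 1024
LAYERS = [(3, 2, 1), (3, 2, 1), (3, 2, 1)]  # (kernel, stride, padding) per layer

n_raw = token_count(HEIGHT, WIDTH, [])       # one token per pixel, no convolution
n_conv = token_count(HEIGHT, WIDTH, LAYERS)  # tokens after the strided convolutions

# Self-attention compares every position with every other position: n * n pairs.
pairs_raw = n_raw ** 2
pairs_conv = n_conv ** 2
reduction = pairs_raw // pairs_conv

print(f"tokens without conv: {n_raw}, with conv: {n_conv}")
print(f"attention pairs shrink by a factor of {reduction}")
```

Under these assumed settings each stride-2 layer halves both axes, so the sequence length drops by a factor of 64 and the number of pairwise attention scores by its square.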

Literature
1. Sagheer, M.W., He, C.L., Nobile, N., Suen, C.Y.: Holistic Urdu handwritten word recognition using support vector machine. In: 2010 20th International Conference on Pattern Recognition, pp. 1900–1903 (2010)
2. Zia, N., Naeem, M.F., Raza, S.K., Khan, M.M., Ul-Hasan, A., Shafait, F.: A convolutional recursive deep architecture for unconstrained Urdu handwriting recognition. Neural Comput. Appl. 34, 1–14 (2021)
3. Naz, S., Umar, A.I., Ahmad, R., Siddiqi, I., Ahmed, S.B., Razzak, M.I., Shafait, F.: Urdu Nastaliq recognition using convolutional–recursive deep learning. Neurocomputing 243, 80–87 (2017)
4. Asad, K., Asghar, M.Z., Anam, S., Hameed, I.A., Asif, S.H., Shakeel, A.: A survey on sentiment analysis in Urdu: a resource-poor language. Egypt. Inform. J. 22(1), 53–74 (2021)
5. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A., Kaiser, L., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
6. Hassan, S., Irfan, A., Mirza, A., Siddiqi, I.: Cursive handwritten text recognition using bi-directional LSTMs: a case study on Urdu handwriting. In: 2019 International Conference on Deep Learning and Machine Learning in Emerging Applications (Deep-ML), pp. 67–72 (2019)
7. Husnain, M., Saad Missen, M.M., Mumtaz, S., Jhanidr, M.Z., Coustaty, M., Muzzamil, L.M., Ogier, J., Choi, G.S.: Recognition of Urdu handwritten characters using convolutional neural network. Appl. Sci. 9(13), 2758 (2019)
8. Ul-Hasan, A., Ahmed, S.B., Rashid, F., Shafait, F., Breuel, T.M.: Offline printed Urdu Nastaleeq script recognition with bidirectional LSTM networks. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 1061–1065 (2013)
9. Naz, S., Umar, A.I., Ahmed, R., Siddiqi, I., Ahmed, S., Razzak, M.I., Shafait, F.: Urdu Nastaliq recognition using convolutional–recursive deep learning. Neurocomputing 243, 80–87 (2017)
10. Naz, S., Umar, A.I., Ahmed, R., Razzak, M.I., Rashid, S.F., Shafait, F.: Urdu Nasta’liq text recognition using implicit segmentation based on multi-dimensional long short term memory neural networks. SpringerPlus 5, 1–16 (2016)
11. Khan, K., Haider, I.: Online recognition of multi-stroke handwritten Urdu characters. In: 2010 International Conference on Image Analysis and Signal Processing, pp. 284–290 (2010)
12. Ahmed, S., Naz, S., Razzak, S.M., Umar, A.: Ucom offline dataset—a Urdu handwritten dataset generation. Int. Arab J. Inf. Technol. 14(2), 03 (2016)
13. Devlin, J., Chang, M., Lee, K., Toutanova, K.: Bert: Pre-training of deep bidirectional transformers for language understanding
14. Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–901 (2020)
15. Michael, J., Labahn, R., Gruning, T., Zollner, J.: Evaluating sequence-to-sequence models for handwritten text recognition. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1286–1293. IEEE (2019)
16. Li, M., Lv, T., Cui, L., Lu, Y., Florencio, D., Zhang, C., Li, Z., Wei, F.: Trocr: Transformer-based optical character recognition with pre-trained models. arXiv preprint arXiv:2109.10282 (2021)
17. Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jegou, H.: Training data-efficient image transformers & distillation through attention. In: Marina, M., Tong Z. (eds.), Proceedings of the 38th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 139, pp. 10347–10357. PMLR, 18–24 (2021)
18. Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jégou, H.: Training data-efficient image transformers distillation through attention (2020)
19. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: Roberta: A robustly optimized Bert pretraining approach (2019)
20. Riaz, N., Latif, S., Latif, R.: From transformers to reformers. In: 2021 International Conference on Digital Futures and Transformative Technologies (ICoDT2), pp. 1–6 (2021)
22. Naeem, M.F., Zia, N., Awan, A., Ul-Hasan, A., Shafait, F.: Impact of ligature coverage on training practical Urdu OCR systems. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 01, pp. 131–136 (2017)
23. Rehman, A., Ul-Hasan, A., Shafait, F.: High performance Urdu and Arabic video text recognition using convolutional recurrent neural networks. In: International Conference on Document Analysis and Recognition, pp. 336–352. Springer (2021)
Metadata
Title
Conv-transformer architecture for unconstrained off-line Urdu handwriting recognition
Authors
Nauman Riaz
Haziq Arbab
Arooba Maqsood
Khuzaeymah Nasir
Adnan Ul-Hasan
Faisal Shafait
Publication date
23-09-2022
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Document Analysis and Recognition (IJDAR) / Issue 4/2022
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-022-00416-5
