Published in: International Journal on Document Analysis and Recognition (IJDAR) 2/2023

25-12-2022 | Original Paper

Refocus attention span networks for handwriting line recognition

Authors: Mohammed Hamdan, Himanshu Chaudhary, Ahmed Bali, Mohamed Cheriet

Abstract

Recurrent neural networks have achieved outstanding performance in handwriting recognition despite the enormous variety of handwriting structures and the poor quality of many scanned documents. We initially proposed a BiLSTM baseline model whose sequential architecture is well suited to modeling text lines because it can learn probability distributions over character or word sequences. However, such recurrent paradigms prevent parallelization and suffer from vanishing gradients on long sequences during training. To alleviate these limitations, we make four contributions in this work. First, we devise an end-to-end model composed of a split-attention CNN backbone that serves as the feature extractor and a self-attention Transformer encoder–decoder that serves as the transcriber for recognizing handwritten manuscripts. The multi-head self-attention layers in the Transformer-based encoder–decoder enhance the model’s ability to tackle handwriting recognition and to learn the linguistic dependencies of character sequences. Second, we conduct several studies on transfer learning (TL) from large datasets to a small database, determining which model layers require fine-tuning. Third, we obtain an efficient paradigm by combining different TL strategies with data augmentation (DA). Finally, because the proposed model is lexicon-free and can recognize sentences not seen during training, it is trained on only a few labeled examples, with no extra cost of generating and training on synthetic datasets. On four benchmark datasets, we report character and word error rates (CER/WER) that are comparable to or better than those of the most recent state-of-the-art (SOTA) models.
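
The abstract reports results as CER and WER without spelling out how they are computed. A minimal sketch follows, assuming the common definition: the Levenshtein edit distance between the reference transcription and the model output, divided by the reference length in characters (CER) or in whitespace-separated words (WER). The function names `edit_distance`, `cer`, and `wer` are illustrative and are not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp; only two rows are kept in memory.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate (assumed definition: edits / reference chars)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate (assumed definition: edits / reference words)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    truth = "the quick brown fox"
    guess = "the quick brwn fox"  # one missing character in one word
    print(f"CER = {cer(truth, guess):.4f}")
    print(f"WER = {wer(truth, guess):.4f}")
```

In this toy case a single dropped character costs one edit out of 19 reference characters but one wrong word out of four, which illustrates why WER is typically much higher than CER for the same transcription. Both rates can exceed 1 when the hypothesis is much longer than the reference.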

Metadata
Title
Refocus attention span networks for handwriting line recognition
Authors
Mohammed Hamdan
Himanshu Chaudhary
Ahmed Bali
Mohamed Cheriet
Publication date
25-12-2022
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Document Analysis and Recognition (IJDAR) / Issue 2/2023
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-022-00422-7
