Published in: International Journal on Document Analysis and Recognition (IJDAR) 4/2022

14-09-2022 | Special Issue Paper

Character spotting and autonomous tagging: offline handwriting recognition for Bangla, Korean and other alphabetic scripts

Authors: Nishatul Majid, Elisa H. Barney Smith

Abstract

This paper demonstrates a framework for offline handwriting recognition using character spotting and autonomous tagging, which works for any alphabetic script. Character spotting builds on the idea of object detection to find character elements in unsegmented word images. An autonomous tagging approach is introduced that automates the production of a character image training set by estimating character locations in a word based on typical character size. Although scripts can vary widely from each other, our proposed approach provides a simple and powerful workflow for unconstrained offline recognition that should work for any alphabetic script with few adjustments. Here we demonstrate this approach with handwritten Bangla, obtaining a character recognition accuracy (CRA) of 94.8% with precision tagging and 91.12% with autonomous tagging. Furthermore, we explain how character spotting and autonomous tagging can be implemented for other alphabetic scripts. We demonstrate this with handwritten Hangul/Korean, obtaining a Jamo recognition accuracy (JRA) of 93.16% using a tiny fraction of the PE92 training set. The combination of character spotting and autonomous tagging takes away one of the biggest frustrations, data annotation by hand, and thus we believe it has the potential to revolutionize the development of offline recognition.
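
The idea of estimating character locations from typical character size can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `estimate_character_boxes`, the relative-width table, and the proportional-split rule are all assumptions made for illustration. The sketch divides the width of a word image among the characters of its known transcription in proportion to each character's assumed typical width.

```python
from typing import Dict, List, Tuple

# (character label, x_start, x_end) in pixels along the word image
Box = Tuple[str, int, int]


def estimate_character_boxes(
    transcription: List[str],
    word_width: int,
    typical_width: Dict[str, float],
    default_width: float = 1.0,
) -> List[Box]:
    """Split a word image's width among its characters.

    Each character receives a share of the width proportional to its
    typical relative width. The width table is a hypothetical input;
    characters missing from it fall back to default_width.
    """
    if not transcription:
        return []

    weights = [typical_width.get(ch, default_width) for ch in transcription]
    total = sum(weights)

    boxes: List[Box] = []
    cumulative = 0.0
    for ch, weight in zip(transcription, weights):
        x_start = round(word_width * cumulative / total)
        cumulative += weight
        x_end = round(word_width * cumulative / total)
        boxes.append((ch, x_start, x_end))
    return boxes


if __name__ == "__main__":
    # Hypothetical relative widths: "b" is assumed twice as wide as "a" or "c".
    widths = {"a": 1.0, "b": 2.0, "c": 1.0}
    for box in estimate_character_boxes(list("abc"), 120, widths):
        print(box)
```

Boxes estimated this way are only approximate labels. In the workflow the abstract describes, such estimates stand in for hand-drawn annotations when building the character image training set.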


Metadata
Title
Character spotting and autonomous tagging: offline handwriting recognition for Bangla, Korean and other alphabetic scripts
Authors
Nishatul Majid
Elisa H. Barney Smith
Publication date
14-09-2022
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Document Analysis and Recognition (IJDAR) / Issue 4/2022
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-022-00410-x
