Published in: International Journal on Document Analysis and Recognition (IJDAR) 4/2022

14.09.2022 | Special Issue Paper

Character spotting and autonomous tagging: offline handwriting recognition for Bangla, Korean and other alphabetic scripts

Authors: Nishatul Majid, Elisa H. Barney Smith

Abstract

This paper demonstrates a framework for offline handwriting recognition using character spotting and autonomous tagging that works for any alphabetic script. Character spotting builds on the idea of object detection to find character elements in unsegmented word images. An autonomous tagging approach is introduced that automates the production of a character image training set by estimating character locations in a word based on typical character size. Although scripts can vary widely from one another, the proposed approach provides a simple and powerful workflow for unconstrained offline recognition that should work for any alphabetic script with few adjustments. We demonstrate this approach with handwritten Bangla, obtaining a character recognition accuracy (CRA) of 94.8% with precision tagging and 91.12% with autonomous tagging. We also explain how character spotting and autonomous tagging can be implemented for other alphabetic scripts, and demonstrate this with handwritten Hangul (Korean), obtaining a Jamo recognition accuracy (JRA) of 93.16% using a tiny fraction of the PE92 training set. The combination of character spotting and autonomous tagging removes one of the biggest frustrations, data annotation by hand, and we therefore believe it has the potential to revolutionize the development of offline recognition.
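The idea of estimating character locations from typical character size can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `estimate_character_boxes`, the `typical_widths` table, and the purely horizontal, proportional split are all assumptions made for illustration, and Latin letters stand in for script-specific labels. Scripts whose elements are arranged in two dimensions, such as Hangul Jamo within a syllable block, would need a different layout rule.

```python
from typing import Dict, List, Tuple

# (label, x_min, y_min, x_max, y_max) in pixel coordinates
Box = Tuple[str, int, int, int, int]


def estimate_character_boxes(
    labels: List[str],
    word_width: int,
    word_height: int,
    typical_widths: Dict[str, float],
    default_width: float = 1.0,
) -> List[Box]:
    """Estimate one bounding box per character in a word image.

    The word is split horizontally in proportion to each character's
    typical relative width (a hypothetical lookup table). Every box
    spans the full height of the word image.
    """
    if not labels:
        return []

    # Relative width of each character; unknown labels get a default.
    weights = [typical_widths.get(label, default_width) for label in labels]
    total = sum(weights)

    boxes: List[Box] = []
    cumulative = 0.0
    for label, weight in zip(labels, weights):
        x_min = round(word_width * cumulative / total)
        cumulative += weight
        x_max = round(word_width * cumulative / total)
        boxes.append((label, x_min, 0, x_max, word_height))
    return boxes


if __name__ == "__main__":
    # Hypothetical relative widths, for illustration only.
    widths = {"a": 1.0, "m": 1.5, "i": 0.5}
    for box in estimate_character_boxes(list("aim"), 300, 64, widths):
        print(box)
```

Boxes produced this way are only rough estimates, so they would serve as automatically generated training annotations for a detector rather than as a final segmentation.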


Metadata
Title
Character spotting and autonomous tagging: offline handwriting recognition for Bangla, Korean and other alphabetic scripts
Authors
Nishatul Majid
Elisa H. Barney Smith
Publication date
14.09.2022
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Document Analysis and Recognition (IJDAR) / Issue 4/2022
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-022-00410-x
