Published in: International Journal on Document Analysis and Recognition (IJDAR) 1/2024

05 August 2023 | Original Paper

A multifaceted evaluation of representation of graphemes for practically effective Bangla OCR

Authors: Koushik Roy, Md Sazzad Hossain, Pritom Kumar Saha, Shadman Rohan, Imranul Ashrafi, Ifty Mohammad Rezwan, Fuad Rahman, B. M. Mainul Hossain, Ahmedul Kabir, Nabeel Mohammed

Abstract

Bangla Optical Character Recognition (OCR) poses a unique challenge due to the presence of hundreds of diverse conjunct characters formed by the combination of two or more letters. In this paper, we propose two novel grapheme representation methods that improve the recognition of these conjunct characters and the overall performance of OCR in Bangla. We have utilized the popular Convolutional Recurrent Neural Network architecture and implemented our grapheme representation strategies to design the final labels of the model. Due to the absence of a large-scale Bangla word-level printed dataset, we created a synthetically generated Bangla corpus containing 2 million samples that are representative and sufficiently varied in terms of fonts, domain, and vocabulary size to train our Bangla OCR model. To test the various aspects of our model, we have also created 6 test protocols. Finally, to establish the generalizability of our grapheme representation methods, we have performed training and testing on external handwriting datasets. Experimental results proved the effectiveness of our novel approach. Furthermore, our synthetically generated training dataset and the test protocols are made available to serve as benchmarks for future Bangla OCR research.
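To make the idea of treating a conjunct as a single label unit concrete, the following is a minimal sketch in Python. It is not the paper's method: the abstract does not describe the two proposed grapheme representations, so the grouping rule here (attach combining marks to the preceding unit, and attach any character that follows a hasanta) is an assumption chosen only for illustration. The names `split_graphemes` and `build_label_map` are hypothetical, and reserving index 0 for a CTC blank assumes a CTC-trained CRNN, which the abstract does not state.

```python
import unicodedata
from typing import Dict, Iterable, List

VIRAMA = "\u09cd"  # Bangla hasanta, which joins consonants into a conjunct


def split_graphemes(word: str) -> List[str]:
    """Group code points into visual units (illustrative rule, an assumption)."""
    clusters: List[str] = []
    for ch in word:
        # Vowel signs, hasanta and similar signs have a Unicode category "M*".
        is_mark = unicodedata.category(ch).startswith("M")
        # A character right after a hasanta belongs to the same conjunct.
        follows_virama = bool(clusters) and clusters[-1].endswith(VIRAMA)
        if clusters and (is_mark or follows_virama):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def build_label_map(words: Iterable[str]) -> Dict[str, int]:
    """Assign one integer label per distinct unit; 0 is left for a CTC blank (assumed)."""
    units = sorted({g for w in words for g in split_graphemes(w)})
    return {g: i + 1 for i, g in enumerate(units)}


if __name__ == "__main__":
    # "বিজ্ঞান" written with escapes: ba, i-sign, ja, hasanta, nya, aa-sign, na
    word = "\u09ac\u09bf\u099c\u09cd\u099e\u09be\u09a8"
    units = split_graphemes(word)
    print(len(units), units)  # 3 units; the middle one is the conjunct with its vowel sign
    print(build_label_map([word]))
```

Under this rule the number of output classes grows with the number of distinct conjunct and vowel-sign combinations in the corpus. That is the kind of trade-off a choice of grapheme representation has to balance against sequence length.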

Metadata
Title
A multifaceted evaluation of representation of graphemes for practically effective Bangla OCR
Authors
Koushik Roy
Md Sazzad Hossain
Pritom Kumar Saha
Shadman Rohan
Imranul Ashrafi
Ifty Mohammad Rezwan
Fuad Rahman
B. M. Mainul Hossain
Ahmedul Kabir
Nabeel Mohammed
Publication date
05 August 2023
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Document Analysis and Recognition (IJDAR) / Issue 1/2024
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-023-00446-7
