nach oben

International Journal on Document Analysis and Recognition (IJDAR)

Erschienen in:

05.08.2023 | Original Paper

A multifaceted evaluation of representation of graphemes for practically effective Bangla OCR

verfasst von: Koushik Roy, Md Sazzad Hossain, Pritom Kumar Saha, Shadman Rohan, Imranul Ashrafi, Ifty Mohammad Rezwan, Fuad Rahman, B. M. Mainul Hossain, Ahmedul Kabir, Nabeel Mohammed

Erschienen in: International Journal on Document Analysis and Recognition (IJDAR) | Ausgabe 1/2024

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Bangla Optical Character Recognition (OCR) poses a unique challenge due to the presence of hundreds of diverse conjunct characters formed by the combination of two or more letters. In this paper, we propose two novel grapheme representation methods that improve the recognition of these conjunct characters and the overall performance of OCR in Bangla. We have utilized the popular Convolutional Recurrent Neural Network architecture and implemented our grapheme representation strategies to design the final labels of the model. Due to the absence of a large-scale Bangla word-level printed dataset, we created a synthetically generated Bangla corpus containing 2 million samples that are representative and sufficiently varied in terms of fonts, domain, and vocabulary size to train our Bangla OCR model. To test the various aspects of our model, we have also created 6 test protocols. Finally, to establish the generalizability of our grapheme representation methods, we have performed training and testing on external handwriting datasets. Experimental results proved the effectiveness of our novel approach. Furthermore, our synthetically generated training dataset and the test protocols are made available to serve as benchmarks for future Bangla OCR research.

Vorheriger Artikel Attribute-based document image retrieval

Nächster Artikel A deep learning-based solution for digitization of invoice images with automatic invoice generation and labelling

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Nur mit Berechtigung zugänglich

https://doi.org/10.6084/m9.figshare.20186825.

https://github.com/apurba-nsu-rnd-lab/bangla-ocr-crnn.

https://www.unicode.org/charts/PDF/U0980.pdf.

https://github.com/Belval/TextRecognitionDataGenerator.

Rabiner, L., Juang, B.: An introduction to hidden Markov models. IEEE ASSP Mag. 3(1), 4–16 (1986)CrossRef

Congdon, P.: Bayesian Statistical Modelling. Wiley Series in Probability and Statistics, Wiley (2006). https://doi.org/10.1002/9780470035948CrossRef

Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

Almazán, J., Gordo, A., Fornés, A., Valveny, E.: Word spotting and recognition with embedded attributes. IEEE Trans. Pattern Anal. Mach. Intell. 36(12), 2552–2566 (2014)CrossRefPubMed

Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman, A.: Synthetic data and artificial neural networks for natural scene text recognition. arXiv preprint arXiv:1406.2227 (2014)

Feng, X., Yao, H., Zhang, S.: Focal CTC loss for Chinese optical character recognition on unbalanced datasets. Complexity 2019 (2019)

Kang, L., Riba, P., Rusiñol, M., Fornés, A., Villegas, M.: Pay attention to what you read: non-recurrent handwritten text-line recognition. arXiv preprint arXiv:2005.13044 (2020)

Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman, A.: Reading text in the wild with convolutional neural networks. Int. J. Comput. Vis. 116(1), 1–20 (2016). https://doi.org/10.1007/s11263-015-0823-zMathSciNetCrossRef

10.

Shi, B., Bai, X., Yao, C.: An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell. 39(11), 2298–2304 (2016)CrossRefPubMed

11.

Hu, W., Cai, X., Hou, J., Yi, S., Lin, Z.: GTC: Guided training of CTC towards efficient and accurate scene text recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11005–11012 (2020)

12.

Rifat, M.J.R., Banik, M., Hasan, N., Nahar, J., Rahman, F.: A novel machine annotated balanced Bangla OCR corpus. In: Singh, S.K., Roy, P., Raman, B., Nagabhushan, P. (eds.) Comput. Vis. Image Process., pp. 149–160. Springer, Singapore (2021)CrossRef

13.

Anthimopoulos, M., Gatos, B., Pratikakis, I.: Detection of artificial and scene text in images and video frames. Pattern Anal. Appl. 16(3), 431–446 (2013)MathSciNetCrossRef

14.

Chen, H., Tsai, S.S., Schroth, G., Chen, D.M., Grzeszczuk, R., Girod, B.: Robust text detection in natural images with edge-enhanced maximally stable extremal regions. In: 2011 18th IEEE International Conference on Image Processing, pp. 2609–2612 (2011). IEEE

15.

Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2963–2970 (2010). IEEE

16.

Huang, W., Qiao, Y., Tang, X.: Robust scene text detection with convolution neural network induced MSER trees. In: European Conference on Computer Vision, pp. 497–511 (2014). Springer

17.

Alsharif, O., Pineau, J.: End-to-end text recognition with hybrid hmm maxout models. arXiv preprint arXiv:1310.1811 (2013)

18.

Gordo, A.: Supervised mid-level features for word image representation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2956–2964 (2015)

19.

Neumann, L., Matas, J.: A method for text localization and recognition in real-world images. In: Asian Conference on Computer Vision, pp. 770–783 (2010). Springer

20.

Mishra, A., Alahari, K., Jawahar, C.: Image retrieval using textual cues. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3040–3047 (2013)

21.

Smith, R.: An overview of the Tesseract OCR engine. In: Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 2, pp. 629–633 (2007). IEEE

22.

Shi, B., Wang, X., Lyu, P., Yao, C., Bai, X.: Robust scene text recognition with automatic rectification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4168–4176 (2016)

23.

Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRefPubMed

24.

Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., i Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., De Las Heras, L.P.: ICDAR 2013 robust reading competition. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 1484–1493 (2013). IEEE

25.

Baek, J., Kim, G., Lee, J., Park, S., Han, D., Yun, S., Oh, S.J., Lee, H.: What is wrong with scene text recognition model comparisons? Dataset and model analysis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)

26.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. arXiv preprint arXiv:1706.03762 (2017)

27.

Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 369–376 (2006)

28.

Shi, B., Yang, M., Wang, X., Lyu, P., Yao, C., Bai, X.: ASTER: an attentional scene text recognizer with flexible rectification. IEEE Trans. Pattern Anal. Mach. Intell. 41(9), 2035–2048 (2019). https://doi.org/10.1109/TPAMI.2018.2848939CrossRefPubMed

29.

Cheng, Z., Bai, F., Xu, Y., Zheng, G., Pu, S., Zhou, S.: Focusing attention: towards accurate text recognition in natural images. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 5086–5094 (2017). https://doi.org/10.1109/ICCV.2017.543

30.

Litman, R., Anschel, O., Tsiper, S., Litman, R., Mazor, S., Manmatha, R.: Scatter: selective context attentional scene text recognizer. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11959–11969 (2020). https://doi.org/10.1109/CVPR42600.2020.01198

31.

Yu, D., Li, X., Zhang, C., Liu, T., Han, J., Liu, J., Ding, E.: Towards accurate scene text recognition with semantic reasoning networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12113–12122 (2020)

32.

Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., et al.: ICDAR 2015 competition on robust reading. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 1156–1160 (2015). IEEE

33.

Feng, X., Yao, H., Qi, Y., Zhang, J., Zhang, S.: Scene text recognition via transformer. arXiv preprint arXiv:2003.08077 (2020)

34.

Atienza, R.: Vision transformer for fast and efficient scene text recognition. In: Document Analysis and Recognition–ICDAR 2021: 16th International Conference, Lausanne, Switzerland, September 5–10, 2021, Proceedings, Part I, vol. 16, pp. 319–334 (2021). Springer

35.

Wu, J., Peng, Y., Zhang, S., Qi, W., Zhang, J.: Masked vision-language transformers for scene text recognition. arXiv preprint arXiv:2211.04785 (2022)

36.

Wang, P., Da, C., Yao, C.: Multi-granularity prediction for scene text recognition. In: Computer Vision—ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXVIII, pp. 339–355 (2022). Springer

37.

Xie, X., Fu, L., Zhang, Z., Wang, Z., Bai, X.: Toward understanding wordart: corner-guided transformer for scene text recognition. In: Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXVIII, pp. 303–321 (2022). Springer

38.

Aberdam, A., Ganz, R., Mazor, S., Litman, R.: Multimodal semi-supervised learning for text recognition. arXiv preprint arXiv:2205.03873 (2022)

39.

Yang, M., Liao, M., Lu, P., Wang, J., Zhu, S., Luo, H., Tian, Q., Bai, X.: Reading and writing: discriminative and generative modeling for self-supervised text recognition. In: Proceedings of the 30th ACM International Conference on Multimedia, pp. 4214–4223 (2022)

40.

Chu, X., Wang, Y.: IterVM: iterative vision modeling module for scene text recognition. In: 2022 26th International Conference on Pattern Recognition (ICPR), pp. 1393–1399 (2022). IEEE

41.

Du, Y., Chen, Z., Jia, C., Yin, X., Zheng, T., Li, C., Du, Y., Jiang, Y.-G.: Svtr: scene text recognition with a single visual model. arXiv preprint arXiv:2205.00159 (2022)

42.

Zheng, C., Li, H., Rhee, S.-M., Han, S., Han, J.-J., Wang, P.: Pushing the performance limit of scene text recognizer without human annotation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14116–14125 (2022)

43.

Chammas, E., Mokbel, C., Likforman-Sulem, L.: Handwriting recognition of historical documents with few labeled data. In: 2018 13th IAPR International Workshop on Document Analysis Systems (DAS), pp. 43–48 (2018). IEEE

44.

Kišš, M., Hradiš, M., Beneš, K., Buchal, P., Kula, M.: SoftCTC—Semi-Supervised Learning for Text Recognition using Soft Pseudo-labels. arXiv (2022). arXiv:2212.02135

45.

Yousef, M., Hussain, K.F., Mohammed, U.S.: Accurate, data-efficient, unconstrained text recognition with convolutional neural networks. Pattern Recogn. 108, 107482 (2020). https://doi.org/10.1016/j.patcog.2020.107482CrossRef

46.

Maillette de Buy Wenniger, G., Schomaker, L., Way, A.: No padding please: efficient neural handwriting recognition. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 355–362 (2019). https://doi.org/10.1109/ICDAR.2019.00064

47.

Kass, D., Vats, E.: AttentionHTR: handwritten text recognition based on attention encoder–decoder networks. In: Uchida, S., Barney, E., Eglin, V. (eds.) Document Analysis Systems, pp. 507–522. Springer, Cham (2022)CrossRef

48.

Fogel, S., Averbuch-Elor, H., Cohen, S., Mazor, S., Litman, R.: Scrabblegan: Semi-supervised varying length handwritten text generation. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4323–4332 (2020). https://doi.org/10.1109/CVPR42600.2020.00438

49.

Souibgui, M.A., Fornés, A., Kessentini, Y., Megyesi, B.: Few shots are all you need: a progressive learning approach for low resource handwritten text recognition. Pattern Recogn. Lett. 160, 43–49 (2022). https://doi.org/10.1016/j.patrec.2022.06.003ADSCrossRef

50.

Rahman, A., Kaykobad, M.: A complete Bengali OCR: a novel hybrid approach to handwritten Bengali character recognition. J. Comput. Inf. Technol. 6(4), 395–413 (1998)

51.

Pal, U., Chaudhuri, B.B.: OCR in Bangla: an Indo-Bangladeshi language. In: Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3—Conference C: Signal Processing (Cat. No.94CH3440-5), vol. 2, pp. 269–2732 (1994). https://doi.org/10.1109/ICPR.1994.576917

52.

Sattar, M., Rahman, S.: An experimental investigation on Bangla character recognition system. Bangladesh Comput. Soc. J. 4(1), 1–4 (1989)

53.

Rahman, A.F.R., Fairhurst, M.: Multi-prototype classification: improved modelling of the variability of handwritten data using statistical clustering algorithms. Electron. Lett. 33(14), 1208–1210 (1997)ADSCrossRef

54.

Pal, U.: On the development of an optical character recognition (OCR) system for printed Bangla script. PhD thesis, Indian Statistical Institute, Calcutta (1997)

55.

Chaudhuri, B., Pal, U.: A complete printed Bangla OCR system. Pattern Recogn. 31(5), 531–549 (1998)ADSCrossRef

56.

Rahman, A.F.R., Fairhurst, M.C.: A new hybrid approach in combining multiple experts to recognise handwritten numerals. Pattern Recogn. Lett. 18(8), 781–790 (1997)ADSCrossRef

57.

Rahman, A.F.R., Rahman, R., Fairhurst, M.C.: Recognition of handwritten Bengali characters: a novel multistage approach. Pattern Recogn. 35(5), 997–1006 (2002)ADSCrossRef

58.

Mahmud, J.U., Raihan, M.F., Rahman, C.M.: A complete OCR system for continuous Bengali characters. In: TENCON 2003. Conference on Convergent Technologies for Asia-Pacific Region, vol. 4, pp. 1372–1376 (2003). IEEE

59.

Kamruzzaman, J., Aziz, S.: Improved machine recognition for Bangla characters. In: International Conference on Electrical and Computer Engineering 2004, pp. 557–560 (2004). ICECE 2004 Conference Secretariat, Bangladesh of Engineering and Technology

60.

Alam, M.M., Kashem, M.A.: A complete Bangla OCR system for printed characters. JCIT 1(01), 30–35 (2010)

61.

Ahmed, S., Kashem, M.A.: Enhancing the character segmentation accuracy of Bangla OCR using BPNN. Int. J. Sci. Res. (IJSR) ISSN (Online), 2319–7064 (2013)

62.

Chowdhury, A.A., Ahmed, E., Ahmed, S., Hossain, S., Rahman, C.M.: Optical character recognition of Bangla characters using neural network: a better approach. In: 2nd ICEE (2002)

63.

Ahmed, S., Sakib, A.N., Ishtiaque Mahmud, M., Belali, H., Rahman, S.: The anatomy of Bangla OCR system for printed texts using back propagation neural network. Glob. J. Comput. Sci. Technol. (2012)

64.

Afroge, S., Ahmed, B., Hossain, A.: Bangla optical character recognition through segmentation using curvature distance and multilayer perceptron algorithm. In: 2017 International Conference on Electrical, Computer and Communication Engineering (ECCE), pp. 253–257 (2017). IEEE

65.

Hossain, S.A., Tabassum, T.: Neural net based complete character recognition scheme for Bangla printed text books. In: 16th International Conference on Computer and Information Technology, pp. 71–75 (2014). IEEE

66.

Pramanik, R., Bag, S.: Shape decomposition-based handwritten compound character recognition for Bangla OCR. J. Vis. Commun. Image Represent. 50, 123–134 (2018)CrossRef

67.

Ghosh, R., Vamshi, C., Kumar, P.: RNN based online handwritten word recognition in Devanagari and Bengali scripts using horizontal zoning. Pattern Recogn. 92, 203–218 (2019)ADSCrossRef

68.

Purkaystha, B., Datta, T., Islam, M.S.: Bengali handwritten character recognition using deep convolutional neural network. In: 2017 20th International Conference of Computer and Information Technology (ICCIT), pp. 1–5 (2017). IEEE

69.

Islam, M.S., Rahman, M.M., Rahman, M.H., Rivolta, M.W., Aktaruzzaman, M.: Ratnet: a deep learning model for Bengali handwritten characters recognition. Multimed. Tools Appl. 81, 10631–10651 (2022). https://doi.org/10.1007/s11042-022-12070-4CrossRef

70.

Maity, S., Dey, A., Chowdhury, A., Banerjee, A.: Handwritten Bengali character recognition using deep convolution neural network. In: Bhattacharjee, A., Borgohain, S.K., Soni, B., Verma, G., Gao, X.-Z. (eds.) Machine Learning, Image Processing, Network Security and Data Sciences, pp. 84–92. Springer, Singapore (2020)CrossRef

71.

Roy, A.: AKHCRNet: Bengali Handwritten Character Recognition Using Deep Learning (2020)

72.

Sharif, S., Mohammed, N., Momen, S., Mansoor, N.: Classification of Bangla compound characters using a HOG-CNN hybrid model. In: Proceedings of the International Conference on Computing and Communication Systems, pp. 403–411 (2018). Springer

73.

Hasan, M.J., Wahid, M.F., Alom, M.S.: Bangla compound character recognition by combining deep convolutional neural network with bidirectional long short-term memory. In: 2019 4th International Conference on Electrical Information and Communication Technology (EICT), pp. 1–4 (2019). IEEE

74.

Paul, D., Chaudhuri, B.B.: A BLSTM network for printed Bengali OCR system with high accuracy. arXiv preprint arXiv:1908.08674 (2019)

75.

Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256 (2010). JMLR Workshop and Conference Proceedings

76.

Rahman, M.A., Tabassum, N., Paul, M., Pal, R., Islam, M.K.: BN-HTRd: A Benchmark Dataset for Document Level Offline Bangla Handwritten Text Recognition (HTR) and Line Segmentation. arXiv (2022). https://doi.org/10.48550/ARXIV.2206.08977. https://arxiv.org/abs/2206.08977

77.

Mridha, M.F., Ohi, A.Q., Ali, M.A., Emon, M.I., Kabir, M.M.: Banglawriting: a multi-purpose offline Bangla handwriting dataset. Data Brief. 34, 106633 (2021). https://doi.org/10.1016/j.dib.2020.106633CrossRefPubMed

78.

Banik, M., Rifat, M.J.R., Nahar, J., Hasan, N., Rahman, F.: Okkhor: a synthetic corpus of Bangla printed characters. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Proceedings of the Future Technologies Conference (FTC) 2020, vol. 1, pp. 693–711. Springer, Cham (2021)

79.

Roark, B., Wolf-Sonkin, L., Kirov, C., Mielke, S.J., Johny, C., Demirsahin, I., Hall, K.: Processing South Asian languages written in the Latin script: the Dakshina dataset. In: Proceedings of the 12th Language Resources and Evaluation Conference, pp. 2413–2423. European Language Resources Association, Marseille, France (2020). https://aclanthology.org/2020.lrec-1.294

80.

Al Mumin, M.A., Shoeb, A.A.M., Selim, M.R., Iqbal, M.Z.: Sumono: a representative modern Bengali corpus. SUST J. Sci. Technol. 21(1), 78–86 (2014)

81.

Biswas, E.: Bangla Largest Newspaper Dataset. Kaggle (2021). https://doi.org/10.34740/KAGGLE/DSV/1857507. https://www.kaggle.com/dsv/1857507

82.

Ahmed, M.F., Mahmud, Z., Biash, Z.T., Ryen, A.A.N., Hossain, A., Ashraf, F.B.: Bangla Online Comments Dataset. Mendeley Data (2021). https://doi.org/10.17632/9xjx8twk8p.1. https://data.mendeley.com/datasets/9xjx8twk8p/1

83.

Farahmand, A., Sarrafzadeh, H., Shanbehzadeh, J.: Document image noises and removal methods (2013)

84.

Lee, C.-Y., Osindero, S.: Recursive recurrent nets with attention modeling for OCR in the wild. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2231–2239 (2016). https://doi.org/10.1109/CVPR.2016.245

85.

Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. J. ACM (JACM) 21(1), 168–173 (1974)MathSciNetCrossRef

86.

Buslaev, A., Iglovikov, V.I., Khvedchenya, E., Parinov, A., Druzhinin, M., Kalinin, A.A.: Albumentations: fast and flexible image augmentations. Information (2020). https://doi.org/10.3390/info11020125CrossRef

87.

He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1026–1034. IEEE Computer Society, Los Alamitos, CA, USA (2015). https://doi.org/10.1109/ICCV.2015.123

88.

Loshchilov, I., Hutter, F.: SGDR: Stochastic Gradient Descent with Warm Restarts (2017)

89.

Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chintala, S.: Pytorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems 32, pp. 8024–8035. Curran Associates, Inc. (2019). http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf

Titel: A multifaceted evaluation of representation of graphemes for practically effective Bangla OCR
verfasst von: Koushik Roy
Md Sazzad Hossain
Pritom Kumar Saha
Shadman Rohan
Imranul Ashrafi
Ifty Mohammad Rezwan
Fuad Rahman
B. M. Mainul Hossain
Ahmedul Kabir
Nabeel Mohammed
Publikationsdatum: 05.08.2023
Verlag: Springer Berlin Heidelberg
Erschienen in: International Journal on Document Analysis and Recognition (IJDAR) / Ausgabe 1/2024
Print ISSN: 1433-2833
Elektronische ISSN: 1433-2825
DOI: https://doi.org/10.1007/s10032-023-00446-7

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Weitere Artikel der Ausgabe 1/2024

BPFormNet: a lightweight block pyramid network for form segmentation and classification

Attribute-based document image retrieval

A deep learning-based solution for digitization of invoice images with automatic invoice generation and labelling

Correction to: MRZ code extraction from visa and passport documents using convolutional neural networks

Chinese text recognition enhanced by glyph and character semantic information

Chart classification: a survey and benchmarking of different state-of-the-art methods

Premium Partner