International Journal of Multimedia Information Retrieval

01 December 2023 | Regular Paper

DELIGHT-Net: DEep and LIGHTweight network to segment Indian text at word level from wild scenic images

Authors: Shilpa Mahajan, Rajneesh Rani, Karan Trehan

Published in: International Journal of Multimedia Information Retrieval | Issue 2/2023

Abstract

Detecting and recognizing multi-oriented text in natural scene images remains a challenging problem in computer vision. Segmentation at the word or character level is a vital step that affects the end-to-end performance of a scene text recognition system. Many researchers have worked on segmenting words or characters from complex document images and handwritten images for various non-Indian scripts. In this paper, we present a deep learning-based architecture named DELIGHT-Net, derived from the general UNet architecture, to segment text at the word level from natural scene images. The method is mainly intended to segment Devanagari, Gurumukhi, and English scene words from complete images collected from day-to-day life. To this end, we introduce a new dataset, National Institute of Technology Jalandhar-Word Segmentation (NITJ-WS), which has around 2200 text blocks extracted from 1500 natural images containing unilingual, bilingual, and trilingual text. We benchmark the dataset with the proposed model and two state-of-the-art models, UNet and ResUNet. Statistical and visual results, evaluated using different evaluation parameters, indicate the efficiency of the proposed model. Some possible future directions are also recommended in the manuscript. We hope that our work is a stepping stone for researchers in the field of natural scene text recognition.
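The abstract does not name the evaluation parameters used. As an illustration only, the sketch below assumes two overlap measures that are commonly reported for binary segmentation masks, intersection over union (IoU) and the Dice coefficient. The function names and the toy masks are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple

# Assumed representation: a binary mask as rows of 0/1 values,
# where 1 marks a word pixel and 0 marks background.
Mask = Sequence[Sequence[int]]


def _counts(pred: Mask, truth: Mask) -> Tuple[int, int, int]:
    """Return (overlap, predicted word pixels, ground-truth word pixels)."""
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")
    overlap = pred_sum = truth_sum = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            overlap += 1 if (p and t) else 0
            pred_sum += 1 if p else 0
            truth_sum += 1 if t else 0
    return overlap, pred_sum, truth_sum


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union; defined as 1.0 when both masks are empty."""
    overlap, pred_sum, truth_sum = _counts(pred, truth)
    union = pred_sum + truth_sum - overlap
    return overlap / union if union else 1.0


def dice(pred: Mask, truth: Mask) -> float:
    """Dice coefficient; defined as 1.0 when both masks are empty."""
    overlap, pred_sum, truth_sum = _counts(pred, truth)
    total = pred_sum + truth_sum
    return 2 * overlap / total if total else 1.0


if __name__ == "__main__":
    predicted = [[1, 1, 0], [0, 1, 0]]
    ground_truth = [[1, 0, 0], [0, 1, 1]]
    print(iou(predicted, ground_truth), dice(predicted, ground_truth))
```

Both measures reward overlap between the predicted and annotated word regions. Dice weights the overlap twice, so it is never lower than IoU for the same pair of masks.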



Title: DELIGHT-Net: DEep and LIGHTweight network to segment Indian text at word level from wild scenic images
Authors: Shilpa Mahajan, Rajneesh Rani, Karan Trehan
Publication date: 01 December 2023
Publisher: Springer London
Published in: International Journal of Multimedia Information Retrieval / Issue 2/2023
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI: https://doi.org/10.1007/s13735-023-00293-6
