Skip to main content
Erschienen in: International Journal of Multimedia Information Retrieval 2/2023

01.12.2023 | Regular Paper

DELIGHT-Net: DEep and LIGHTweight network to segment Indian text at word level from wild scenic images

verfasst von: Shilpa Mahajan, Rajneesh Rani, Karan Trehan

Erschienen in: International Journal of Multimedia Information Retrieval | Ausgabe 2/2023

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The recognition and detection of multioriented text from textual natural scene images are still challenging in the computer vision community. The segmentation on either word level or character level is a vital step in the entire end-to-end performance of the scene text recognition system. Many academicians and researchers have done work in the prominent field of segmenting the words or characters from complex document images as well as handwritten images for various non-Indian scripts. In this paper, we extensively presented a deep learning-based architecture named DELIGHT-Net which is derived from the general UNet architecture to segment the text at the word level from natural scene images. The method is mainly proposed to segment the Devanagari, Gurumukhi, and English scenic words from complete images collected from day-to-day life. To achieve this, we have introduced a new dataset, i.e., National Institute of Technology Jalandhar-Word Segmentation (NITJ-WS) which has around 2200 text blocks extracted from 1500 natural images containing unilingual, bilingual, and trilingual text. The benchmark comparative assessment of our dataset is performed with the proposed model and two state-of-the-art models, i.e., UNet and ResUNet. Statistical and visual results are evaluated using different evaluation parameters, which depict the efficiency of the proposed model. Some possible future directions are also recommended in the manuscript. We hope that our work is a stepping stone for academicians in the field of natural scene text recognition.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
2.
Zurück zum Zitat Amara M, Zidi K, Ghedira K, Zidi S (2016) New rules to enhance the performances of histogram projection for segmenting small-sized Arabic words. In: International conference on hybrid intelligent systems. Springer, pp 167–176 Amara M, Zidi K, Ghedira K, Zidi S (2016) New rules to enhance the performances of histogram projection for segmenting small-sized Arabic words. In: International conference on hybrid intelligent systems. Springer, pp 167–176
6.
Zurück zum Zitat Chaitra Y, Dinesh R (2022) An impact of radon transforms and filtering techniques for text localization in natural scene text images. In: ICT with intelligent applications: proceedings of ICTIS 2021, vol 1. Springer, pp 563–573 Chaitra Y, Dinesh R (2022) An impact of radon transforms and filtering techniques for text localization in natural scene text images. In: ICT with intelligent applications: proceedings of ICTIS 2021, vol 1. Springer, pp 563–573
8.
Zurück zum Zitat Chaitra Y, Dinesh R, Jeevan M, Arpitha M, Aishwarya V, Akshitha K (2022) An impact of yolov5 on text detection and recognition system using tesseractocr in images/video frames. In: 2022 IEEE international conference on data science and information system (ICDSIS). IEEE, pp 1–6 Chaitra Y, Dinesh R, Jeevan M, Arpitha M, Aishwarya V, Akshitha K (2022) An impact of yolov5 on text detection and recognition system using tesseractocr in images/video frames. In: 2022 IEEE international conference on data science and information system (ICDSIS). IEEE, pp 1–6
11.
Zurück zum Zitat Diakogiannis FI, Waldner F, Caccetta P, Wu C (2020) Resunet-a: a deep learning framework for semantic segmentation of remotely sensed data. ISPRS J Photogramm Remote Sens 162:94–114CrossRef Diakogiannis FI, Waldner F, Caccetta P, Wu C (2020) Resunet-a: a deep learning framework for semantic segmentation of remotely sensed data. ISPRS J Photogramm Remote Sens 162:94–114CrossRef
12.
Zurück zum Zitat Firdaus FI, Khumaini A, Utaminingrum F (2017) Arabic letter segmentation using modified connected component labeling. In: 2017 international conference on sustainable information engineering and technology (SIET). IEEE, pp 392–397 Firdaus FI, Khumaini A, Utaminingrum F (2017) Arabic letter segmentation using modified connected component labeling. In: 2017 international conference on sustainable information engineering and technology (SIET). IEEE, pp 392–397
16.
Zurück zum Zitat Khare V, Shivakumara P, Chan CS, Lu T, Meng LK, Woon HH, Blumenstein M (2019) A novel character segmentation-reconstruction approach for license plate recognition. Expert Syst Appl 131:219–239CrossRef Khare V, Shivakumara P, Chan CS, Lu T, Meng LK, Woon HH, Blumenstein M (2019) A novel character segmentation-reconstruction approach for license plate recognition. Expert Syst Appl 131:219–239CrossRef
18.
Zurück zum Zitat Liao M, Pang G, Huang J, Hassner T, Bai X (2020) Mask textspotter v3: segmentation proposal network for robust scene text spotting. In: Computer vision–ECCV 2020: 16th European conference, Glasgow, UK, 23–28 Aug 2020, Proceedings, Part XI 16. Springer, pp 706–722 Liao M, Pang G, Huang J, Hassner T, Bai X (2020) Mask textspotter v3: segmentation proposal network for robust scene text spotting. In: Computer vision–ECCV 2020: 16th European conference, Glasgow, UK, 23–28 Aug 2020, Proceedings, Part XI 16. Springer, pp 706–722
20.
Zurück zum Zitat Liu X (2006) Multiscale edge-based text extraction from complex images. Xiaoqing Liu and Jagath Samarabandu The University of Western Ontario Department of Electrical & Computer Engineering. Neural Computing and Applications, pp 1721–1724 Liu X (2006) Multiscale edge-based text extraction from complex images. Xiaoqing Liu and Jagath Samarabandu The University of Western Ontario Department of Electrical & Computer Engineering. Neural Computing and Applications, pp 1721–1724
22.
Zurück zum Zitat Ma J, Zhang H, Shan Y, Qie X, Xu X, Qi Z (2022) BTS: a bi-lingual benchmark for text segmentation in the wild. In: CVPR, pp 19152–19162 Ma J, Zhang H, Shan Y, Qie X, Xu X, Qi Z (2022) BTS: a bi-lingual benchmark for text segmentation in the wild. In: CVPR, pp 19152–19162
23.
Zurück zum Zitat Madi B, Droby A, El-Sana J (2022) Textline alignment on the image domain. Int J Doc Anal Recognit 25:415–427CrossRef Madi B, Droby A, El-Sana J (2022) Textline alignment on the image domain. Int J Doc Anal Recognit 25:415–427CrossRef
26.
Zurück zum Zitat Mahajan S, Rani R (2021) Text detection and localization in scene images: a broad review. Artif Intell Rev 54:4317–4377CrossRef Mahajan S, Rani R (2021) Text detection and localization in scene images: a broad review. Artif Intell Rev 54:4317–4377CrossRef
27.
Zurück zum Zitat Mancas-Thillou C, Gosselin B (2005) Color text extraction from camera-based images: the impact of the choice of the clustering distance. In: Proceedings of the international conference on document analysis and recognition, ICDAR, pp 312–316. https://doi.org/10.1109/ICDAR.2005.76 Mancas-Thillou C, Gosselin B (2005) Color text extraction from camera-based images: the impact of the choice of the clustering distance. In: Proceedings of the international conference on document analysis and recognition, ICDAR, pp 312–316. https://​doi.​org/​10.​1109/​ICDAR.​2005.​76
28.
Zurück zum Zitat Mechi O, Mehri M, Ingold R, Amara NEB (2019) Text line segmentation in historical document images using an adaptive U-net architecture. In: Proceedings of the international conference on document analysis and recognition, ICDAR, vol 1, pp 369–374. https://doi.org/10.1109/ICDAR.2019.00066 Mechi O, Mehri M, Ingold R, Amara NEB (2019) Text line segmentation in historical document images using an adaptive U-net architecture. In: Proceedings of the international conference on document analysis and recognition, ICDAR, vol 1, pp 369–374. https://​doi.​org/​10.​1109/​ICDAR.​2019.​00066
29.
Zurück zum Zitat Milosevic N, Gregson C, Hernandez R, Nenadic G (2019) A framework for information extraction from tables in biomedical literature. Int J Doc Anal Recognit 22:55–78CrossRef Milosevic N, Gregson C, Hernandez R, Nenadic G (2019) A framework for information extraction from tables in biomedical literature. Int J Doc Anal Recognit 22:55–78CrossRef
30.
Zurück zum Zitat Nguyen DD (2022) Tablesegnet: a fully convolutional network for table detection and segmentation in document images. Int J Doc Anal Recognit 25:1–14CrossRef Nguyen DD (2022) Tablesegnet: a fully convolutional network for table detection and segmentation in document images. Int J Doc Anal Recognit 25:1–14CrossRef
32.
Zurück zum Zitat Peng D, Jin L, Wu Y, Wang Z, Cai M (2019) A fast and accurate fully convolutional network for end-to-end handwritten Chinese text segmentation and recognition. In: Proceedings of the international conference on document analysis and recognition, ICDAR, pp 25–30. https://doi.org/10.1109/ICDAR.2019.00014 Peng D, Jin L, Wu Y, Wang Z, Cai M (2019) A fast and accurate fully convolutional network for end-to-end handwritten Chinese text segmentation and recognition. In: Proceedings of the international conference on document analysis and recognition, ICDAR, pp 25–30. https://​doi.​org/​10.​1109/​ICDAR.​2019.​00014
33.
Zurück zum Zitat Qomariyah F, Utaminingrum F, Mahmudy WF (2017) The segmentation of printed Arabic characters based on interest point. J Telecommun Electron Comput Eng 9:19–24 Qomariyah F, Utaminingrum F, Mahmudy WF (2017) The segmentation of printed Arabic characters based on interest point. J Telecommun Electron Comput Eng 9:19–24
34.
Zurück zum Zitat Raj H, Ghosh R (2014) Devanagari text extraction from natural scene images. In: International conference on advances in computing,communications and informatics (ICACCI), pp 513–517 Raj H, Ghosh R (2014) Devanagari text extraction from natural scene images. In: International conference on advances in computing,communications and informatics (ICACCI), pp 513–517
35.
Zurück zum Zitat Rajan V, Raj S (2017) Text detection and character extraction in natural scene images using fractional Poisson model. In: Proceedings of the IEEE 2017 international conference on computing methodologies and communication, pp 1136–1141 Rajan V, Raj S (2017) Text detection and character extraction in natural scene images using fractional Poisson model. In: Proceedings of the IEEE 2017 international conference on computing methodologies and communication, pp 1136–1141
37.
Zurück zum Zitat Rajyagor B, Rakholia R (2021) Tri-level handwritten text segmentation techniques for Gujarati language. Indian J Sci Technol 14:618–627CrossRef Rajyagor B, Rakholia R (2021) Tri-level handwritten text segmentation techniques for Gujarati language. Indian J Sci Technol 14:618–627CrossRef
39.
Zurück zum Zitat Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 234–241 Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 234–241
42.
Zurück zum Zitat Xu X, Qi Z, Ma J, Zhang H, Shan Y, Qie X (2022) Bts: a bi-lingual benchmark for text segmentation in the wild. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 19152–19162 Xu X, Qi Z, Ma J, Zhang H, Shan Y, Qie X (2022) Bts: a bi-lingual benchmark for text segmentation in the wild. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 19152–19162
43.
Zurück zum Zitat Xu X, Zhang Z, Wang Z, Price B, Wang Z, Shi H (2021) Rethinking text segmentation: a novel dataset and a text-specific refinement approach. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12045–12055 Xu X, Zhang Z, Wang Z, Price B, Wang Z, Shi H (2021) Rethinking text segmentation: a novel dataset and a text-specific refinement approach. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12045–12055
46.
Zurück zum Zitat Zhang Z, Liu Q, Wang Y (2018) Road extraction by deep residual u-net. IEEE Geosci Remote Sens Lett 15:749–753CrossRef Zhang Z, Liu Q, Wang Y (2018) Road extraction by deep residual u-net. IEEE Geosci Remote Sens Lett 15:749–753CrossRef
Metadaten
Titel
DELIGHT-Net: DEep and LIGHTweight network to segment Indian text at word level from wild scenic images
verfasst von
Shilpa Mahajan
Rajneesh Rani
Karan Trehan
Publikationsdatum
01.12.2023
Verlag
Springer London
Erschienen in
International Journal of Multimedia Information Retrieval / Ausgabe 2/2023
Print ISSN: 2192-6611
Elektronische ISSN: 2192-662X
DOI
https://doi.org/10.1007/s13735-023-00293-6

Weitere Artikel der Ausgabe 2/2023

International Journal of Multimedia Information Retrieval 2/2023 Zur Ausgabe

Premium Partner