Published in: International Journal of Multimedia Information Retrieval 3/2022

5 July 2022 | Trends and Surveys

Text detection, recognition, and script identification in natural scene images: a Review

Authors: Veronica Naosekpam, Nilkanta Sahu


Abstract

Text in natural scene images plays a vital role in scene understanding. It carries rich semantic information that is useful in many applications, such as product label analysis, autonomous driving, and navigation for the blind. Consequently, the detection and recognition of text in scene images, and the identification of its script, have recently received considerable attention. This paper surveys advances on these topics, focusing mainly on approaches proposed in the last 8–10 years. To the best of our knowledge, it is the first review of scene text script identification. We also classify the methods as conventional, deep learning-based, or hybrid, and discuss the advantages and disadvantages of each class. Evaluation metrics, the characteristics of benchmark datasets, and the performance of existing methods are also analyzed and discussed. Finally, we outline potential research directions. We hope this review will give researchers a brief insight into scene text understanding.

Literatur
1.
Zurück zum Zitat Ali M, Foroosh H (2016) Character recognition in natural scene images using rank-1 tensor decomposition. In: 2016 IEEE International conference on image processing (ICIP), IEEE, pp 2891–2895 Ali M, Foroosh H (2016) Character recognition in natural scene images using rank-1 tensor decomposition. In: 2016 IEEE International conference on image processing (ICIP), IEEE, pp 2891–2895
3.
Zurück zum Zitat Atienza R (2021) Vision transformer for fast and efficient scene text recognition. In: International conference on document analysis and recognition, Springer, pp 319–334 Atienza R (2021) Vision transformer for fast and efficient scene text recognition. In: International conference on document analysis and recognition, Springer, pp 319–334
4.
Zurück zum Zitat Baek J, Kim G, Lee J, et al (2019) What is wrong with scene text recognition model comparisons? dataset and model analysis. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 4715–4723 Baek J, Kim G, Lee J, et al (2019) What is wrong with scene text recognition model comparisons? dataset and model analysis. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 4715–4723
5.
Zurück zum Zitat Bai B, Yin F, Liu CL (2014) A seed-based segmentation method for scene text extraction. In: 2014 11th IAPR international workshop on document analysis systems, IEEE, pp 262–266 Bai B, Yin F, Liu CL (2014) A seed-based segmentation method for scene text extraction. In: 2014 11th IAPR international workshop on document analysis systems, IEEE, pp 262–266
6.
Zurück zum Zitat Bai F, Cheng Z, Niu Y, et al (2018) Edit probability for scene text recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1508–1516 Bai F, Cheng Z, Niu Y, et al (2018) Edit probability for scene text recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1508–1516
7.
Zurück zum Zitat Belongie S, Malik J, Puzicha J (2002) Shape matching and object recognition using shape contexts. IEEE Trans Pattern Anal Mach Intell 24(4):509–522CrossRef Belongie S, Malik J, Puzicha J (2002) Shape matching and object recognition using shape contexts. IEEE Trans Pattern Anal Mach Intell 24(4):509–522CrossRef
8.
Zurück zum Zitat Berg AC, Berg TL, Malik J (2005) Shape matching and object recognition using low distortion correspondences. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), IEEE, pp 26–33 Berg AC, Berg TL, Malik J (2005) Shape matching and object recognition using low distortion correspondences. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), IEEE, pp 26–33
9.
Zurück zum Zitat Konwer A, Bhunia AK, Bhunia AK et al (2019) Script identification in natural scene image and video frames using an attention based convolutional-lstm network. Pattern Recognition 85:172–184CrossRef Konwer A, Bhunia AK, Bhunia AK et al (2019) Script identification in natural scene image and video frames using an attention based convolutional-lstm network. Pattern Recognition 85:172–184CrossRef
10.
Zurück zum Zitat Bissacco A, Cummins M, Netzer Y, et al (2013) Photoocr: uncontrolled conditions. In: Proceedings of the ieee international conference on computer vision, pp 785–792 Bissacco A, Cummins M, Netzer Y, et al (2013) Photoocr: uncontrolled conditions. In: Proceedings of the ieee international conference on computer vision, pp 785–792
11.
12.
Zurück zum Zitat Bookstein FL (1989) Principal warps: thin-plate splines and the decomposition of deformations. IEEE Trans Pattern Anal Mach Intell 11(6):567–585MATHCrossRef Bookstein FL (1989) Principal warps: thin-plate splines and the decomposition of deformations. IEEE Trans Pattern Anal Mach Intell 11(6):567–585MATHCrossRef
14.
Zurück zum Zitat Burie JC, Chazalon J, Coustaty M, et al (2015) Icdar2015 competition on smartphone document capture and ocr (smartdoc). In: 2015 13th International conference on document analysis and recognition (ICDAR), IEEE, pp 1161–1165 Burie JC, Chazalon J, Coustaty M, et al (2015) Icdar2015 competition on smartphone document capture and ocr (smartdoc). In: 2015 13th International conference on document analysis and recognition (ICDAR), IEEE, pp 1161–1165
16.
Zurück zum Zitat Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 6:679–698CrossRef Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 6:679–698CrossRef
18.
Zurück zum Zitat Chen D, Bourlard H, Thiran JP (2001) Text identification in complex background using svm. In: Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition. CVPR 2001, IEEE, pp II–II Chen D, Bourlard H, Thiran JP (2001) Text identification in complex background using svm. In: Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition. CVPR 2001, IEEE, pp II–II
20.
Zurück zum Zitat Chen J, Lian Z, Wang Y et al (2019) Irregular scene text detection via attention guided border labeling. Sci China Inf Sci 62(12):220103CrossRef Chen J, Lian Z, Wang Y et al (2019) Irregular scene text detection via attention guided border labeling. Sci China Inf Sci 62(12):220103CrossRef
22.
Zurück zum Zitat Ch’ng CK, Chan CS (2017) Total-text: A comprehensive dataset for scene text detection and recognition. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, pp 935–942 Ch’ng CK, Chan CS (2017) Total-text: A comprehensive dataset for scene text detection and recognition. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, pp 935–942
23.
Zurück zum Zitat Chng CK, Liu Y, Sun Y, et al (2019) Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: 2019 international conference on document analysis and recognition (ICDAR), IEEE, pp 1571–1576 Chng CK, Liu Y, Sun Y, et al (2019) Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: 2019 international conference on document analysis and recognition (ICDAR), IEEE, pp 1571–1576
24.
Zurück zum Zitat Cho MS, Seok JH, Lee S, et al (2011) Scene text extraction by superpixel crfs combining multiple character features. In: 2011 international conference on document analysis and recognition, IEEE, pp 1034–1038 Cho MS, Seok JH, Lee S, et al (2011) Scene text extraction by superpixel crfs combining multiple character features. In: 2011 international conference on document analysis and recognition, IEEE, pp 1034–1038
25.
Zurück zum Zitat Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619CrossRef Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619CrossRef
26.
Zurück zum Zitat Dai P, Zhang S, Zhang H, et al (2021) Progressive contour regression for arbitrary-shape scene text detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7393–7402 Dai P, Zhang S, Zhang H, et al (2021) Progressive contour regression for arbitrary-shape scene text detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7393–7402
27.
Zurück zum Zitat Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), IEEE, pp 886–893 Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), IEEE, pp 886–893
28.
Zurück zum Zitat Dastidar SG, Dutta K, Das N, et al (2021) Exploring knowledge distillation of a deep neural network for multi-script identification. In: International conference on computational intelligence in communications and business analytics, Springer, pp 150–162 Dastidar SG, Dutta K, Das N, et al (2021) Exploring knowledge distillation of a deep neural network for multi-script identification. In: International conference on computational intelligence in communications and business analytics, Springer, pp 150–162
29.
Zurück zum Zitat De Campos TE, Babu BR, Varma M et al (2009) Character recognition in natural images. VISAPP 2:7 De Campos TE, Babu BR, Varma M et al (2009) Character recognition in natural images. VISAPP 2:7
30.
Zurück zum Zitat Deng D, Liu H, Li X, et al (2018) Pixellink: Detecting scene text via instance segmentation. In: Proceedings of the AAAI conference on artificial intelligence Deng D, Liu H, Li X, et al (2018) Pixellink: Detecting scene text via instance segmentation. In: Proceedings of the AAAI conference on artificial intelligence
31.
Zurück zum Zitat Deng L, Gong Y, Lin Y et al (2019) Detecting multi-oriented text with corner-based region proposals. Neurocomputing 334:134–142CrossRef Deng L, Gong Y, Lin Y et al (2019) Detecting multi-oriented text with corner-based region proposals. Neurocomputing 334:134–142CrossRef
32.
Zurück zum Zitat Deng L, Gong Y, Lu X et al (2019) Stela: a real-time scene text detector with learned anchor. IEEE Access 7:153400–153407CrossRef Deng L, Gong Y, Lu X et al (2019) Stela: a real-time scene text detector with learned anchor. IEEE Access 7:153400–153407CrossRef
33.
Zurück zum Zitat Desolneux A, Moisan L, More JM (2003) A grouping principle and four applications. IEEE Trans Pattern Anal Mach Intell 25(4):508–513CrossRef Desolneux A, Moisan L, More JM (2003) A grouping principle and four applications. IEEE Trans Pattern Anal Mach Intell 25(4):508–513CrossRef
34.
Zurück zum Zitat Dosovitskiy A, Beyer L, Kolesnikov A, et al (2020) An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 Dosovitskiy A, Beyer L, Kolesnikov A, et al (2020) An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:​2010.​11929
35.
Zurück zum Zitat Epshtein B, Ofek E, Wexler Y (2010) Detecting text in natural scenes with stroke width transform. In: 2010 IEEE computer society conference on computer vision and pattern recognition, IEEE, pp 2963–2970 Epshtein B, Ofek E, Wexler Y (2010) Detecting text in natural scenes with stroke width transform. In: 2010 IEEE computer society conference on computer vision and pattern recognition, IEEE, pp 2963–2970
36.
Zurück zum Zitat Felzenszwalb PF, Huttenlocher DP (2005) Pictorial structures for object recognition. Int J Comput Vis 61(1):55–79CrossRef Felzenszwalb PF, Huttenlocher DP (2005) Pictorial structures for object recognition. Int J Comput Vis 61(1):55–79CrossRef
37.
Zurück zum Zitat Feng Y, Song Y, Zhang Y (2016) Scene text detection based on multi-scale swt and edge filtering. In: 2016 23rd international conference on pattern recognition (ICPR), IEEE, pp 645–650 Feng Y, Song Y, Zhang Y (2016) Scene text detection based on multi-scale swt and edge filtering. In: 2016 23rd international conference on pattern recognition (ICPR), IEEE, pp 645–650
38.
Zurück zum Zitat Freund Y, Schapire RE et al (1996) Experiments with a new boosting algorithm. In: icml, Citeseer, pp 148–156 Freund Y, Schapire RE et al (1996) Experiments with a new boosting algorithm. In: icml, Citeseer, pp 148–156
39.
Zurück zum Zitat Fujii Y, Driesen K, Baccash J et al (2017) Sequence-to-label script identification for multilingual ocr. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, pp 161–168 Fujii Y, Driesen K, Baccash J et al (2017) Sequence-to-label script identification for multilingual ocr. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, pp 161–168
40.
Zurück zum Zitat Ghosh S, Chaudhuri BB (2011) Composite script identification and orientation detection for indian text images. In: 2011 international conference on document analysis and recognition, IEEE, pp 294–298 Ghosh S, Chaudhuri BB (2011) Composite script identification and orientation detection for indian text images. In: 2011 international conference on document analysis and recognition, IEEE, pp 294–298
43.
Zurück zum Zitat Gomez L, Karatzas D (2015) Object proposals for text extraction in the wild. In: 2015 13th International conference on document analysis and recognition (ICDAR), IEEE, pp 206–210 Gomez L, Karatzas D (2015) Object proposals for text extraction in the wild. In: 2015 13th International conference on document analysis and recognition (ICDAR), IEEE, pp 206–210
44.
Zurück zum Zitat Gomez L, Karatzas D (2016) A fine-grained approach to scene text script identification. In: 2016 12th IAPR workshop on document analysis systems (DAS), IEEE, pp 192–197 Gomez L, Karatzas D (2016) A fine-grained approach to scene text script identification. In: 2016 12th IAPR workshop on document analysis systems (DAS), IEEE, pp 192–197
45.
Zurück zum Zitat Gomez L, Nicolaou A, Karatzas D (2017) Improving patch-based scene text script identification with ensembles of conjoined networks. Pattern Recognit 67:85–96CrossRef Gomez L, Nicolaou A, Karatzas D (2017) Improving patch-based scene text script identification with ensembles of conjoined networks. Pattern Recognit 67:85–96CrossRef
46.
Zurück zum Zitat Gordo A (2015) Supervised mid-level features for word image representation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2956–2964 Gordo A (2015) Supervised mid-level features for word image representation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2956–2964
47.
Zurück zum Zitat Graves A, Fernández S, Gomez F et al (2006) Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd international conference on Machine learning, pp 369–376 Graves A, Fernández S, Gomez F et al (2006) Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd international conference on Machine learning, pp 369–376
48.
Zurück zum Zitat Gupta A, Vedaldi A, Zisserman A (2016) Synthetic data for text localisation in natural images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2315–2324 Gupta A, Vedaldi A, Zisserman A (2016) Synthetic data for text localisation in natural images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2315–2324
49.
Zurück zum Zitat Hanif SM, Prevost L (2009) Text detection and localization in complex scene images using constrained adaboost algorithm. In: 2009 10th international conference on document analysis and recognition, IEEE, pp 1–5 Hanif SM, Prevost L (2009) Text detection and localization in complex scene images using constrained adaboost algorithm. In: 2009 10th international conference on document analysis and recognition, IEEE, pp 1–5
50.
Zurück zum Zitat Hanif SM, Prevost L, Negri PA (2008) A cascade detector for text detection in natural scene images. In: 2008 19th international conference on pattern recognition, IEEE, pp 1–4 Hanif SM, Prevost L, Negri PA (2008) A cascade detector for text detection in natural scene images. In: 2008 19th international conference on pattern recognition, IEEE, pp 1–4
51.
Zurück zum Zitat He K, Gkioxari G, Dollar P et al (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision (ICCV) He K, Gkioxari G, Dollar P et al (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision (ICCV)
52.
Zurück zum Zitat He P, Huang W, Qiao Y et al (2016) Reading scene text in deep convolutional sequences. In: Proceedings of the AAAI conference on artificial intelligence He P, Huang W, Qiao Y et al (2016) Reading scene text in deep convolutional sequences. In: Proceedings of the AAAI conference on artificial intelligence
53.
Zurück zum Zitat He W, Zhang XY, Yin F et al (2017) Deep direct regression for multi-oriented scene text detection. In: Proceedings of the IEEE international conference on computer vision, pp 745–753 He W, Zhang XY, Yin F et al (2017) Deep direct regression for multi-oriented scene text detection. In: Proceedings of the IEEE international conference on computer vision, pp 745–753
54.
Zurück zum Zitat Hoiem D, Divvala SK, Hays JH (2009) Pascal voc 2008 challenge. World Lit Today Hoiem D, Divvala SK, Hays JH (2009) Pascal voc 2008 challenge. World Lit Today
55.
Zurück zum Zitat Hu H, Zhang C, Luo Y et al (2017) Wordsup: Exploiting word annotations for character based text detection. In: Proceedings of the IEEE international conference on computer vision (ICCV) Hu H, Zhang C, Luo Y et al (2017) Wordsup: Exploiting word annotations for character based text detection. In: Proceedings of the IEEE international conference on computer vision (ICCV)
56.
Zurück zum Zitat Huang L, Yang Y, Deng Y et al (2015) Densebox: Unifying landmark localization with end to end object detection. arXiv preprint arXiv:1509.04874 Huang L, Yang Y, Deng Y et al (2015) Densebox: Unifying landmark localization with end to end object detection. arXiv preprint arXiv:​1509.​04874
57.
Zurück zum Zitat Huang W, Lin Z, Yang J et al (2013) Text localization in natural images using stroke feature transform and text covariance descriptors. In: Proceedings of the IEEE international conference on computer vision, pp 1241–1248 Huang W, Lin Z, Yang J et al (2013) Text localization in natural images using stroke feature transform and text covariance descriptors. In: Proceedings of the IEEE international conference on computer vision, pp 1241–1248
58.
Zurück zum Zitat Huang W, Qiao Y, Tang X (2014) Robust scene text detection with convolution neural network induced mser trees. In: European conference on computer vision, Springer, pp 497–511 Huang W, Qiao Y, Tang X (2014) Robust scene text detection with convolution neural network induced mser trees. In: European conference on computer vision, Springer, pp 497–511
59.
Zurück zum Zitat Jaderberg M, Simonyan K, Vedaldi A et al (2014) Synthetic data and artificial neural networks for natural scene text recognition. arXiv preprint arXiv:1406.2227 Jaderberg M, Simonyan K, Vedaldi A et al (2014) Synthetic data and artificial neural networks for natural scene text recognition. arXiv preprint arXiv:​1406.​2227
60.
Zurück zum Zitat Jaderberg M, Vedaldi A, Zisserman A (2014) Deep features for text spotting. In: European conference on computer vision, Springer, pp 512–528 Jaderberg M, Vedaldi A, Zisserman A (2014) Deep features for text spotting. In: European conference on computer vision, Springer, pp 512–528
62.
Zurück zum Zitat Jaderberg M, Simonyan K, Vedaldi A et al (2016) Reading text in the wild with convolutional neural networks. Int J Comput Vis 116(1):1–20MathSciNetCrossRef Jaderberg M, Simonyan K, Vedaldi A et al (2016) Reading text in the wild with convolutional neural networks. Int J Comput Vis 116(1):1–20MathSciNetCrossRef
63.
Zurück zum Zitat Jiang Y, Zhu X, Wang X et al (2017) R2cnn: rotational region cnn for orientation robust scene text detection. arXiv preprint arXiv:1706.09579 Jiang Y, Zhu X, Wang X et al (2017) R2cnn: rotational region cnn for orientation robust scene text detection. arXiv preprint arXiv:​1706.​09579
64.
Zurück zum Zitat Kang C, Kim G, Yoo S (2017) Detection and recognition of text embedded in online images via neural context models. In: Proceedings of the AAAI conference on artificial intelligence Kang C, Kim G, Yoo S (2017) Detection and recognition of text embedded in online images via neural context models. In: Proceedings of the AAAI conference on artificial intelligence
65.
Zurück zum Zitat Karatzas D, Shafait F, Uchida S et al (2013) Icdar 2013 robust reading competition. In: 2013 12th international conference on document analysis and recognition, IEEE, pp 1484–1493 Karatzas D, Shafait F, Uchida S et al (2013) Icdar 2013 robust reading competition. In: 2013 12th international conference on document analysis and recognition, IEEE, pp 1484–1493
66.
Zurück zum Zitat Karatzas D, Gomez-Bigorda L, Nicolaou A et al (2015) Icdar 2015 competition on robust reading. In: 2015 13th international conference on document analysis and recognition (ICDAR), IEEE, pp 1156–1160 Karatzas D, Gomez-Bigorda L, Nicolaou A et al (2015) Icdar 2015 competition on robust reading. In: 2015 13th international conference on document analysis and recognition (ICDAR), IEEE, pp 1156–1160
67.
68.
Zurück zum Zitat Kaur A, Shrawankar U (2017) Adverse conditions and techniques for cross-lingual text recognition. In: 2017 international conference on innovative mechanisms for industry applications (ICIMIA), IEEE, pp 70–74 Kaur A, Shrawankar U (2017) Adverse conditions and techniques for cross-lingual text recognition. In: 2017 international conference on innovative mechanisms for industry applications (ICIMIA), IEEE, pp 70–74
69.
Zurück zum Zitat Kaur H, Singh R (2018) Natural scene text localization by convolution neural network with svm Kaur H, Singh R (2018) Natural scene text localization by convolution neural network with svm
72.
Zurück zum Zitat Khatib T, Karajeh H, Mohammad H et al (2015) A hybrid multilevel text extraction algorithm in scene images. Sci Res Essays 10(3):105–113CrossRef Khatib T, Karajeh H, Mohammad H et al (2015) A hybrid multilevel text extraction algorithm in scene images. Sci Res Essays 10(3):105–113CrossRef
73.
Zurück zum Zitat Kim KI, Jung K, Kim JH (2003) Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm. IEEE Trans Pattern Anal Mach Intell 25(12):1631–1639CrossRef Kim KI, Jung K, Kim JH (2003) Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm. IEEE Trans Pattern Anal Mach Intell 25(12):1631–1639CrossRef
74.
Zurück zum Zitat Kobchaisawat T, Chalidabhongse TH, Satoh S (2020) Scene text detection with polygon offsetting and border augmentation. Electronics 9(1):117CrossRef Kobchaisawat T, Chalidabhongse TH, Satoh S (2020) Scene text detection with polygon offsetting and border augmentation. Electronics 9(1):117CrossRef
75.
Zurück zum Zitat Koo HI, Cho NI (2011) Text-line extraction in handwritten chinese documents based on an energy minimization framework. IEEE Trans Image Process 21(3):1169–1175MathSciNetMATH Koo HI, Cho NI (2011) Text-line extraction in handwritten chinese documents based on an energy minimization framework. IEEE Trans Image Process 21(3):1169–1175MathSciNetMATH
76.
Zurück zum Zitat Krasin I, Duerig T, Alldrin N, et al (2017) Openimages: A public dataset for large-scale multi-label and multi-class image classification. Dataset available from https://github com/openimages 2(3):18 Krasin I, Duerig T, Alldrin N, et al (2017) Openimages: A public dataset for large-scale multi-label and multi-class image classification. Dataset available from https://​github com/openimages 2(3):18
77.
Zurück zum Zitat Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25:1097–1105 Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25:1097–1105
78.
Zurück zum Zitat Kumar D, Prasad MA, Ramakrishnan A (2013) Multi-script robust reading competition in icdar 2013. In: Proceedings of the 4th international workshop on multilingual OCR, pp 1–5 Kumar D, Prasad MA, Ramakrishnan A (2013) Multi-script robust reading competition in icdar 2013. In: Proceedings of the 4th international workshop on multilingual OCR, pp 1–5
79.
Zurück zum Zitat Lazebnik S, Schmid C, Ponce J (2005) A sparse texture representation using local affine regions. IEEE Trans Pattern Anal Mach Intell 27(8):1265–1278CrossRef Lazebnik S, Schmid C, Ponce J (2005) A sparse texture representation using local affine regions. IEEE Trans Pattern Anal Mach Intell 27(8):1265–1278CrossRef
80.
Zurück zum Zitat Lee CY, Bhardwaj A, Di W et al (2014) Region-based discriminative feature pooling for scene text recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4050–4057 Lee CY, Bhardwaj A, Di W et al (2014) Region-based discriminative feature pooling for scene text recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4050–4057
81.
Zurück zum Zitat Lee CY, Baek Y, Lee H (2019) Tedeval: A fair evaluation metric for scene text detectors. In: 2019 international conference on document analysis and recognition workshops (ICDARW), IEEE, pp 14–17 Lee CY, Baek Y, Lee H (2019) Tedeval: A fair evaluation metric for scene text detectors. In: 2019 international conference on document analysis and recognition workshops (ICDARW), IEEE, pp 14–17
82.
Zurück zum Zitat Lee S, Cho MS, Jung K, et al (2010) Scene text extraction with edge constraint and text collinearity. In: 2010 20th international conference on pattern recognition, IEEE, pp 3983–3986 Lee S, Cho MS, Jung K, et al (2010) Scene text extraction with edge constraint and text collinearity. In: 2010 20th international conference on pattern recognition, IEEE, pp 3983–3986
83.
Zurück zum Zitat Li Y, Lu H (2012) Scene text detection via stroke width. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012), IEEE, pp 681–684 Li Y, Lu H (2012) Scene text detection via stroke width. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012), IEEE, pp 681–684
84.
Zurück zum Zitat Liao M, Shi B, Bai X et al (2017) Textboxes: a fast text detector with a single deep neural network. In: Proceedings of the AAAI conference on artificial intelligence Liao M, Shi B, Bai X et al (2017) Textboxes: a fast text detector with a single deep neural network. In: Proceedings of the AAAI conference on artificial intelligence
85.
Zurück zum Zitat Liao M, Shi B, Bai X (2018) Textboxes++: a single-shot oriented scene text detector. IEEE Trans Image Process 27(8):3676–3690MathSciNetMATHCrossRef Liao M, Shi B, Bai X (2018) Textboxes++: a single-shot oriented scene text detector. IEEE Trans Image Process 27(8):3676–3690MathSciNetMATHCrossRef
86.
Zurück zum Zitat Liao M, Zhu Z, Shi B et al (2018) Rotation-sensitive regression for oriented scene text detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5909–5918 Liao M, Zhu Z, Shi B et al (2018) Rotation-sensitive regression for oriented scene text detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5909–5918
88.
Zurück zum Zitat Lienhart RW, Stuber F (1996) Automatic text recognition in digital videos. In: Image and video processing IV, international society for optics and photonics, pp 180–188 Lienhart RW, Stuber F (1996) Automatic text recognition in digital videos. In: Image and video processing IV, international society for optics and photonics, pp 180–188
89.
Zurück zum Zitat Lin CH, Lucey S (2017) Inverse compositional spatial transformer networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2568–2576 Lin CH, Lucey S (2017) Inverse compositional spatial transformer networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2568–2576
90.
Zurück zum Zitat Lin TY, Dollar P, Girshick R et al (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) Lin TY, Dollar P, Girshick R et al (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
91.
Zurück zum Zitat Litman R, Anschel O, Tsiper S et al (2020) Scatter: selective context attentional scene text recognizer. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR) Litman R, Anschel O, Tsiper S et al (2020) Scatter: selective context attentional scene text recognizer. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)
93.
Zurück zum Zitat Liu W, Anguelov D, Erhan D et al (2016) Ssd: single shot multibox detector. In: European conference on computer vision, Springer, pp 21–37 Liu W, Anguelov D, Erhan D et al (2016) Ssd: single shot multibox detector. In: European conference on computer vision, Springer, pp 21–37
94.
Zurück zum Zitat Liu X, Kawanishi T, Wu X et al (2016) Scene text recognition with cnn classifier and wfst-based word labeling. In: 2016 23rd international conference on pattern recognition (ICPR), IEEE, pp 3999–4004 Liu X, Kawanishi T, Wu X et al (2016) Scene text recognition with cnn classifier and wfst-based word labeling. In: 2016 23rd international conference on pattern recognition (ICPR), IEEE, pp 3999–4004
95.
Liu X, Liang D, Yan S et al (2018) Fots: Fast oriented text spotting with a unified network. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
96.
Liu Y, Jin L, Xie Z et al (2019) Tightness-aware evaluation protocol for scene text detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9612–9620
97.
Liu Y, Chen H, Shen C et al (2020) Abcnet: Real-time scene text spotting with adaptive bezier-curve network. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9809–9818
98.
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
99.
Long S, Ruan J, Zhang W et al (2018) Textsnake: A flexible representation for detecting text of arbitrary shapes. In: Proceedings of the European conference on computer vision (ECCV), pp 20–36
100.
Lowe G (2004) Sift-the scale invariant feature transform. Int J 2(91–110):2
101.
Lu L, Wu D, Tang Z et al (2021) Mining discriminative patches for script identification in natural scene images. J Intell Fuzzy Syst 40(1):551–563
102.
Lu M, Mou Y, Chen CL et al (2021) An efficient text detection model for street signs. Appl Sci 11(13):5962
104.
Lucas SM, Panaretos A, Sosa L et al (2005) Icdar 2003 robust reading competitions: entries, results, and future directions. Int J Doc Anal Recognit (IJDAR) 7(2–3):105–122
105.
Luo C, Jin L, Sun Z (2019) Moran: A multi-object rectified attention network for scene text recognition. Pattern Recognit 90:109–118
106.
Lyu P, Liao M, Yao C et al (2018) Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: Proceedings of the European conference on computer vision (ECCV)
107.
Lyu P, Yao C, Wu W et al (2018) Multi-oriented scene text detection via corner localization and region segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7553–7563
108.
Ma J, Shao W, Ye H et al (2018) Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans Multimed 20(11):3111–3122
109.
Ma M, Wang QF, Huang S et al (2021) Residual attention-based multi-scale script identification in scene text images. Neurocomputing 421:222–233
110.
Mahajan S, Rani R (2022) Word level script identification using convolutional neural network enhancement for scenic images. Trans Asian Low-Res Lang Inf Process 21(4):1–29
111.
Matas J, Chum O, Urban M et al (2004) Robust wide-baseline stereo from maximally stable extremal regions. Image Vis Comput 22(10):761–767
112.
Mathew M, Jain M, Jawahar C (2017) Benchmarking scene text recognition in devanagari, telugu and malayalam. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, pp 42–46
113.
Mei J, Dai L, Shi B et al (2016) Scene text script identification with convolutional recurrent neural networks. In: 2016 23rd international conference on pattern recognition (ICPR), IEEE, pp 4053–4058
114.
Mishra A, Alahari K, Jawahar C (2012) Scene text recognition using higher order language priors. In: BMVC-British machine vision conference, BMVA
115.
Mohanty S, Dutta T, Gupta HP (2018) Recurrent global convolutional network for scene text detection. In: 2018 25th IEEE international conference on image processing (ICIP), IEEE, pp 2750–2754
116.
Nagaoka Y, Miyazaki T, Sugaya Y et al (2021) Text detection using multi-stage region proposal network sensitive to text scale. Sensors 21(4):1232
117.
Naosekpam V, Bhowmick A, Hazarika SM (2019) Superpixel correspondence for non-parametric scene parsing of natural images. In: International conference on pattern recognition and machine intelligence, Springer, pp 614–622
118.
Naosekpam V, Paul N, Bhowmick A (2019) Dense and partial correspondence in non-parametric scene parsing. In: International conference on machine intelligence and signal processing, Springer, pp 339–350
120.
Naosekpam V, Shishir AS, Sahu N (2021) Scene text recognition with orientation rectification via ic-stn. In: TENCON 2021-2021 IEEE region 10 conference (TENCON), IEEE, pp 664–669
121.
Naosekpam V, Aggarwal S, Sahu N (2022) Utextnet: a unet based arbitrary shaped scene text detector. In: Abraham A, Gandhi N, Hanne T et al (eds) Intell Syst Des Appl. Springer International Publishing, Cham, pp 368–378
122.
Nayef N, Yin F, Bizid I et al (2017) Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, pp 1454–1459
123.
Nayef N, Patel Y, Busta M et al (2019) Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition-rrc-mlt-2019. In: 2019 international conference on document analysis and recognition (ICDAR), IEEE, pp 1582–1587
124.
Neumann L, Matas J (2010) A method for text localization and recognition in real-world images. In: Asian conference on computer vision, Springer, pp 770–783
125.
Nicolaou A, Bagdanov AD, Liwicki M et al (2015) Sparse radial sampling lbp for writer identification. In: 2015 13th international conference on document analysis and recognition (ICDAR), IEEE, pp 716–720
126.
Novikova T, Barinova O, Kohli P et al (2012) Large-lexicon attribute-consistent text recognition in natural images. In: European conference on computer vision, Springer, pp 752–765
127.
Pan YF, Hou X, Liu CL (2009) Text localization in natural scene images based on conditional random field. In: 2009 10th international conference on document analysis and recognition, IEEE, pp 6–10
128.
Pan YF, Hou X, Liu CL (2010) A hybrid approach to detect and localize texts in natural scene images. IEEE Trans Image Process 20(3):800–813
130.
Qiao Z, Zhou Y, Yang D et al (2020) Seed: semantics enhanced encoder-decoder framework for scene text recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)
131.
Qin H, Zhang H, Wang H et al (2019) An algorithm for scene text detection using multibox and semantic segmentation. Appl Sci 9(6):1054
133.
Raghunandan K, Shivakumara P, Roy S et al (2018) Multi-script-oriented text detection and recognition in video/scene/born digital images. IEEE Trans Circuits Syst Video Technol 29(4):1145–1162
134.
Rahul Y, Sharma RK (2019) Eeg signal-based movement control for mobile robots. Curr Sci 116(12):1993–2000
136.
Redmon J, Divvala S, Girshick R et al (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
137.
Ren S, He K, Girshick R et al (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. arXiv preprint arXiv:1506.01497
138.
Risnumawan A, Shivakumara P, Chan CS et al (2014) A robust arbitrary text detection system for natural scene images. Expert Syst Appl 41(18):8027–8048
139.
Rodriguez-Serrano JA, Gordo A, Perronnin F (2015) Label embedding: a frugal baseline for text recognition. Int J Comput Vis 113(3):193–207
140.
Rusinol M, Chazalon J, Ogier JM (2014) Combining focus measure operators to predict ocr accuracy in mobile-captured document images. In: 2014 11th IAPR international workshop on document analysis systems, IEEE, pp 181–185
141.
Sen P, Das A, Sahu N (2021) End-to-end scene text recognition system for devanagari and bengali text. In: International conference on intelligent computing & optimization, Springer, pp 352–359
142.
Shahab A, Shafait F, Dengel A (2011) Icdar 2011 robust reading competition challenge 2: scene images. In: 2011 international conference on document analysis and recognition, IEEE, pp 1491–1496
143.
Shao HL, Ji Y, Li Y et al (2021) Bdfpn: Bi-direction feature pyramid network for scene text detection. In: 2021 international joint conference on neural networks (IJCNN), IEEE, pp 1–8
144.
Sharma N, Mandal R, Sharma R et al (2015) Icdar2015 competition on video script identification (cvsi 2015). In: 2015 13th international conference on document analysis and recognition (ICDAR), IEEE, pp 1196–1200
145.
Shi B, Yao C, Zhang C et al (2015) Automatic script identification in the wild. In: 2015 13th international conference on document analysis and recognition (ICDAR), IEEE, pp 531–535
146.
Shi B, Bai X, Yao C (2016) An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans Pattern Anal Mach Intell 39(11):2298–2304
147.
Shi B, Bai X, Yao C (2016) Script identification in the wild via discriminative convolutional neural network. Pattern Recognit 52:448–458
149.
Shi B, Bai X, Belongie S (2017) Detecting oriented text in natural images by linking segments. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2550–2558
150.
Shi B, Yao C, Liao M et al (2017) Icdar2017 competition on reading chinese text in the wild (rctw-17). In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, pp 1429–1434
151.
Shi B, Yang M, Wang X et al (2018) Aster: an attentional scene text recognizer with flexible rectification. IEEE Trans Pattern Anal Mach Intell 41(9):2035–2048
152.
Singh AK, Mishra A, Dabral P et al (2016) A simple and effective solution for script identification in the wild. In: 2016 12th IAPR workshop on document analysis systems (DAS), IEEE, pp 428–433
153.
Su B, Lu S (2014) Accurate scene text recognition based on recurrent neural network. In: Asian conference on computer vision, Springer, pp 35–48
154.
Tian Z, Huang W, He T et al (2016) Detecting text in natural image with connectionist text proposal network. In: European conference on computer vision, Springer, pp 56–72
155.
Varma M, Zisserman A (2002) Classifying images of materials: achieving viewpoint and illumination independence. In: European conference on computer vision, Springer, pp 255–271
156.
Varma M, Zisserman A (2003) Texture classification: are filter banks necessary? In: 2003 IEEE computer society conference on computer vision and pattern recognition, 2003. Proceedings., IEEE, pp II–691
157.
Veit A, Matera T, Neumann L et al (2016) Coco-text: Dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:1601.07140
158.
Verma M, Sood N, Roy PP et al (2017) Script identification in natural scene images: a dataset and texture-feature based performance evaluation. In: Proceedings of international conference on computer vision and image processing, Springer, pp 309–319
160.
Wang K, Belongie S (2010) Word spotting in the wild. In: European conference on computer vision, Springer, pp 591–604
161.
Wang K, Babenko B, Belongie S (2011) End-to-end scene text recognition. In: 2011 international conference on computer vision, IEEE, pp 1457–1464
162.
Wang Q, Zheng Y, Betke M (2020) A method for detecting text of arbitrary shapes in natural scenes that improves text spotting. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR) workshops
163.
Wang T, Wu DJ, Coates A et al (2012) End-to-end text recognition with convolutional neural networks. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012), IEEE, pp 3304–3308
164.
Wang W, Xie E, Li X et al (2019) Shape robust text detection with progressive scale expansion network. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9336–9345
165.
Wang W, Xie E, Song X et al (2019) Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 8440–8449
166.
Wang X, Jiang Y, Luo Z et al (2019) Arbitrary shape scene text detection with adaptive text region representation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6449–6458
167.
Wang Y, Shi C, Xiao B et al (2015) Mrf based text binarization in complex images using stroke feature. In: 2015 13th international conference on document analysis and recognition (ICDAR), IEEE, pp 821–825
168.
Wang Y, Xie H, Zha ZJ et al (2020) Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11753–11762
169.
Wu Y, Natarajan P (2017) Self-organized text detection with minimal post-processing via border learning. In: Proceedings of the IEEE international conference on computer vision, pp 5000–5009
170.
Xiang D, Guo Q, Xia Y (2016) Robust text detection with vertically-regressed proposal network. In: European conference on computer vision, Springer, pp 351–363
171.
Xie E, Zang Y, Shao S et al (2019) Scene text detection with supervised pyramid context network. In: Proceedings of the AAAI conference on artificial intelligence, pp 9038–9045
173.
Yan R, Peng L, Xiao S et al (2021) Primitive representation learning for scene text recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 284–293
174.
Yang C, Yin XC, Li Z et al (2017) Adadnns: adaptive ensemble of deep neural networks for scene text recognition. arXiv preprint arXiv:1710.03425
175.
Yang Q, Cheng M, Zhou W et al (2018) Inceptext: a new inception-text module with deformable PSROI pooling for multi-oriented scene text detection. CoRR abs/1805.01167. arXiv:1805.01167
176.
Yang X, He D, Zhou Z et al (2017) Learning to read irregular text with attention mechanisms. In: IJCAI, p 3
177.
Yao C (2017) Msra text detection 500 database
178.
Yao C, Bai X, Liu W et al (2012) Detecting texts of arbitrary orientations in natural images. In: 2012 IEEE conference on computer vision and pattern recognition, IEEE, pp 1083–1090
179.
Yao C, Bai X, Sang N et al (2016) Scene text detection via holistic, multi-channel prediction. CoRR abs/1606.09002. arXiv:1606.09002
180.
Yin XC, Yin X, Huang K et al (2013) Robust text detection in natural scene images. IEEE Trans Pattern Anal Mach Intell 36(5):970–983
182.
Yuliang L, Lianwen J, Shuaitao Z et al (2017) Detecting curve text in the wild: new dataset and new solution. arXiv preprint arXiv:1712.02170
184.
Zhan F, Lu S (2019) Esir: End-to-end scene text recognition via iterative image rectification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2059–2068
185.
Zhang C, Liang B, Huang Z et al (2019) Look more than once: an accurate detector for text of arbitrary shapes. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10552–10561
186.
Zhang Z, Shen W, Yao C et al (2015) Symmetry-based text line detection in natural scenes. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
187.
Zhang Z, Zhang C, Shen W et al (2016) Multi-oriented text detection with fully convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
188.
Zhong Y, Karu K, Jain AK (1995) Locating text in complex color images. Pattern Recognit 28(10):1523–1535
189.
Zhong Z, Jin L, Zhang S et al (2016) Deeptext: A unified framework for text proposal generation and text detection in natural images. arXiv preprint arXiv:1605.07314
190.
Zurück zum Zitat Zhou X, Yao C, Wen H et al (2017) East: an efficient and accurate scene text detector. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5551–5560 Zhou X, Yao C, Wen H et al (2017) East: an efficient and accurate scene text detector. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5551–5560
191.
Zurück zum Zitat Zhou Y, Liu S, Zhang Y et al (2014) Text localization in natural scene images with stroke width histogram and superpixel. Signal and information processing association annual summit and conference (APSIPA). Asia-Pacific, IEEE, pp 1–4 Zhou Y, Liu S, Zhang Y et al (2014) Text localization in natural scene images with stroke width histogram and superpixel. Signal and information processing association annual summit and conference (APSIPA). Asia-Pacific, IEEE, pp 1–4
192.
Zurück zum Zitat Zhou Z, Wu S, Kong S et al (2019) Curve text detection with local segmentation network and curve connection. arXiv preprint arXiv:1903.09837 Zhou Z, Wu S, Kong S et al (2019) Curve text detection with local segmentation network and curve connection. arXiv preprint arXiv:​1903.​09837
193.
Zurück zum Zitat Zhu X, Jiang Y, Yang S et al (2017) Deep residual text detection network for scene text. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, pp 807–812 Zhu X, Jiang Y, Yang S et al (2017) Deep residual text detection network for scene text. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, pp 807–812
Metadata
Title
Text detection, recognition, and script identification in natural scene images: a Review
Authors
Veronica Naosekpam
Nilkanta Sahu
Publication date
5 July 2022
Publisher
Springer London
Published in
International Journal of Multimedia Information Retrieval, Issue 3/2022
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI
https://doi.org/10.1007/s13735-022-00243-8
