Published in: International Journal of Multimedia Information Retrieval 3/2022

05-07-2022 | Trends and Surveys

Text detection, recognition, and script identification in natural scene images: a Review

Authors: Veronica Naosekpam, Nilkanta Sahu



Abstract

Text in natural scene images plays a vital role in scene understanding. It carries rich semantic information that is useful in many applications, such as product label analysis, autonomous driving, and navigation for the blind. Consequently, the detection, recognition, and script identification of text in scene images have recently received considerable attention. This paper surveys advances in these topics, focusing mainly on approaches proposed in the last 8–10 years. To the best of our knowledge, it is the first to review scene text script identification. We also provide a clear classification of conventional, deep learning-based, and hybrid methods, including their advantages and disadvantages. State-of-the-art evaluation metrics, the characteristics of benchmark datasets, and the performance of existing methods are also analyzed and discussed. Lastly, we outline potential research directions to complete the review. We hope this review offers researchers a concise insight into scene text understanding.


100.
go back to reference Lowe G (2004) Sift-the scale invariant feature transform. Int J 2(91–110):2 Lowe G (2004) Sift-the scale invariant feature transform. Int J 2(91–110):2
101.
go back to reference Lu L, Wu D, Tang Z et al (2021) Mining discriminative patches for script identification in natural scene images. J Intell Fuzzy Syst 40(1):551–563CrossRef Lu L, Wu D, Tang Z et al (2021) Mining discriminative patches for script identification in natural scene images. J Intell Fuzzy Syst 40(1):551–563CrossRef
102.
go back to reference Lu M, Mou Y, Chen CL et al (2021) An efficient text detection model for street signs. Appl Sci 11(13):5962CrossRef Lu M, Mou Y, Chen CL et al (2021) An efficient text detection model for street signs. Appl Sci 11(13):5962CrossRef
104.
go back to reference Lucas SM, Panaretos A, Sosa L et al (2005) Icdar 2003 robust reading competitions: entries, results, and future directions. Int J Doc Anal Recognit (IJDAR) 7(2–3):105–122CrossRef Lucas SM, Panaretos A, Sosa L et al (2005) Icdar 2003 robust reading competitions: entries, results, and future directions. Int J Doc Anal Recognit (IJDAR) 7(2–3):105–122CrossRef
105.
go back to reference Luo C, Jin L, Sun Z (2019) Moran: A multi-object rectified attention network for scene text recognition. Pattern Recognit 90:109–118CrossRef Luo C, Jin L, Sun Z (2019) Moran: A multi-object rectified attention network for scene text recognition. Pattern Recognit 90:109–118CrossRef
106.
go back to reference Lyu P, Liao M, Yao C et al (2018) Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: Proceedings of the European conference on computer vision (ECCV) Lyu P, Liao M, Yao C et al (2018) Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: Proceedings of the European conference on computer vision (ECCV)
107.
go back to reference Lyu P, Yao C, Wu W et al (2018) Multi-oriented scene text detection via corner localization and region segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7553–7563 Lyu P, Yao C, Wu W et al (2018) Multi-oriented scene text detection via corner localization and region segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7553–7563
108.
go back to reference Ma J, Shao W, Ye H et al (2018) Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans Multimed 20(11):3111–3122CrossRef Ma J, Shao W, Ye H et al (2018) Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans Multimed 20(11):3111–3122CrossRef
109.
go back to reference Ma M, Wang QF, Huang S et al (2021) Residual attention-based multi-scale script identification in scene text images. Neurocomputing 421:222–233CrossRef Ma M, Wang QF, Huang S et al (2021) Residual attention-based multi-scale script identification in scene text images. Neurocomputing 421:222–233CrossRef
110.
go back to reference Mahajan S, Rani R (2022) Word level script identification using convolutional neural network enhancement for scenic images. Trans Asian Low-Res Lang Inf Process 21(4):1–29CrossRef Mahajan S, Rani R (2022) Word level script identification using convolutional neural network enhancement for scenic images. Trans Asian Low-Res Lang Inf Process 21(4):1–29CrossRef
111.
go back to reference Matas J, Chum O, Urban M et al (2004) Robust wide-baseline stereo from maximally stable extremal regions. Image Vis Comput 22(10):761–767CrossRef Matas J, Chum O, Urban M et al (2004) Robust wide-baseline stereo from maximally stable extremal regions. Image Vis Comput 22(10):761–767CrossRef
112.
go back to reference Mathew M, Jain M, Jawahar C (2017) Benchmarking scene text recognition in devanagari, telugu and malayalam. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, pp 42–46 Mathew M, Jain M, Jawahar C (2017) Benchmarking scene text recognition in devanagari, telugu and malayalam. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, pp 42–46
113.
go back to reference Mei J, Dai L, Shi B et al (2016) Scene text script identification with convolutional recurrent neural networks. In: 2016 23rd international conference on pattern recognition (ICPR), IEEE, pp 4053–4058 Mei J, Dai L, Shi B et al (2016) Scene text script identification with convolutional recurrent neural networks. In: 2016 23rd international conference on pattern recognition (ICPR), IEEE, pp 4053–4058
114.
go back to reference Mishra A, Alahari K, Jawahar C (2012) Scene text recognition using higher order language priors. In: BMVC-British machine vision conference, BMVA Mishra A, Alahari K, Jawahar C (2012) Scene text recognition using higher order language priors. In: BMVC-British machine vision conference, BMVA
115.
go back to reference Mohanty S, Dutta T, Gupta HP (2018) Recurrent global convolutional network for scene text detection. In: 2018 25th IEEE international conference on image processing (ICIP), IEEE, pp 2750–2754 Mohanty S, Dutta T, Gupta HP (2018) Recurrent global convolutional network for scene text detection. In: 2018 25th IEEE international conference on image processing (ICIP), IEEE, pp 2750–2754
116.
go back to reference Nagaoka Y, Miyazaki T, Sugaya Y et al (2021) Text detection using multi-stage region proposal network sensitive to text scale. Sensors 21(4):1232CrossRef Nagaoka Y, Miyazaki T, Sugaya Y et al (2021) Text detection using multi-stage region proposal network sensitive to text scale. Sensors 21(4):1232CrossRef
117.
go back to reference Naosekpam V, Bhowmick A, Hazarika SM (2019) Superpixel correspondence for non-parametric scene parsing of natural images. In: International conference on pattern recognition and machine intelligence, Springer, pp 614–622 Naosekpam V, Bhowmick A, Hazarika SM (2019) Superpixel correspondence for non-parametric scene parsing of natural images. In: International conference on pattern recognition and machine intelligence, Springer, pp 614–622
118.
go back to reference Naosekpam V, Paul N, Bhowmick A (2019) Dense and partial correspondence in non-parametric scene parsing. In: International conference on machine intelligence and signal processing, Springer, pp 339–350 Naosekpam V, Paul N, Bhowmick A (2019) Dense and partial correspondence in non-parametric scene parsing. In: International conference on machine intelligence and signal processing, Springer, pp 339–350
120.
go back to reference Naosekpam V, Shishir AS, Sahu N (2021) Scene text recognition with orientation rectification via ic-stn. In: TENCON 2021-2021 IEEE region 10 conference (TENCON), IEEE, pp 664–669 Naosekpam V, Shishir AS, Sahu N (2021) Scene text recognition with orientation rectification via ic-stn. In: TENCON 2021-2021 IEEE region 10 conference (TENCON), IEEE, pp 664–669
121.
go back to reference Naosekpam V, Aggarwal S, Sahu N (2022) Utextnet: a unet based arbitrary shaped scene text detector. In: Abraham A, Gandhi N, Hanne T et al (eds) Intell Syst Des Appl. Springer International Publishing, Cham, pp 368–378 Naosekpam V, Aggarwal S, Sahu N (2022) Utextnet: a unet based arbitrary shaped scene text detector. In: Abraham A, Gandhi N, Hanne T et al (eds) Intell Syst Des Appl. Springer International Publishing, Cham, pp 368–378
122.
go back to reference Nayef N, Yin F, Bizid I et al (2017) Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, pp 1454–1459 Nayef N, Yin F, Bizid I et al (2017) Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, pp 1454–1459
123.
go back to reference Nayef N, Patel Y, Busta M et al (2019) Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition-rrc-mlt-2019. In: 2019 international conference on document analysis and recognition (ICDAR), IEEE, pp 1582–1587 Nayef N, Patel Y, Busta M et al (2019) Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition-rrc-mlt-2019. In: 2019 international conference on document analysis and recognition (ICDAR), IEEE, pp 1582–1587
124.
go back to reference Neumann L, Matas J (2010) A method for text localization and recognition in real-world images. In: Asian conference on computer vision, Springer, pp 770–783 Neumann L, Matas J (2010) A method for text localization and recognition in real-world images. In: Asian conference on computer vision, Springer, pp 770–783
125.
go back to reference Nicolaou A, Bagdanov AD, Liwicki M et al (2015) Sparse radial sampling lbp for writer identification. In: 2015 13th international conference on document analysis and recognition (ICDAR), IEEE, pp 716–720 Nicolaou A, Bagdanov AD, Liwicki M et al (2015) Sparse radial sampling lbp for writer identification. In: 2015 13th international conference on document analysis and recognition (ICDAR), IEEE, pp 716–720
126.
go back to reference Novikova T, Barinova O, Kohli P et al (2012) Large-lexicon attribute-consistent text recognition in natural images. In: European conference on computer vision, Springer, pp 752–765 Novikova T, Barinova O, Kohli P et al (2012) Large-lexicon attribute-consistent text recognition in natural images. In: European conference on computer vision, Springer, pp 752–765
127.
go back to reference Pan YF, Hou X, Liu CL (2009) Text localization in natural scene images based on conditional random field. In: 2009 10th international conference on document analysis and recognition, IEEE, pp 6–10 Pan YF, Hou X, Liu CL (2009) Text localization in natural scene images based on conditional random field. In: 2009 10th international conference on document analysis and recognition, IEEE, pp 6–10
128.
go back to reference Pan YF, Hou X, Liu CL (2010) A hybrid approach to detect and localize texts in natural scene images. IEEE Trans Image Process 20(3):800–813MathSciNetMATH Pan YF, Hou X, Liu CL (2010) A hybrid approach to detect and localize texts in natural scene images. IEEE Trans Image Process 20(3):800–813MathSciNetMATH
130.
go back to reference Qiao Z, Zhou Y, Yang D et al (2020) Seed: semantics enhanced encoder-decoder framework for scene text recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR) Qiao Z, Zhou Y, Yang D et al (2020) Seed: semantics enhanced encoder-decoder framework for scene text recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)
131.
go back to reference Qin H, Zhang H, Wang H et al (2019) An algorithm for scene text detection using multibox and semantic segmentation. Appl Sci 9(6):1054CrossRef Qin H, Zhang H, Wang H et al (2019) An algorithm for scene text detection using multibox and semantic segmentation. Appl Sci 9(6):1054CrossRef
133.
go back to reference Raghunandan K, Shivakumara P, Roy S et al (2018) Multi-script-oriented text detection and recognition in video/scene/born digital images. IEEE Trans Circuits Syst Video Technol 29(4):1145–1162CrossRef Raghunandan K, Shivakumara P, Roy S et al (2018) Multi-script-oriented text detection and recognition in video/scene/born digital images. IEEE Trans Circuits Syst Video Technol 29(4):1145–1162CrossRef
134.
go back to reference Rahul Y, Sharma RK (2019) Eeg signal-based movement control for mobile robots. Curr Sci 116(12):1993–2000CrossRef Rahul Y, Sharma RK (2019) Eeg signal-based movement control for mobile robots. Curr Sci 116(12):1993–2000CrossRef
136.
go back to reference Redmon J, Divvala S, Girshick R et al (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788 Redmon J, Divvala S, Girshick R et al (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
137.
go back to reference Ren S, He K, Girshick R et al (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. arXiv preprint arXiv:1506.01497 Ren S, He K, Girshick R et al (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. arXiv preprint arXiv:​1506.​01497
138.
go back to reference Risnumawan A, Shivakumara P, Chan CS et al (2014) A robust arbitrary text detection system for natural scene images. Expert Syst Appl 41(18):8027–8048CrossRef Risnumawan A, Shivakumara P, Chan CS et al (2014) A robust arbitrary text detection system for natural scene images. Expert Syst Appl 41(18):8027–8048CrossRef
139.
go back to reference Rodriguez-Serrano JA, Gordo A, Perronnin F (2015) Label embedding: a frugal baseline for text recognition. Int J Comput Vis 113(3):193–207CrossRef Rodriguez-Serrano JA, Gordo A, Perronnin F (2015) Label embedding: a frugal baseline for text recognition. Int J Comput Vis 113(3):193–207CrossRef
140.
go back to reference Rusinol M, Chazalon J, Ogier JM (2014) Combining focus measure operators to predict ocr accuracy in mobile-captured document images. In: 2014 11th IAPR international workshop on document analysis systems, IEEE, pp 181–185 Rusinol M, Chazalon J, Ogier JM (2014) Combining focus measure operators to predict ocr accuracy in mobile-captured document images. In: 2014 11th IAPR international workshop on document analysis systems, IEEE, pp 181–185
141.
go back to reference Sen P, Das A, Sahu N (2021) End-to-end scene text recognition system for devanagari and bengali text. In: International conference on intelligent computing & optimization, Springer, pp 352–359 Sen P, Das A, Sahu N (2021) End-to-end scene text recognition system for devanagari and bengali text. In: International conference on intelligent computing & optimization, Springer, pp 352–359
142.
go back to reference Shahab A, Shafait F, Dengel A (2011) Icdar 2011 robust reading competition challenge 2: scene images. In: 2011 international conference on document analysis and recognition, IEEE, pp 1491–1496 Shahab A, Shafait F, Dengel A (2011) Icdar 2011 robust reading competition challenge 2: scene images. In: 2011 international conference on document analysis and recognition, IEEE, pp 1491–1496
143.
go back to reference Shao HL, Ji Y, Li Y et al (2021) Bdfpn: Bi-direction feature pyramid network for scene text detection. In: 2021 international joint conference on neural networks (IJCNN), IEEE, pp 1–8 Shao HL, Ji Y, Li Y et al (2021) Bdfpn: Bi-direction feature pyramid network for scene text detection. In: 2021 international joint conference on neural networks (IJCNN), IEEE, pp 1–8
144.
go back to reference Sharma N, Mandal R, Sharma R et al (2015) Icdar2015 competition on video script identification (cvsi 2015). In: 2015 13th international conference on document analysis and recognition (ICDAR), IEEE, pp 1196–1200 Sharma N, Mandal R, Sharma R et al (2015) Icdar2015 competition on video script identification (cvsi 2015). In: 2015 13th international conference on document analysis and recognition (ICDAR), IEEE, pp 1196–1200
145.
go back to reference Shi B, Yao C, Zhang C et al (2015) Automatic script identification in the wild. In: 2015 13th international conference on document analysis and recognition (ICDAR), IEEE, pp 531–535 Shi B, Yao C, Zhang C et al (2015) Automatic script identification in the wild. In: 2015 13th international conference on document analysis and recognition (ICDAR), IEEE, pp 531–535
146.
go back to reference Shi B, Bai X, Yao C (2016) An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans Pattern Anal Mach Intell 39(11):2298–2304CrossRef Shi B, Bai X, Yao C (2016) An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans Pattern Anal Mach Intell 39(11):2298–2304CrossRef
147.
go back to reference Shi B, Bai X, Yao C (2016) Script identification in the wild via discriminative convolutional neural network. Pattern Recognit 52:448–458CrossRef Shi B, Bai X, Yao C (2016) Script identification in the wild via discriminative convolutional neural network. Pattern Recognit 52:448–458CrossRef
149.
go back to reference Shi B, Bai X, Belongie S (2017) Detecting oriented text in natural images by linking segments. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2550–2558 Shi B, Bai X, Belongie S (2017) Detecting oriented text in natural images by linking segments. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2550–2558
150.
go back to reference Shi B, Yao C, Liao M et al (2017) Icdar2017 competition on reading chinese text in the wild (rctw-17). In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, pp 1429–1434 Shi B, Yao C, Liao M et al (2017) Icdar2017 competition on reading chinese text in the wild (rctw-17). In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, pp 1429–1434
151.
go back to reference Shi B, Yang M, Wang X et al (2018) Aster: an attentional scene text recognizer with flexible rectification. IEEE Trans Pattern Anal Mach Intell 41(9):2035–2048CrossRef Shi B, Yang M, Wang X et al (2018) Aster: an attentional scene text recognizer with flexible rectification. IEEE Trans Pattern Anal Mach Intell 41(9):2035–2048CrossRef
152.
go back to reference Singh AK, Mishra A, Dabral P et al (2016) A simple and effective solution for script identification in the wild. In: 2016 12th IAPR workshop on document analysis systems (DAS), IEEE, pp 428–433 Singh AK, Mishra A, Dabral P et al (2016) A simple and effective solution for script identification in the wild. In: 2016 12th IAPR workshop on document analysis systems (DAS), IEEE, pp 428–433
153.
go back to reference Su B, Lu S (2014) Accurate scene text recognition based on recurrent neural network. In: Asian conference on computer vision, Springer, pp 35–48 Su B, Lu S (2014) Accurate scene text recognition based on recurrent neural network. In: Asian conference on computer vision, Springer, pp 35–48
154.
go back to reference Tian Z, Huang W, He T et al (2016) Detecting text in natural image with connectionist text proposal network. In: European conference on computer vision, Springer, pp 56–72 Tian Z, Huang W, He T et al (2016) Detecting text in natural image with connectionist text proposal network. In: European conference on computer vision, Springer, pp 56–72
155.
go back to reference Varma M, Zisserman A (2002) Classifying images of materials: achieving viewpoint and illumination independence. In: European conference on computer vision, Springer, pp 255–271 Varma M, Zisserman A (2002) Classifying images of materials: achieving viewpoint and illumination independence. In: European conference on computer vision, Springer, pp 255–271
156.
go back to reference Varma M, Zisserman A (2003) Texture classification: are filter banks necessary? In: 2003 IEEE computer society conference on computer vision and pattern recognition, 2003. Proceedings., IEEE, pp II–691 Varma M, Zisserman A (2003) Texture classification: are filter banks necessary? In: 2003 IEEE computer society conference on computer vision and pattern recognition, 2003. Proceedings., IEEE, pp II–691
157.
go back to reference Veit A, Matera T, Neumann L et al (2016) Coco-text: Dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:1601.07140 Veit A, Matera T, Neumann L et al (2016) Coco-text: Dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:​1601.​07140
158.
go back to reference Verma M, Sood N, Roy PP et al (2017) Script identification in natural scene images: a dataset and texture-feature based performance evaluation. In: Proceedings of international conference on computer vision and image processing, Springer, pp 309–319 Verma M, Sood N, Roy PP et al (2017) Script identification in natural scene images: a dataset and texture-feature based performance evaluation. In: Proceedings of international conference on computer vision and image processing, Springer, pp 309–319
160.
go back to reference Wang K, Belongie S (2010) Word spotting in the wild. In: European conference on computer vision, Springer, pp 591–604 Wang K, Belongie S (2010) Word spotting in the wild. In: European conference on computer vision, Springer, pp 591–604
161.
go back to reference Wang K, Babenko B, Belongie S (2011) End-to-end scene text recognition. In: 2011 international conference on computer vision, IEEE, pp 1457–1464 Wang K, Babenko B, Belongie S (2011) End-to-end scene text recognition. In: 2011 international conference on computer vision, IEEE, pp 1457–1464
162.
go back to reference Wang Q, Zheng Y, Betke M (2020) A method for detecting text of arbitrary shapes in natural scenes that improves text spotting. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR) workshops Wang Q, Zheng Y, Betke M (2020) A method for detecting text of arbitrary shapes in natural scenes that improves text spotting. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR) workshops
163.
go back to reference Wang T, Wu DJ, Coates A et al (2012) End-to-end text recognition with convolutional neural networks. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012), IEEE, pp 3304–3308 Wang T, Wu DJ, Coates A et al (2012) End-to-end text recognition with convolutional neural networks. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012), IEEE, pp 3304–3308
164.
go back to reference Wang W, Xie E, Li X et al (2019) Shape robust text detection with progressive scale expansion network. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9336–9345 Wang W, Xie E, Li X et al (2019) Shape robust text detection with progressive scale expansion network. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9336–9345
165.
go back to reference Wang W, Xie E, Song X et al (2019) Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 8440–8449 Wang W, Xie E, Song X et al (2019) Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 8440–8449
166.
go back to reference Wang X, Jiang Y, Luo Z et al (2019) Arbitrary shape scene text detection with adaptive text region representation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6449–6458 Wang X, Jiang Y, Luo Z et al (2019) Arbitrary shape scene text detection with adaptive text region representation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6449–6458
167.
go back to reference Wang Y, Shi C, Xiao B et al (2015) Mrf based text binarization in complex images using stroke feature. In: 2015 13th international conference on document analysis and recognition (ICDAR), IEEE, pp 821–825 Wang Y, Shi C, Xiao B et al (2015) Mrf based text binarization in complex images using stroke feature. In: 2015 13th international conference on document analysis and recognition (ICDAR), IEEE, pp 821–825
168.
go back to reference Wang Y, Xie H, Zha ZJ et al (2020) Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11753–11762 Wang Y, Xie H, Zha ZJ et al (2020) Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11753–11762
169.
go back to reference Wu Y, Natarajan P (2017) Self-organized text detection with minimal post-processing via border learning. In: Proceedings of the IEEE international conference on computer vision, pp 5000–5009 Wu Y, Natarajan P (2017) Self-organized text detection with minimal post-processing via border learning. In: Proceedings of the IEEE international conference on computer vision, pp 5000–5009
170.
go back to reference Xiang D, Guo Q, Xia Y (2016) Robust text detection with vertically-regressed proposal network. In: European conference on computer vision, Springer, pp 351–363 Xiang D, Guo Q, Xia Y (2016) Robust text detection with vertically-regressed proposal network. In: European conference on computer vision, Springer, pp 351–363
171.
go back to reference Xie E, Zang Y, Shao S et al (2019) Scene text detection with supervised pyramid context network. In: Proceedings of the AAAI conference on artificial intelligence, pp 9038–9045 Xie E, Zang Y, Shao S et al (2019) Scene text detection with supervised pyramid context network. In: Proceedings of the AAAI conference on artificial intelligence, pp 9038–9045
173.
go back to reference Yan R, Peng L, Xiao S et al (2021) Primitive representation learning for scene text recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 284–293 Yan R, Peng L, Xiao S et al (2021) Primitive representation learning for scene text recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 284–293
174.
go back to reference Yang C, Yin XC, Li Z et al (2017) Adadnns: adaptive ensemble of deep neural networks for scene text recognition. arXiv preprint arXiv:1710.03425 Yang C, Yin XC, Li Z et al (2017) Adadnns: adaptive ensemble of deep neural networks for scene text recognition. arXiv preprint arXiv:​1710.​03425
175.
go back to reference Yang Q, Cheng M, Zhou W et al (2018) Inceptext: a new inception-text module with deformable PSROI pooling for multi-oriented scene text detection. CoRR abs/1805.01167. arXiv:1805.01167 Yang Q, Cheng M, Zhou W et al (2018) Inceptext: a new inception-text module with deformable PSROI pooling for multi-oriented scene text detection. CoRR abs/1805.01167. arXiv:​1805.​01167
176.
go back to reference Yang X, He D, Zhou Z et al (2017) Learning to read irregular text with attention mechanisms. In: IJCAI, p 3 Yang X, He D, Zhou Z et al (2017) Learning to read irregular text with attention mechanisms. In: IJCAI, p 3
177.
go back to reference Yao C (2017) Msra text detection 500 database Yao C (2017) Msra text detection 500 database
178.
go back to reference Yao C, Bai X, Liu W et al (2012) Detecting texts of arbitrary orientations in natural images. In: 2012 IEEE conference on computer vision and pattern recognition, IEEE, pp 1083–1090 Yao C, Bai X, Liu W et al (2012) Detecting texts of arbitrary orientations in natural images. In: 2012 IEEE conference on computer vision and pattern recognition, IEEE, pp 1083–1090
179.
go back to reference Yao C, Bai X, Sang N et al (2016) Scene text detection via holistic, multi-channel prediction. CoRR abs/1606.09002. arXiv:1606.09002 Yao C, Bai X, Sang N et al (2016) Scene text detection via holistic, multi-channel prediction. CoRR abs/1606.09002. arXiv:​1606.​09002
180.
go back to reference Yin XC, Yin X, Huang K et al (2013) Robust text detection in natural scene images. IEEE Trans Pattern Anal Mach Intell 36(5):970–983 Yin XC, Yin X, Huang K et al (2013) Robust text detection in natural scene images. IEEE Trans Pattern Anal Mach Intell 36(5):970–983
182.
go back to reference Yuliang L, Lianwen J, Shuaitao Z et al (2017) Detecting curve text in the wild: new dataset and new solution. arXiv preprint arXiv:1712.02170 Yuliang L, Lianwen J, Shuaitao Z et al (2017) Detecting curve text in the wild: new dataset and new solution. arXiv preprint arXiv:​1712.​02170
184.
go back to reference Zhan F, Lu S (2019) Esir: End-to-end scene text recognition via iterative image rectification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2059–2068 Zhan F, Lu S (2019) Esir: End-to-end scene text recognition via iterative image rectification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2059–2068
185.
go back to reference Zhang C, Liang B, Huang Z et al (2019) Look more than once: an accurate detector for text of arbitrary shapes. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10552–10561 Zhang C, Liang B, Huang Z et al (2019) Look more than once: an accurate detector for text of arbitrary shapes. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10552–10561
186.
go back to reference Zhang Z, Shen W, Yao C et al (2015) Symmetry-based text line detection in natural scenes. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) Zhang Z, Shen W, Yao C et al (2015) Symmetry-based text line detection in natural scenes. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
187.
go back to reference Zhang Z, Zhang C, Shen W et al (2016) Multi-oriented text detection with fully convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) Zhang Z, Zhang C, Shen W et al (2016) Multi-oriented text detection with fully convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
188.
go back to reference Zhong Y, Karu K, Jain AK (1995) Locating text in complex color images. Pattern Recognit 28(10):1523–1535CrossRef Zhong Y, Karu K, Jain AK (1995) Locating text in complex color images. Pattern Recognit 28(10):1523–1535CrossRef
189.
Zhong Z, Jin L, Zhang S et al (2016) Deeptext: A unified framework for text proposal generation and text detection in natural images. arXiv preprint arXiv:1605.07314
190.
Zhou X, Yao C, Wen H et al (2017) East: an efficient and accurate scene text detector. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5551–5560
191.
Zhou Y, Liu S, Zhang Y et al (2014) Text localization in natural scene images with stroke width histogram and superpixel. Signal and information processing association annual summit and conference (APSIPA). Asia-Pacific, IEEE, pp 1–4
192.
Zhou Z, Wu S, Kong S et al (2019) Curve text detection with local segmentation network and curve connection. arXiv preprint arXiv:1903.09837
193.
Zhu X, Jiang Y, Yang S et al (2017) Deep residual text detection network for scene text. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, pp 807–812
Metadata
Title: Text detection, recognition, and script identification in natural scene images: a Review
Authors: Veronica Naosekpam, Nilkanta Sahu
Publication date: 05-07-2022
Publisher: Springer London
Published in: International Journal of Multimedia Information Retrieval, Issue 3/2022
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI: https://doi.org/10.1007/s13735-022-00243-8
