International Journal of Multimedia Information Retrieval

05-07-2022 | Trends and Surveys

Text detection, recognition, and script identification in natural scene images: a Review

Authors: Veronica Naosekpam, Nilkanta Sahu

Published in: International Journal of Multimedia Information Retrieval | Issue 3/2022

Abstract

Text in natural scene images plays a vital role in scene understanding. It carries rich semantic information that is useful in many applications, such as product label analysis, autonomous driving, and navigation for the blind. Consequently, the detection, recognition, and script identification of text in scene images have recently received considerable attention. This paper walks through the advances on these topics, focusing mainly on approaches proposed in the last 8–10 years. To the best of our knowledge, this paper is the first to review scene text script identification. We also provide a clear classification of the methods into conventional, deep learning-based, and hybrid approaches, including their advantages and disadvantages. State-of-the-art evaluation metrics, the characteristics of benchmark datasets, and the performance of existing methods are also analyzed and discussed. Lastly, we present potential research directions to complete the review. We hope this review will give researchers a brief insight into scene text understanding.


Ali M, Foroosh H (2016) Character recognition in natural scene images using rank-1 tensor decomposition. In: 2016 IEEE International conference on image processing (ICIP), IEEE, pp 2891–2895

Ansari GJ, Shah JH, Yasmin M et al (2018) A novel machine learning approach for scene text extraction. Future Gener Comput Syst 87:328–340. https://doi.org/10.1016/j.future.2018.04.074, https://www.sciencedirect.com/science/article/pii/S0167739X17321520

Atienza R (2021) Vision transformer for fast and efficient scene text recognition. In: International conference on document analysis and recognition, Springer, pp 319–334

Baek J, Kim G, Lee J, et al (2019) What is wrong with scene text recognition model comparisons? dataset and model analysis. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 4715–4723

Bai B, Yin F, Liu CL (2014) A seed-based segmentation method for scene text extraction. In: 2014 11th IAPR international workshop on document analysis systems, IEEE, pp 262–266

Bai F, Cheng Z, Niu Y, et al (2018) Edit probability for scene text recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1508–1516

Belongie S, Malik J, Puzicha J (2002) Shape matching and object recognition using shape contexts. IEEE Trans Pattern Anal Mach Intell 24(4):509–522CrossRef

Berg AC, Berg TL, Malik J (2005) Shape matching and object recognition using low distortion correspondences. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), IEEE, pp 26–33

Konwer A, Bhunia AK, Bhunia AK et al (2019) Script identification in natural scene image and video frames using an attention based convolutional-lstm network. Pattern Recognition 85:172–184CrossRef

10.

Bissacco A, Cummins M, Netzer Y, et al (2013) Photoocr: uncontrolled conditions. In: Proceedings of the ieee international conference on computer vision, pp 785–792

11.

Bochkovskiy A, Wang CY, Liao HYM (2020) Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934

12.

Bookstein FL (1989) Principal warps: thin-plate splines and the decomposition of deformations. IEEE Trans Pattern Anal Mach Intell 11(6):567–585MATHCrossRef

13.

Bosch A, Zisserman A, Munoz X (2007) Image classification using random forests and ferns. In: 2007 IEEE 11th international conference on computer vision, pp 1–8, https://doi.org/10.1109/ICCV.2007.4409066

14.

Burie JC, Chazalon J, Coustaty M, et al (2015) Icdar2015 competition on smartphone document capture and ocr (smartdoc). In: 2015 13th International conference on document analysis and recognition (ICDAR), IEEE, pp 1161–1165

15.

Cai Y, Wang W, Ren H et al (2019) Spn: short path network for scene text detection. Neural Comput Appl. https://doi.org/10.1007/s00521-019-04093-0CrossRef

16.

Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 6:679–698CrossRef

17.

Chakraborty N, Kundu S, Paul S et al (2020) Language identification from multi-lingual scene text images: a cnn based classifier ensemble approach. J Ambient Intell Humaniz Comput. https://doi.org/10.1007/s12652-020-02528-4CrossRef

18.

Chen D, Bourlard H, Thiran JP (2001) Text identification in complex background using svm. In: Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition. CVPR 2001, IEEE, pp II–II

19.

Chen H, Tsai SS, Schroth G, et al (2011) Robust text detection in natural images with edge-enhanced maximally stable extremal regions. In: 2011 18th IEEE international conference on image processing, pp 2609–2612, https://doi.org/10.1109/ICIP.2011.6116200

20.

Chen J, Lian Z, Wang Y et al (2019) Irregular scene text detection via attention guided border labeling. Sci China Inf Sci 62(12):220103CrossRef

21.

Chen X, Yuille AL (2004) Detecting and natural scenes. In: Proceedings of the 2004 IEEE computer society conference on computer vision and pattern recognition, 2004. CVPR 2004., pp II–366–II–373 Vol.2, https://doi.org/10.1109/CVPR.2004.1315187

22.

Ch’ng CK, Chan CS (2017) Total-text: A comprehensive dataset for scene text detection and recognition. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, pp 935–942

23.

Chng CK, Liu Y, Sun Y, et al (2019) Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: 2019 international conference on document analysis and recognition (ICDAR), IEEE, pp 1571–1576

24.

Cho MS, Seok JH, Lee S, et al (2011) Scene text extraction by superpixel crfs combining multiple character features. In: 2011 international conference on document analysis and recognition, IEEE, pp 1034–1038

25.

Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619CrossRef

26.

Dai P, Zhang S, Zhang H, et al (2021) Progressive contour regression for arbitrary-shape scene text detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7393–7402

27.

Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), IEEE, pp 886–893

28.

Dastidar SG, Dutta K, Das N, et al (2021) Exploring knowledge distillation of a deep neural network for multi-script identification. In: International conference on computational intelligence in communications and business analytics, Springer, pp 150–162

29.

De Campos TE, Babu BR, Varma M et al (2009) Character recognition in natural images. VISAPP 2:7

30.

Deng D, Liu H, Li X, et al (2018) Pixellink: Detecting scene text via instance segmentation. In: Proceedings of the AAAI conference on artificial intelligence

31.

Deng L, Gong Y, Lin Y et al (2019) Detecting multi-oriented text with corner-based region proposals. Neurocomputing 334:134–142CrossRef

32.

Deng L, Gong Y, Lu X et al (2019) Stela: a real-time scene text detector with learned anchor. IEEE Access 7:153400–153407CrossRef

33.

Desolneux A, Moisan L, More JM (2003) A grouping principle and four applications. IEEE Trans Pattern Anal Mach Intell 25(4):508–513CrossRef

34.

Dosovitskiy A, Beyer L, Kolesnikov A, et al (2020) An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929

35.

Epshtein B, Ofek E, Wexler Y (2010) Detecting text in natural scenes with stroke width transform. In: 2010 IEEE computer society conference on computer vision and pattern recognition, IEEE, pp 2963–2970

36.

Felzenszwalb PF, Huttenlocher DP (2005) Pictorial structures for object recognition. Int J Comput Vis 61(1):55–79CrossRef

37.

Feng Y, Song Y, Zhang Y (2016) Scene text detection based on multi-scale swt and edge filtering. In: 2016 23rd international conference on pattern recognition (ICPR), IEEE, pp 645–650

38.

Freund Y, Schapire RE et al (1996) Experiments with a new boosting algorithm. In: icml, Citeseer, pp 148–156

39.

Fujii Y, Driesen K, Baccash J et al (2017) Sequence-to-label script identification for multilingual ocr. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, pp 161–168

40.

Ghosh S, Chaudhuri BB (2011) Composite script identification and orientation detection for indian text images. In: 2011 international conference on document analysis and recognition, IEEE, pp 294–298

41.

Gllavata J, Freisleben B (2005) Script recognition in images with complex backgrounds. In: Proceedings of the fifth IEEE international symposium on signal processing and information technology 2005:589–594. https://doi.org/10.1109/ISSPIT.2005.1577163

42.

Goel V, Mishra A, Alahari K et al (2013) Whole is greater than sum of parts: Recognizing scene text words. In: 2013 12th international conference on document analysis and recognition, pp 398–402, https://doi.org/10.1109/ICDAR.2013.87

43.

Gomez L, Karatzas D (2015) Object proposals for text extraction in the wild. In: 2015 13th International conference on document analysis and recognition (ICDAR), IEEE, pp 206–210

44.

Gomez L, Karatzas D (2016) A fine-grained approach to scene text script identification. In: 2016 12th IAPR workshop on document analysis systems (DAS), IEEE, pp 192–197

45.

Gomez L, Nicolaou A, Karatzas D (2017) Improving patch-based scene text script identification with ensembles of conjoined networks. Pattern Recognit 67:85–96CrossRef

46.

Gordo A (2015) Supervised mid-level features for word image representation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2956–2964

47.

Graves A, Fernández S, Gomez F et al (2006) Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd international conference on Machine learning, pp 369–376

48.

Gupta A, Vedaldi A, Zisserman A (2016) Synthetic data for text localisation in natural images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2315–2324

49.

Hanif SM, Prevost L (2009) Text detection and localization in complex scene images using constrained adaboost algorithm. In: 2009 10th international conference on document analysis and recognition, IEEE, pp 1–5

50.

Hanif SM, Prevost L, Negri PA (2008) A cascade detector for text detection in natural scene images. In: 2008 19th international conference on pattern recognition, IEEE, pp 1–4

51.

He K, Gkioxari G, Dollar P et al (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision (ICCV)

52.

He P, Huang W, Qiao Y et al (2016) Reading scene text in deep convolutional sequences. In: Proceedings of the AAAI conference on artificial intelligence

53.

He W, Zhang XY, Yin F et al (2017) Deep direct regression for multi-oriented scene text detection. In: Proceedings of the IEEE international conference on computer vision, pp 745–753

54.

Hoiem D, Divvala SK, Hays JH (2009) Pascal voc 2008 challenge. World Lit Today

55.

Hu H, Zhang C, Luo Y et al (2017) Wordsup: Exploiting word annotations for character based text detection. In: Proceedings of the IEEE international conference on computer vision (ICCV)

56.

Huang L, Yang Y, Deng Y et al (2015) Densebox: Unifying landmark localization with end to end object detection. arXiv preprint arXiv:1509.04874

57.

Huang W, Lin Z, Yang J et al (2013) Text localization in natural images using stroke feature transform and text covariance descriptors. In: Proceedings of the IEEE international conference on computer vision, pp 1241–1248

58.

Huang W, Qiao Y, Tang X (2014) Robust scene text detection with convolution neural network induced mser trees. In: European conference on computer vision, Springer, pp 497–511

59.

Jaderberg M, Simonyan K, Vedaldi A et al (2014) Synthetic data and artificial neural networks for natural scene text recognition. arXiv preprint arXiv:1406.2227

60.

Jaderberg M, Vedaldi A, Zisserman A (2014) Deep features for text spotting. In: European conference on computer vision, Springer, pp 512–528

61.

Jaderberg M, Simonyan K, Zisserman A et al (2015) Spatial transformer networks. arXiv preprint arXiv:1506.02025

62.

Jaderberg M, Simonyan K, Vedaldi A et al (2016) Reading text in the wild with convolutional neural networks. Int J Comput Vis 116(1):1–20MathSciNetCrossRef

63.

Jiang Y, Zhu X, Wang X et al (2017) R2cnn: rotational region cnn for orientation robust scene text detection. arXiv preprint arXiv:1706.09579

64.

Kang C, Kim G, Yoo S (2017) Detection and recognition of text embedded in online images via neural context models. In: Proceedings of the AAAI conference on artificial intelligence

65.

Karatzas D, Shafait F, Uchida S et al (2013) Icdar 2013 robust reading competition. In: 2013 12th international conference on document analysis and recognition, IEEE, pp 1484–1493

66.

Karatzas D, Gomez-Bigorda L, Nicolaou A et al (2015) Icdar 2015 competition on robust reading. In: 2015 13th international conference on document analysis and recognition (ICDAR), IEEE, pp 1156–1160

67.

Karen S, Zisserman A (2014) Deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

68.

Kaur A, Shrawankar U (2017) Adverse conditions and techniques for cross-lingual text recognition. In: 2017 international conference on innovative mechanisms for industry applications (ICIMIA), IEEE, pp 70–74

69.

Kaur H, Singh R (2018) Natural scene text localization by convolution neural network with svm

70.

Keserwani P, De K, Roy PP et al (2019) Zero shot learning based script identification in the wild. In: 2019 international conference on document analysis and recognition (ICDAR), pp 987–992, https://doi.org/10.1109/ICDAR.2019.00162

71.

Keserwani P, Dhankhar A, Saini R et al (2021) Quadbox: quadrilateral bounding box based scene text detection using vector regression. IEEE Access 9:36802–36818. https://doi.org/10.1109/ACCESS.2021.3063030CrossRef

72.

Khatib T, Karajeh H, Mohammad H et al (2015) A hybrid multilevel text extraction algorithm in scene images. Sci Res Essays 10(3):105–113CrossRef

73.

Kim KI, Jung K, Kim JH (2003) Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm. IEEE Trans Pattern Anal Mach Intell 25(12):1631–1639CrossRef

74.

Kobchaisawat T, Chalidabhongse TH, Satoh S (2020) Scene text detection with polygon offsetting and border augmentation. Electronics 9(1):117CrossRef

75.

Koo HI, Cho NI (2011) Text-line extraction in handwritten chinese documents based on an energy minimization framework. IEEE Trans Image Process 21(3):1169–1175MathSciNetMATH

76.

Krasin I, Duerig T, Alldrin N, et al (2017) Openimages: A public dataset for large-scale multi-label and multi-class image classification. Dataset available from https://github com/openimages 2(3):18

77.

Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25:1097–1105

78.

Kumar D, Prasad MA, Ramakrishnan A (2013) Multi-script robust reading competition in icdar 2013. In: Proceedings of the 4th international workshop on multilingual OCR, pp 1–5

79.

Lazebnik S, Schmid C, Ponce J (2005) A sparse texture representation using local affine regions. IEEE Trans Pattern Anal Mach Intell 27(8):1265–1278CrossRef

80.

Lee CY, Bhardwaj A, Di W et al (2014) Region-based discriminative feature pooling for scene text recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4050–4057

81.

Lee CY, Baek Y, Lee H (2019) Tedeval: A fair evaluation metric for scene text detectors. In: 2019 international conference on document analysis and recognition workshops (ICDARW), IEEE, pp 14–17

82.

Lee S, Cho MS, Jung K, et al (2010) Scene text extraction with edge constraint and text collinearity. In: 2010 20th international conference on pattern recognition, IEEE, pp 3983–3986

83.

Li Y, Lu H (2012) Scene text detection via stroke width. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012), IEEE, pp 681–684

84.

Liao M, Shi B, Bai X et al (2017) Textboxes: a fast text detector with a single deep neural network. In: Proceedings of the AAAI conference on artificial intelligence

85.

Liao M, Shi B, Bai X (2018) Textboxes++: a single-shot oriented scene text detector. IEEE Trans Image Process 27(8):3676–3690MathSciNetMATHCrossRef

86.

Liao M, Zhu Z, Shi B et al (2018) Rotation-sensitive regression for oriented scene text detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5909–5918

87.

Liao M, Wan Z, Yao C et al (2020) Real-time scene text detection with differentiable binarization. In: Proceedings of the AAAI conference on artificial intelligence 34(07):11,474–11,481. https://doi.org/10.1609/aaai.v34i07.6812, https://ojs.aaai.org/index.php/AAAI/article/view/6812

88.

Lienhart RW, Stuber F (1996) Automatic text recognition in digital videos. In: Image and video processing IV, international society for optics and photonics, pp 180–188

89.

Lin CH, Lucey S (2017) Inverse compositional spatial transformer networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2568–2576

90.

Lin TY, Dollar P, Girshick R et al (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)

91.

Litman R, Anschel O, Tsiper S et al (2020) Scatter: selective context attentional scene text recognizer. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)

92.

Liu J, Liu X, Sheng J et al (2019) Pyramid mask text detector. CoRR abs/1903.11800. arXiv:1903.11800

93.

Liu W, Anguelov D, Erhan D et al (2016) Ssd: single shot multibox detector. In: European conference on computer vision, Springer, pp 21–37

94.

Liu X, Kawanishi T, Wu X et al (2016) Scene text recognition with cnn classifier and wfst-based word labeling. In: 2016 23rd international conference on pattern recognition (ICPR), IEEE, pp 3999–4004

95.

Liu X, Liang D, Yan S et al (2018) Fots: Fast oriented text spotting with a unified network. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)

96.

Liu Y, Jin L, Xie Z et al (2019) Tightness-aware evaluation protocol for scene text detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9612–9620

97.

Liu Y, Chen H, Shen C, et al (2020) Abcnet: Real-time scene text spotting with adaptive bezier-curve network. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9809–9818

98.

Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440

99.

Long S, Ruan J, Zhang W et al (2018) Textsnake: A flexible representation for detecting text of arbitrary shapes. In: Proceedings of the European conference on computer vision (ECCV), pp 20–36

100.

Lowe G (2004) Sift-the scale invariant feature transform. Int J 2(91–110):2

101.

Lu L, Wu D, Tang Z et al (2021) Mining discriminative patches for script identification in natural scene images. J Intell Fuzzy Syst 40(1):551–563CrossRef

102.

Lu M, Mou Y, Chen CL et al (2021) An efficient text detection model for street signs. Appl Sci 11(13):5962CrossRef

103.

Lucas SM (2005) Icdar 2005 text locating competition results. In: Eighth international conference on document analysis and recognition (ICDAR’05), pp 80–84 Vol. 1, https://doi.org/10.1109/ICDAR.2005.231

104.

Lucas SM, Panaretos A, Sosa L et al (2005) Icdar 2003 robust reading competitions: entries, results, and future directions. Int J Doc Anal Recognit (IJDAR) 7(2–3):105–122CrossRef

105.

Luo C, Jin L, Sun Z (2019) Moran: A multi-object rectified attention network for scene text recognition. Pattern Recognit 90:109–118CrossRef

106.

Lyu P, Liao M, Yao C et al (2018) Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: Proceedings of the European conference on computer vision (ECCV)

107.

Lyu P, Yao C, Wu W et al (2018) Multi-oriented scene text detection via corner localization and region segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7553–7563

108.

Ma J, Shao W, Ye H et al (2018) Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans Multimed 20(11):3111–3122CrossRef

109.

Ma M, Wang QF, Huang S et al (2021) Residual attention-based multi-scale script identification in scene text images. Neurocomputing 421:222–233CrossRef

110.

Mahajan S, Rani R (2022) Word level script identification using convolutional neural network enhancement for scenic images. Trans Asian Low-Res Lang Inf Process 21(4):1–29CrossRef

111.

Matas J, Chum O, Urban M et al (2004) Robust wide-baseline stereo from maximally stable extremal regions. Image Vis Comput 22(10):761–767CrossRef

112.

Mathew M, Jain M, Jawahar C (2017) Benchmarking scene text recognition in devanagari, telugu and malayalam. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, pp 42–46

113.

Mei J, Dai L, Shi B et al (2016) Scene text script identification with convolutional recurrent neural networks. In: 2016 23rd international conference on pattern recognition (ICPR), IEEE, pp 4053–4058

114.

Mishra A, Alahari K, Jawahar C (2012) Scene text recognition using higher order language priors. In: BMVC-British machine vision conference, BMVA

115.

Mohanty S, Dutta T, Gupta HP (2018) Recurrent global convolutional network for scene text detection. In: 2018 25th IEEE international conference on image processing (ICIP), IEEE, pp 2750–2754

116.

Nagaoka Y, Miyazaki T, Sugaya Y et al (2021) Text detection using multi-stage region proposal network sensitive to text scale. Sensors 21(4):1232CrossRef

117.

Naosekpam V, Bhowmick A, Hazarika SM (2019) Superpixel correspondence for non-parametric scene parsing of natural images. In: International conference on pattern recognition and machine intelligence, Springer, pp 614–622

118.

Naosekpam V, Paul N, Bhowmick A (2019) Dense and partial correspondence in non-parametric scene parsing. In: International conference on machine intelligence and signal processing, Springer, pp 339–350

119.

Naosekpam V, Kumar N, Sahu N (2021) Multi-lingual indian text detector for mobile devices. In: Communications in computer and information science. Springer Singapore, pp 243–254, https://doi.org/10.1007/978-981-16-1092-9_21,

120.

Naosekpam V, Shishir AS, Sahu N (2021) Scene text recognition with orientation rectification via ic-stn. In: TENCON 2021-2021 IEEE region 10 conference (TENCON), IEEE, pp 664–669

121.

Naosekpam V, Aggarwal S, Sahu N (2022) Utextnet: a unet based arbitrary shaped scene text detector. In: Abraham A, Gandhi N, Hanne T et al (eds) Intell Syst Des Appl. Springer International Publishing, Cham, pp 368–378

122.

Nayef N, Yin F, Bizid I et al (2017) Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, pp 1454–1459

123.

Nayef N, Patel Y, Busta M et al (2019) Icdar2019 robust reading challenge on multi-lingual scene text detection and recognition-rrc-mlt-2019. In: 2019 international conference on document analysis and recognition (ICDAR), IEEE, pp 1582–1587

124.

Neumann L, Matas J (2010) A method for text localization and recognition in real-world images. In: Asian conference on computer vision, Springer, pp 770–783

125.

Nicolaou A, Bagdanov AD, Liwicki M et al (2015) Sparse radial sampling lbp for writer identification. In: 2015 13th international conference on document analysis and recognition (ICDAR), IEEE, pp 716–720

126.

Novikova T, Barinova O, Kohli P et al (2012) Large-lexicon attribute-consistent text recognition in natural images. In: European conference on computer vision, Springer, pp 752–765

127.

Pan YF, Hou X, Liu CL (2009) Text localization in natural scene images based on conditional random field. In: 2009 10th international conference on document analysis and recognition, IEEE, pp 6–10

128.

Pan YF, Hou X, Liu CL (2010) A hybrid approach to detect and localize texts in natural scene images. IEEE Trans Image Process 20(3):800–813MathSciNetMATH

129.

Phan TQ, Shivakumara P, Ding Z et al (2011) Video script identification based on text lines. In: 2011 international conference on document analysis and recognition, pp 1240–1244, https://doi.org/10.1109/ICDAR.2011.250

130.

Qiao Z, Zhou Y, Yang D et al (2020) Seed: semantics enhanced encoder-decoder framework for scene text recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)

131.

Qin H, Zhang H, Wang H et al (2019) An algorithm for scene text detection using multibox and semantic segmentation. Appl Sci 9(6):1054CrossRef

132.

Qin S, Manduchi R (2017) Cascaded segmentation-detection networks for word-level text spotting. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), pp 1275–1282, https://doi.org/10.1109/ICDAR.2017.210

133.

Raghunandan K, Shivakumara P, Roy S et al (2018) Multi-script-oriented text detection and recognition in video/scene/born digital images. IEEE Trans Circuits Syst Video Technol 29(4):1145–1162CrossRef

134.

Rahul Y, Sharma RK (2019) Eeg signal-based movement control for mobile robots. Curr Sci 116(12):1993–2000CrossRef

135.

Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767

136.

Redmon J, Divvala S, Girshick R et al (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788

137.

Ren S, He K, Girshick R et al (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. arXiv preprint arXiv:1506.01497

138.

Risnumawan A, Shivakumara P, Chan CS et al (2014) A robust arbitrary text detection system for natural scene images. Expert Syst Appl 41(18):8027–8048CrossRef

139.

Rodriguez-Serrano JA, Gordo A, Perronnin F (2015) Label embedding: a frugal baseline for text recognition. Int J Comput Vis 113(3):193–207CrossRef

140.

Rusinol M, Chazalon J, Ogier JM (2014) Combining focus measure operators to predict ocr accuracy in mobile-captured document images. In: 2014 11th IAPR international workshop on document analysis systems, IEEE, pp 181–185

141.

Sen P, Das A, Sahu N (2021) End-to-end scene text recognition system for devanagari and bengali text. In: International conference on intelligent computing & optimization, Springer, pp 352–359

142.

Shahab A, Shafait F, Dengel A (2011) Icdar 2011 robust reading competition challenge 2: scene images. In: 2011 international conference on document analysis and recognition, IEEE, pp 1491–1496

143.

Shao HL, Ji Y, Li Y et al (2021) Bdfpn: Bi-direction feature pyramid network for scene text detection. In: 2021 international joint conference on neural networks (IJCNN), IEEE, pp 1–8

144.

Sharma N, Mandal R, Sharma R et al (2015) Icdar2015 competition on video script identification (cvsi 2015). In: 2015 13th international conference on document analysis and recognition (ICDAR), IEEE, pp 1196–1200

145.

Shi B, Yao C, Zhang C et al (2015) Automatic script identification in the wild. In: 2015 13th international conference on document analysis and recognition (ICDAR), IEEE, pp 531–535

146.

Shi B, Bai X, Yao C (2016) An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans Pattern Anal Mach Intell 39(11):2298–2304CrossRef

147.

Shi B, Bai X, Yao C (2016) Script identification in the wild via discriminative convolutional neural network. Pattern Recognit 52:448–458CrossRef

148.

Shi B, Wang X, Lyu P et al (2016) Robust scene text recognition with automatic rectification. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 4168–4176, https://doi.org/10.1109/CVPR.2016.452

149.

Shi B, Bai X, Belongie S (2017) Detecting oriented text in natural images by linking segments. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2550–2558

150.

Shi B, Yao C, Liao M et al (2017) Icdar2017 competition on reading chinese text in the wild (rctw-17). In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, pp 1429–1434

151.

Shi B, Yang M, Wang X et al (2018) Aster: an attentional scene text recognizer with flexible rectification. IEEE Trans Pattern Anal Mach Intell 41(9):2035–2048CrossRef

152.

Singh AK, Mishra A, Dabral P et al (2016) A simple and effective solution for script identification in the wild. In: 2016 12th IAPR workshop on document analysis systems (DAS), IEEE, pp 428–433

153.

Su B, Lu S (2014) Accurate scene text recognition based on recurrent neural network. In: Asian conference on computer vision, Springer, pp 35–48

154.

Tian Z, Huang W, He T et al (2016) Detecting text in natural image with connectionist text proposal network. In: European conference on computer vision, Springer, pp 56–72

155.

Varma M, Zisserman A (2002) Classifying images of materials: achieving viewpoint and illumination independence. In: European conference on computer vision, Springer, pp 255–271

156.

Varma M, Zisserman A (2003) Texture classification: are filter banks necessary? In: 2003 IEEE computer society conference on computer vision and pattern recognition, 2003. Proceedings., IEEE, pp II–691

157.

Veit A, Matera T, Neumann L et al (2016) Coco-text: Dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:1601.07140

158.

Verma M, Sood N, Roy PP et al (2017) Script identification in natural scene images: a dataset and texture-feature based performance evaluation. In: Proceedings of international conference on computer vision and image processing, Springer, pp 309–319

159.

Wang G (2020) Scene text recognition with finer grid rectification. arXiv preprint arXiv:2001.09389

160.

Wang K, Belongie S (2010) Word spotting in the wild. In: European conference on computer vision, Springer, pp 591–604

161.

Wang K, Babenko B, Belongie S (2011) End-to-end scene text recognition. In: 2011 international conference on computer vision, IEEE, pp 1457–1464

162.

Wang Q, Zheng Y, Betke M (2020) A method for detecting text of arbitrary shapes in natural scenes that improves text spotting. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR) workshops

163.

Wang T, Wu DJ, Coates A et al (2012) End-to-end text recognition with convolutional neural networks. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012), IEEE, pp 3304–3308

164.

Wang W, Xie E, Li X et al (2019) Shape robust text detection with progressive scale expansion network. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9336–9345

165.

Wang W, Xie E, Song X et al (2019) Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 8440–8449

166.

Wang X, Jiang Y, Luo Z et al (2019) Arbitrary shape scene text detection with adaptive text region representation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6449–6458

167.

Wang Y, Shi C, Xiao B et al (2015) Mrf based text binarization in complex images using stroke feature. In: 2015 13th international conference on document analysis and recognition (ICDAR), IEEE, pp 821–825

168.

Wang Y, Xie H, Zha ZJ et al (2020) Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11753–11762

169.

Wu Y, Natarajan P (2017) Self-organized text detection with minimal post-processing via border learning. In: Proceedings of the IEEE international conference on computer vision, pp 5000–5009

170.

Xiang D, Guo Q, Xia Y (2016) Robust text detection with vertically-regressed proposal network. In: European conference on computer vision, Springer, pp 351–363

171.

Xie E, Zang Y, Shao S et al (2019) Scene text detection with supervised pyramid context network. In: Proceedings of the AAAI conference on artificial intelligence, pp 9038–9045

172.

Xu Y, Wang Y, Zhou W et al (2019) Textfield: learning a deep direction field for irregular scene text detection. IEEE Trans Image Process 28(11):5566–5579. https://doi.org/10.1109/TIP.2019.2900589

173.

Yan R, Peng L, Xiao S et al (2021) Primitive representation learning for scene text recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 284–293

174.

Yang C, Yin XC, Li Z et al (2017) Adadnns: adaptive ensemble of deep neural networks for scene text recognition. arXiv preprint arXiv:1710.03425

175.

Yang Q, Cheng M, Zhou W et al (2018) Inceptext: a new inception-text module with deformable PSROI pooling for multi-oriented scene text detection. CoRR abs/1805.01167. arXiv:1805.01167

176.

Yang X, He D, Zhou Z et al (2017) Learning to read irregular text with attention mechanisms. In: IJCAI, p 3

177.

Yao C (2017) Msra text detection 500 database

178.

Yao C, Bai X, Liu W et al (2012) Detecting texts of arbitrary orientations in natural images. In: 2012 IEEE conference on computer vision and pattern recognition, IEEE, pp 1083–1090

179.

Yao C, Bai X, Sang N et al (2016) Scene text detection via holistic, multi-channel prediction. CoRR abs/1606.09002. arXiv:1606.09002

180.

Yin XC, Yin X, Huang K et al (2013) Robust text detection in natural scene images. IEEE Trans Pattern Anal Mach Intell 36(5):970–983

181.

Yuan TL, Zhu Z, Xu K et al (2018) Chinese text in the wild. arXiv preprint arXiv:1803.00085

182.

Yuliang L, Lianwen J, Shuaitao Z et al (2017) Detecting curve text in the wild: new dataset and new solution. arXiv preprint arXiv:1712.02170

183.

Zdenek J, Nakayama H (2017) Bag of local convolutional triplets for script identification in scene text. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), pp 369–375, https://doi.org/10.1109/ICDAR.2017.68

184.

Zhan F, Lu S (2019) Esir: End-to-end scene text recognition via iterative image rectification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2059–2068

185.

Zhang C, Liang B, Huang Z et al (2019) Look more than once: an accurate detector for text of arbitrary shapes. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10552–10561

186.

Zhang Z, Shen W, Yao C et al (2015) Symmetry-based text line detection in natural scenes. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)

187.

Zhang Z, Zhang C, Shen W et al (2016) Multi-oriented text detection with fully convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)

188.

Zhong Y, Karu K, Jain AK (1995) Locating text in complex color images. Pattern Recognit 28(10):1523–1535

189.

Zhong Z, Jin L, Zhang S et al (2016) Deeptext: A unified framework for text proposal generation and text detection in natural images. arXiv preprint arXiv:1605.07314

190.

Zhou X, Yao C, Wen H et al (2017) East: an efficient and accurate scene text detector. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5551–5560

191.

Zhou Y, Liu S, Zhang Y et al (2014) Text localization in natural scene images with stroke width histogram and superpixel. Signal and information processing association annual summit and conference (APSIPA). Asia-Pacific, IEEE, pp 1–4

192.

Zhou Z, Wu S, Kong S et al (2019) Curve text detection with local segmentation network and curve connection. arXiv preprint arXiv:1903.09837

193.

Zhu X, Jiang Y, Yang S et al (2017) Deep residual text detection network for scene text. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, pp 807–812

Title: Text detection, recognition, and script identification in natural scene images: a Review
Authors: Veronica Naosekpam, Nilkanta Sahu
Publication date: 05-07-2022
Publisher: Springer London
Published in: International Journal of Multimedia Information Retrieval / Issue 3/2022
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI: https://doi.org/10.1007/s13735-022-00243-8

