Published in: Artificial Intelligence Review 12/2023

18-06-2023

Scene text understanding: recapitulating the past decade

Authors: Mridul Ghosh, Himadri Mukherjee, Sk Md Obaidullah, Xiao-Zhi Gao, Kaushik Roy


Abstract

Computational perception has been dramatically reshaped by the shift from handcrafted feature-based techniques to deep learning. Scene text identification and recognition, an important aspect of machine vision, have inevitably been caught up in this upheaval and have entered the deep learning era. Over time, the field has seen significant improvements in thinking, approach, and effectiveness. The goal of this study is to summarize and analyze the important developments and notable advances in scene text identification and recognition over the past decade. We first discuss the significant handcrafted feature-based techniques that were once regarded as flagship systems. We then discuss the deep learning-based techniques that succeeded them, from their inception to the development of the complex models that have taken scene text identification to the next stage.


Footnotes
Literature
go back to reference Aberdam A, Litman R, Tsiper S, Anschel O, Slossberg R, Mazor S, Manmatha R, Perona P (2021) Sequence-to-sequence contrastive learning for text recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 15302–15312 Aberdam A, Litman R, Tsiper S, Anschel O, Slossberg R, Mazor S, Manmatha R, Perona P (2021) Sequence-to-sequence contrastive learning for text recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 15302–15312
go back to reference Afzal MZ, Pastor-Pellicer J, Shafait F, Breuel TM, Dengel A, Liwicki M (2015) Document image binarization using lstm: a sequence learning approach. In: Proceedings of the 3rd international workshop on historical document imaging and processing, pp 79–84 Afzal MZ, Pastor-Pellicer J, Shafait F, Breuel TM, Dengel A, Liwicki M (2015) Document image binarization using lstm: a sequence learning approach. In: Proceedings of the 3rd international workshop on historical document imaging and processing, pp 79–84
go back to reference Agrawal P, Varma R (2012) Text extraction from images. IJCSET 2(4):1083–1087 Agrawal P, Varma R (2012) Text extraction from images. IJCSET 2(4):1083–1087
go back to reference Akata Z, Perronnin F, Harchaoui Z, Schmid C (2013) Label-embedding for attribute-based classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 819–826 Akata Z, Perronnin F, Harchaoui Z, Schmid C (2013) Label-embedding for attribute-based classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 819–826
go back to reference Angadi S, Kodabagi M (2010) Text region extraction from low resolution natural scene images using texture features. In: 2010 IEEE 2nd international advance computing conference (IACC). IEEE, pp 121–128 Angadi S, Kodabagi M (2010) Text region extraction from low resolution natural scene images using texture features. In: 2010 IEEE 2nd international advance computing conference (IACC). IEEE, pp 121–128
go back to reference Atienza R (2021a) Data augmentation for scene text recognition. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 1561–1570 Atienza R (2021a) Data augmentation for scene text recognition. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 1561–1570
go back to reference Atienza R (2021b) Vision transformer for fast and efficient scene text recognition. In: International conference on document analysis and recognition. Springer, New York, pp 319–334 Atienza R (2021b) Vision transformer for fast and efficient scene text recognition. In: International conference on document analysis and recognition. Springer, New York, pp 319–334
go back to reference Azadboni MK, Samadhiya A, Khatri P (2014) Multi-orientation text detection by skeletonization (motds). In: 2014 2nd international symposium on computational and business intelligence. IEEE, pp 5–9 Azadboni MK, Samadhiya A, Khatri P (2014) Multi-orientation text detection by skeletonization (motds). In: 2014 2nd international symposium on computational and business intelligence. IEEE, pp 5–9
go back to reference Baek J, Kim G, Lee J, Park S, Han D, Yun S, Oh SJ, Lee H (2019) What is wrong with scene text recognition model comparisons? Dataset and model analysis. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 4715–4723 Baek J, Kim G, Lee J, Park S, Han D, Yun S, Oh SJ, Lee H (2019) What is wrong with scene text recognition model comparisons? Dataset and model analysis. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 4715–4723
go back to reference Bai X, Shi B, Zhang C, Cai X, Qi L (2017) Text/non-text image classification in the wild with convolutional neural networks. Pattern Recogn 66:437–446 Bai X, Shi B, Zhang C, Cai X, Qi L (2017) Text/non-text image classification in the wild with convolutional neural networks. Pattern Recogn 66:437–446
go back to reference Bai F, Cheng Z, Niu Y, Pu S, Zhou S (2018) Edit probability for scene text recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1508–1516 Bai F, Cheng Z, Niu Y, Pu S, Zhou S (2018) Edit probability for scene text recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1508–1516
go back to reference Bhattacharyya S, Kumar J, Ghoshal K (2020) Mathematical modeling and computational tools: ICACM 2018, Kharagpur, India, November 23–25, vol 320. Springer, New York Bhattacharyya S, Kumar J, Ghoshal K (2020) Mathematical modeling and computational tools: ICACM 2018, Kharagpur, India, November 23–25, vol 320. Springer, New York
go back to reference Bhunia AK, Konwer A, Bhunia AK, Bhowmick A, Roy PP, Pal U (2019) Script identification in natural scene image and video frames using an attention based convolutional-lstm network. Pattern Recogn 85:172–184 Bhunia AK, Konwer A, Bhunia AK, Bhowmick A, Roy PP, Pal U (2019) Script identification in natural scene image and video frames using an attention based convolutional-lstm network. Pattern Recogn 85:172–184
go back to reference Bissacco A, Cummins M, Netzer Y, Neven H (2013) Photoocr: reading text in uncontrolled conditions. In: Proceedings of the Ieee international conference on computer vision, pp 785–792 Bissacco A, Cummins M, Netzer Y, Neven H (2013) Photoocr: reading text in uncontrolled conditions. In: Proceedings of the Ieee international conference on computer vision, pp 785–792
go back to reference Borisyuk F, Gordo A, Sivakumar V (2018) Rosetta: large scale system for text detection and recognition in images. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, pp 71–79 Borisyuk F, Gordo A, Sivakumar V (2018) Rosetta: large scale system for text detection and recognition in images. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, pp 71–79
go back to reference Boureau Y-L, Bach F, LeCun Y, Ponce J (2010) Learning mid-level features for recognition. In: 2010 IEEE computer society conference on computer vision and pattern recognition. IEEE, pp 2559–2566 Boureau Y-L, Bach F, LeCun Y, Ponce J (2010) Learning mid-level features for recognition. In: 2010 IEEE computer society conference on computer vision and pattern recognition. IEEE, pp 2559–2566
go back to reference Busta M, Neumann L, Matas J (2017) Deep textspotter: an end-to-end trainable scene text localization and recognition framework. In: Proceedings of the IEEE international conference on computer vision, pp 2204–2212 Busta M, Neumann L, Matas J (2017) Deep textspotter: an end-to-end trainable scene text localization and recognition framework. In: Proceedings of the IEEE international conference on computer vision, pp 2204–2212
go back to reference Calvo-Zaragoza J, Gallego A-J (2019) A selectional auto-encoder approach for document image binarization. Pattern Recogn 86:37–47 Calvo-Zaragoza J, Gallego A-J (2019) A selectional auto-encoder approach for document image binarization. Pattern Recogn 86:37–47
go back to reference Cao Y, Ma S, Pan H (2020) Fdta: fully convolutional scene text detection with text attention. IEEE Access 8:155441–155449 Cao Y, Ma S, Pan H (2020) Fdta: fully convolutional scene text detection with text attention. IEEE Access 8:155441–155449
go back to reference Cao D, Dang J, Zhong Y (2021) Towards accurate scene text detection with bidirectional feature pyramid network. Symmetry 13(3):486 Cao D, Dang J, Zhong Y (2021) Towards accurate scene text detection with bidirectional feature pyramid network. Symmetry 13(3):486
go back to reference Chen X, Yuille AL (2004) Detecting and reading text in natural scenes. In: Proceedings of the 2004 IEEE computer society conference on computer vision and pattern recognition. CVPR 2004, vol. 2. IEEE Chen X, Yuille AL (2004) Detecting and reading text in natural scenes. In: Proceedings of the 2004 IEEE computer society conference on computer vision and pattern recognition. CVPR 2004, vol. 2. IEEE
go back to reference Chen H, Tsai SS, Schroth G, Chen DM, Grzeszczuk R, Girod B (2011) Robust text detection in natural images with edge-enhanced maximally stable extremal regions. In: 2011 18th IEEE international conference on image processing. IEEE, pp 2609–2612 Chen H, Tsai SS, Schroth G, Chen DM, Grzeszczuk R, Girod B (2011) Robust text detection in natural images with edge-enhanced maximally stable extremal regions. In: 2011 18th IEEE international conference on image processing. IEEE, pp 2609–2612
go back to reference Chen X, Jin L, Zhu Y, Luo C, Wang T (2021) Text recognition in the wild: a survey. ACM Comput Surv (CSUR) 54(2):1–35 Chen X, Jin L, Zhu Y, Luo C, Wang T (2021) Text recognition in the wild: a survey. ACM Comput Surv (CSUR) 54(2):1–35
go back to reference Cheng Z, Bai F, Xu Y, Zheng G, Pu S, Zhou S (2017) Focusing attention: towards accurate text recognition in natural images. In: Proceedings of the IEEE international conference on computer vision, pp 5076–5084 Cheng Z, Bai F, Xu Y, Zheng G, Pu S, Zhou S (2017) Focusing attention: towards accurate text recognition in natural images. In: Proceedings of the IEEE international conference on computer vision, pp 5076–5084
go back to reference Cheng Z, Xu Y, Bai F, Niu Y, Pu S, Zhou S (2018) Aon: towards arbitrarily-oriented text recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5571–5579 Cheng Z, Xu Y, Bai F, Niu Y, Pu S, Zhou S (2018) Aon: towards arbitrarily-oriented text recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5571–5579
go back to reference Cheng C, Huang Q, Bai X, Feng B, Liu W (2019) Patch aggregator for scene text script identification. In: 2019 international conference on document analysis and recognition (ICDAR). IEEE, pp 1077–1083 Cheng C, Huang Q, Bai X, Feng B, Liu W (2019) Patch aggregator for scene text script identification. In: 2019 international conference on document analysis and recognition (ICDAR). IEEE, pp 1077–1083
go back to reference Ch’ng CK, Chan CS (2017) Total-text: a comprehensive dataset for scene text detection and recognition. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), vol 1. IEEE, pp 935–942 Ch’ng CK, Chan CS (2017) Total-text: a comprehensive dataset for scene text detection and recognition. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), vol 1. IEEE, pp 935–942
go back to reference Chng CK, Liu Y, Sun Y, Ng CC, Luo C, Ni Z, Fang C, Zhang S, Han J, Ding E, et al (2019) Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: 2019 international conference on document analysis and recognition (ICDAR). IEEE, pp 1571–1576 Chng CK, Liu Y, Sun Y, Ng CC, Luo C, Ni Z, Fang C, Zhang S, Han J, Ding E, et al (2019) Icdar2019 robust reading challenge on arbitrary-shaped text-rrc-art. In: 2019 international conference on document analysis and recognition (ICDAR). IEEE, pp 1571–1576
go back to reference Chowdhury AR, Bhattacharya U, Parui SK (2011) Text detection of two major Indian scripts in natural scene images. In: International workshop on camera-based document analysis and recognition. Springer, New York, pp 42–57 Chowdhury AR, Bhattacharya U, Parui SK (2011) Text detection of two major Indian scripts in natural scene images. In: International workshop on camera-based document analysis and recognition. Springer, New York, pp 42–57
go back to reference Coates A, Carpenter B, Case C, Satheesh S, Suresh B, Wang T, Wu DJ, Ng AY (2011) Text detection and character recognition in scene images with unsupervised feature learning. In: 2011 international conference on document analysis and recognition. IEEE, pp 440–445 Coates A, Carpenter B, Case C, Satheesh S, Suresh B, Wang T, Wu DJ, Ng AY (2011) Text detection and character recognition in scene images with unsupervised feature learning. In: 2011 international conference on document analysis and recognition. IEEE, pp 440–445
go back to reference Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), vol 1, pp 886–893 Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), vol 1, pp 886–893
go back to reference Darab M, Rahmati M (2012) A hybrid approach to localize farsi text in natural scene images. Procedia Comput. Sci. 13:171–184 Darab M, Rahmati M (2012) A hybrid approach to localize farsi text in natural scene images. Procedia Comput. Sci. 13:171–184
go back to reference Dargan S, Kumar M, Ayyagari MR, Kumar G (2020) A survey of deep learning and its applications: a new paradigm to machine learning. Arch. Comput. Methods Eng. 27(4):1071–1092MathSciNet Dargan S, Kumar M, Ayyagari MR, Kumar G (2020) A survey of deep learning and its applications: a new paradigm to machine learning. Arch. Comput. Methods Eng. 27(4):1071–1092MathSciNet
go back to reference Dasgupta K, Das S, Bhattacharya U (2020) Scale-invariant multi-oriented text detection in wild scene image. In: 2020 IEEE international conference on image processing (ICIP), pp 2041–2045. IEEE Dasgupta K, Das S, Bhattacharya U (2020) Scale-invariant multi-oriented text detection in wild scene image. In: 2020 IEEE international conference on image processing (ICIP), pp 2041–2045. IEEE
go back to reference Dauphin YN, Fan A, Auli M, Grangier D (2017) Language modeling with gated convolutional networks. In: International conference on machine learning. PMLR, pp 933–941 Dauphin YN, Fan A, Auli M, Grangier D (2017) Language modeling with gated convolutional networks. In: International conference on machine learning. PMLR, pp 933–941
go back to reference De Campos TE, Babu BR, Varma M et al (2009) Character recognition in natural images. VISAPP 7:1–10 De Campos TE, Babu BR, Varma M et al (2009) Character recognition in natural images. VISAPP 7:1–10
go back to reference Decker LGL, Pinto A, Campana JLF, Neira MC, dos Santos AA, Conceiçao JS, Angeloni MA, Li LT, et al (2020) MobText: a compact method for scene text localization. VISAPP Decker LGL, Pinto A, Campana JLF, Neira MC, dos Santos AA, Conceiçao JS, Angeloni MA, Li LT, et al (2020) MobText: a compact method for scene text localization. VISAPP
go back to reference Del Gobbo J, Herrera RM (2020) Unconstrained text detection in manga: a new dataset and baseline. In: European conference on computer vision. Springer, New York, pp 629–646 Del Gobbo J, Herrera RM (2020) Unconstrained text detection in manga: a new dataset and baseline. In: European conference on computer vision. Springer, New York, pp 629–646
go back to reference Deng D, Liu H, Li X, Cai D (2018) Pixellink: detecting scene text via instance segmentation. In: Proceedings of the AAAI conference on artificial intelligence, vol 32 Deng D, Liu H, Li X, Cai D (2018) Pixellink: detecting scene text via instance segmentation. In: Proceedings of the AAAI conference on artificial intelligence, vol 32
go back to reference Dey S, Shivakumara P, Raghunandan K, Pal U, Lu T, Kumar GH, Chan CS (2017) Script independent approach for multi-oriented text detection in scene image. Neurocomputing 242:96–112 Dey S, Shivakumara P, Raghunandan K, Pal U, Lu T, Kumar GH, Chan CS (2017) Script independent approach for multi-oriented text detection in scene image. Neurocomputing 242:96–112
go back to reference Dhar D, Chakraborty N, Choudhury S, Paul A, Mollah AF, Basu S, Sarkar R (2020) Multilingual scene text detection using gradient morphology. Int J Comput Vis Image Process (IJCVIP) 10(3):31–43 Dhar D, Chakraborty N, Choudhury S, Paul A, Mollah AF, Basu S, Sarkar R (2020) Multilingual scene text detection using gradient morphology. Int J Comput Vis Image Process (IJCVIP) 10(3):31–43
go back to reference Dizaji KG, Zheng F, Sadoughi N, Yang Y, Deng C, Huang H (2018) Unsupervised deep generative adversarial hashing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3664–3673 Dizaji KG, Zheng F, Sadoughi N, Yang Y, Deng C, Huang H (2018) Unsupervised deep generative adversarial hashing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3664–3673
go back to reference Epshtein B, Ofek E, Wexler Y (2010) Detecting text in natural scenes with stroke width transform. In: 2010 IEEE computer society conference on computer vision and pattern recognition. IEEE, pp 2963–2970 Epshtein B, Ofek E, Wexler Y (2010) Detecting text in natural scenes with stroke width transform. In: 2010 IEEE computer society conference on computer vision and pattern recognition. IEEE, pp 2963–2970
go back to reference Fang S, Xie H, Zha Z-J, Sun N, Tan J, Zhang Y (2018) Attention and language ensemble for scene text recognition with convolutional sequence modeling. In: Proceedings of the 26th ACM international conference on multimedia, pp 248–256 Fang S, Xie H, Zha Z-J, Sun N, Tan J, Zhang Y (2018) Attention and language ensemble for scene text recognition with convolutional sequence modeling. In: Proceedings of the 26th ACM international conference on multimedia, pp 248–256
go back to reference Fasil O, Manjunath S, Aradhya VM (2017) Word-level script identification from scene images. In: Proceedings of the 5th international conference on frontiers in intelligent computing: theory and applications. Springer, New York, pp 417–426 Fasil O, Manjunath S, Aradhya VM (2017) Word-level script identification from scene images. In: Proceedings of the 5th international conference on frontiers in intelligent computing: theory and applications. Springer, New York, pp 417–426
go back to reference Feng Y, Song Y, Zhang Y (2016) Scene text detection based on multi-scale swt and edge filtering. In: 2016 23rd international conference on pattern recognition (ICPR). IEEE, pp 645–650 Feng Y, Song Y, Zhang Y (2016) Scene text detection based on multi-scale swt and edge filtering. In: 2016 23rd international conference on pattern recognition (ICPR). IEEE, pp 645–650
go back to reference Fernando B, Fromont E, Tuytelaars T (2014) Mining mid-level features for image classification. Int J Comput Vision 108(3):186–203MathSciNet Fernando B, Fromont E, Tuytelaars T (2014) Mining mid-level features for image classification. Int J Comput Vision 108(3):186–203MathSciNet
go back to reference Ganin Y, Lempitsky V (2015) Unsupervised domain adaptation by backpropagation. In: International conference on machine learning. PMLR, pp 1180–1189 Ganin Y, Lempitsky V (2015) Unsupervised domain adaptation by backpropagation. In: International conference on machine learning. PMLR, pp 1180–1189
go back to reference Gao H, Li Y, Wang X, Han J, Li R (2019) Ensemble attention for text recognition in natural images. In: 2019 international joint conference on neural networks (IJCNN). IEEE, pp 1–8 Gao H, Li Y, Wang X, Han J, Li R (2019) Ensemble attention for text recognition in natural images. In: 2019 international joint conference on neural networks (IJCNN). IEEE, pp 1–8
go back to reference Gao D, Li K, Wang R, Shan S, Chen X (2020) Multi-modal graph neural network for joint reasoning on vision and scene text. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12746–12756 Gao D, Li K, Wang R, Shan S, Chen X (2020) Multi-modal graph neural network for joint reasoning on vision and scene text. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12746–12756
go back to reference Garcia C, Apostolidis X (2000) Text detection and segmentation in complex color images. In: 2000 IEEE international conference on acoustics, speech, and signal processing. Proceedings (Cat. No. 00CH37100), vol. 4. IEEE, pp 2326–2329 Garcia C, Apostolidis X (2000) Text detection and segmentation in complex color images. In: 2000 IEEE international conference on acoustics, speech, and signal processing. Proceedings (Cat. No. 00CH37100), vol. 4. IEEE, pp 2326–2329
go back to reference Ghosh M, Obaidullah SM, Santosh K, Das N, Roy K (2018) Artistic multi-character script identification using iterative isotropic dilation algorithm. In: International conference on recent trends in image processing and pattern recognition. Springer, New York, pp 49–62 Ghosh M, Obaidullah SM, Santosh K, Das N, Roy K (2018) Artistic multi-character script identification using iterative isotropic dilation algorithm. In: International conference on recent trends in image processing and pattern recognition. Springer, New York, pp 49–62
go back to reference Ghosh M, Mukherjee H, Obaidullah SM, Santosh K, Das N, Roy K (2019a) Artistic multi-character script identification. In: Document processing using machine learning. Chapman and Hall/CRC, Boston, pp 28–42 Ghosh M, Mukherjee H, Obaidullah SM, Santosh K, Das N, Roy K (2019a) Artistic multi-character script identification. In: Document processing using machine learning. Chapman and Hall/CRC, Boston, pp 28–42
go back to reference Ghosh M, Mukherjee H, Obaidullah SM, Santosh K, Das N, Roy K (2019b) Identifying the presence of graphical texts in scene images using cnn. In: 2019 international conference on document analysis and recognition workshops (ICDARW), vol 1. IEEE, pp 86–91 Ghosh M, Mukherjee H, Obaidullah SM, Santosh K, Das N, Roy K (2019b) Identifying the presence of graphical texts in scene images using cnn. In: 2019 international conference on document analysis and recognition workshops (ICDARW), vol 1. IEEE, pp 86–91
go back to reference Ghosh M, Roy SS, Mukherjee H, Obaidullah SM, Santosh K, Roy K (2019c) Automatic text localization in scene images: a transfer learning based approach. In: National conference on computer vision, pattern recognition, image processing, and graphics. Springer, New York, pp 470–479 Ghosh M, Roy SS, Mukherjee H, Obaidullah SM, Santosh K, Roy K (2019c) Automatic text localization in scene images: a transfer learning based approach. In: National conference on computer vision, pattern recognition, image processing, and graphics. Springer, New York, pp 470–479
go back to reference Ghosh M, Mukherjee H, Obaidullah SM, Santosh K, Das N, Roy K (2020) Artistic multi-script identification at character level with extreme learning machine. Procedia Comput. Sci. 167:496–505 Ghosh M, Mukherjee H, Obaidullah SM, Santosh K, Das N, Roy K (2020) Artistic multi-script identification at character level with extreme learning machine. Procedia Comput. Sci. 167:496–505
go back to reference Ghosh M, Mukherjee H, Obaidullah SM, Santosh K, Das N, Roy K (2021a) Lwsinet: a deep learning-based approach towards video script identification. Multimed Tools Appl 1:1–34 Ghosh M, Mukherjee H, Obaidullah SM, Santosh K, Das N, Roy K (2021a) Lwsinet: a deep learning-based approach towards video script identification. Multimed Tools Appl 1:1–34
go back to reference Ghosh M, Roy SS, Mukherjee H, Obaidullah SM, Gao X-Z, Roy K (2021b) Movie title extraction and script separation using shallow convolution neural network. IEEE Access 9:125184–125201 Ghosh M, Roy SS, Mukherjee H, Obaidullah SM, Gao X-Z, Roy K (2021b) Movie title extraction and script separation using shallow convolution neural network. IEEE Access 9:125184–125201
go back to reference Ghosh M, Roy SS, Mukherjee H, Obaidullah SM, Santosh K, Roy K (2022) Understanding movie poster: transfer-deep learning approach for graphic-rich text recognition. Vis Comput 38(5):1645–1664 Ghosh M, Roy SS, Mukherjee H, Obaidullah SM, Santosh K, Roy K (2022) Understanding movie poster: transfer-deep learning approach for graphic-rich text recognition. Vis Comput 38(5):1645–1664
go back to reference Ghoshal R, Banerjee A (2020) Svm and mlp based segmentation and recognition of text from scene images through an effective binarization scheme. In: Computational intelligence in pattern recognition. Springer, New York, pp 237–246 Ghoshal R, Banerjee A (2020) Svm and mlp based segmentation and recognition of text from scene images through an effective binarization scheme. In: Computational intelligence in pattern recognition. Springer, New York, pp 237–246
go back to reference Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448 Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448
go back to reference Gkioxari G, Girshick R, Malik J (2015) Contextual action recognition with r* cnn. In: Proceedings of the IEEE international conference on computer vision, pp 1080–1088 Gkioxari G, Girshick R, Malik J (2015) Contextual action recognition with r* cnn. In: Proceedings of the IEEE international conference on computer vision, pp 1080–1088
go back to reference Gllavata J, Ewerth R, Freisleben B (2004) Text detection in images based on unsupervised classification of high-frequency wavelet coefficients. In: Proceedings of the 17th international conference on pattern recognition, 2004. ICPR 2004, vol 1. IEEE, pp 425–428 Gllavata J, Ewerth R, Freisleben B (2004) Text detection in images based on unsupervised classification of high-frequency wavelet coefficients. In: Proceedings of the 17th international conference on pattern recognition, 2004. ICPR 2004, vol 1. IEEE, pp 425–428
go back to reference Gllavata J, Freisleben B (2005) Script recognition in images with complex backgrounds. In: Proceedings of the fifth IEEE international symposium on signal processing and information technology, 2005. IEEE, pp 589–594 Gllavata J, Freisleben B (2005) Script recognition in images with complex backgrounds. In: Proceedings of the fifth IEEE international symposium on signal processing and information technology, 2005. IEEE, pp 589–594
go back to reference Goel V, Mishra A, Alahari K, Jawahar C (2013) Whole is greater than sum of parts: Recognizing scene text words. In: 2013 12th international conference on document analysis and recognition. IEEE, pp 398–402 Goel V, Mishra A, Alahari K, Jawahar C (2013) Whole is greater than sum of parts: Recognizing scene text words. In: 2013 12th international conference on document analysis and recognition. IEEE, pp 398–402
go back to reference Gomez L, Karatzas D (2013) Multi-script text extraction from natural scenes. In: 2013 12th international conference on document analysis and recognition. IEEE, pp 467–471 Gomez L, Karatzas D (2013) Multi-script text extraction from natural scenes. In: 2013 12th international conference on document analysis and recognition. IEEE, pp 467–471
go back to reference Gomez L, Karatzas D (2016) A fine-grained approach to scene text script identification. In: 2016 12th IAPR workshop on document analysis systems (DAS). IEEE, pp 192–197 Gomez L, Karatzas D (2016) A fine-grained approach to scene text script identification. In: 2016 12th IAPR workshop on document analysis systems (DAS). IEEE, pp 192–197
Gomez L, Nicolaou A, Karatzas D (2017) Improving patch-based scene text script identification with ensembles of conjoined networks. Pattern Recogn 67:85–96
Gonzalez A, Bergasa LM, Yebes JJ, Bronte S (2012) Text location in complex images. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012). IEEE, pp 617–620
Goodfellow IJ, Bulatov Y, Ibarz J, Arnoud S, Shet V (2013a) Multi-digit number recognition from street view imagery using deep convolutional neural networks. arXiv:1312.6082
Goodfellow I, Warde-Farley D, Mirza M, Courville A, Bengio Y (2013b) Maxout networks. In: International conference on machine learning. PMLR, pp 1319–1327
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Adv Neural Inf Process Syst 27:1–10
Gupta A, Vedaldi A, Zisserman A (2016) Synthetic data for text localisation in natural images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2315–2324
He T, Huang W, Qiao Y, Yao J (2016a) Text-attentional convolutional neural network for scene text detection. IEEE Trans Image Process 25(6):2529–2541
He K, Zhang X, Ren S, Sun J (2016b) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 2961–2969
He W, Zhang X-Y, Yin F, Liu C-L (2018) Multi-oriented and multi-lingual scene text detection with direct regression. IEEE Trans Image Process 27(11):5406–5419
Howe NR (2011) A Laplacian energy for document binarization. In: 2011 international conference on document analysis and recognition. IEEE, pp 6–10
Hu Z, Pi P, Wu Z, Xue Y, Shen J, Tan J, Lian X, Wang Z, Liu J (2021) E2vts: energy-efficient video text spotting from unmanned aerial vehicles. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 905–913
Huang W, Lin Z, Yang J, Wang J (2013a) Text localization in natural images using stroke feature transform and text covariance descriptors. In: Proceedings of the IEEE international conference on computer vision, pp 1241–1248
Huang R, Shivakumara P, Uchida S (2013b) Scene character detection by an edge-ray filter. In: 2013 12th international conference on document analysis and recognition. IEEE, pp 462–466
Huang W, Qiao Y, Tang X (2014) Robust scene text detection with convolution neural network induced mser trees. In: European conference on computer vision. Springer, New York, pp 497–511
Huang Z, Zhong Z, Sun L, Huo Q (2019) Mask r-cnn with pyramid attention network for scene text detection. In: 2019 IEEE winter conference on applications of computer vision (WACV). IEEE, pp 764–772
Huang J, Pang G, Kovvuri R, Toh M, Liang KJ, Krishnan P, Yin X, Hassner T (2021) A multiplexed network for end-to-end, multilingual ocr. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4547–4557
Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2014a) Synthetic data and artificial neural networks for natural scene text recognition. arXiv:1406.2227
Jaderberg M, Vedaldi A, Zisserman A (2014b) Deep features for text spotting. In: European conference on computer vision. Springer, New York, pp 512–528
Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2016) Reading text in the wild with convolutional neural networks. Int J Comput Vis 116(1):1–20
Jang I, Ko B, Byun H, Choi Y (2002) Automatic text extraction in news images using morphology. In: Visual communications and image processing 2002, vol 4671. International Society for Optics and Photonics, pp 521–530
Juneja M, Vedaldi A, Jawahar C, Zisserman A (2013) Blocks that shout: distinctive parts for scene classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 923–930
Karatzas D, Shafait F, Uchida S, Iwamura M, Bigorda LG, Mestre SR, Mas J, Mota DF, Almazan JA, De Las Heras LP (2013) ICDAR 2013 robust reading competition. In: 2013 12th international conference on document analysis and recognition. IEEE, pp 1484–1493
Karatzas D, Gomez-Bigorda L, Nicolaou A, Ghosh S, Bagdanov A, Iwamura M, Matas J, Neumann L, Chandrasekhar VR, Lu S, et al (2015) ICDAR 2015 competition on robust reading. In: 2015 13th international conference on document analysis and recognition (ICDAR). IEEE, pp 1156–1160
Kasar T, Ramakrishnan AG (2011) Multi-script and multi-oriented text localization from scene images. In: International workshop on camera-based document analysis and recognition. Springer, New York, pp 1–14
Khalil A, Jarrah M, Al-Ayyoub M, Jararweh Y (2021) Text detection and script identification in natural scene images using deep learning. Comput Electr Eng 91:107043
Khan T, Mollah AF (2019) Autnt-a component level dataset for text non-text classification and benchmarking with novel script invariant feature descriptors and d-cnn. Multimed Tools Appl 78(22):32159–32186
Khan T, Sarkar R, Mollah AF (2021) Deep learning approaches to scene text detection: a comprehensive review. Artif Intell Rev 54(5):3239–3298
Kim KI, Jung K, Kim JH (2003) Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm. IEEE Trans Pattern Anal Mach Intell 25(12):1631–1639
Kim K-H, Hong S, Roh B, Cheon Y, Park M (2016) Pvanet: deep but lightweight neural networks for real-time object detection. arXiv:1608.08021
Kim S, Hori T, Watanabe S (2017) Joint ctc-attention based end-to-end speech recognition using multi-task learning. In: 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 4835–4839
Kong H, Tang D, Meng X, Lu T (2019) Garn: a novel generative adversarial recognition network for end-to-end scene character recognition. In: 2019 international conference on document analysis and recognition (ICDAR). IEEE, pp 689–694
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25:1097–1105
Kumuda T, Basavaraj L (2015) Detection and localization of text from natural scene images using texture features. In: 2015 IEEE international conference on computational intelligence and computing research (ICCIC). IEEE, pp 1–4
Lee C-Y, Bhardwaj A, Di W, Jagadeesh V, Piramuthu R (2014) Region-based discriminative feature pooling for scene text recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4050–4057
Lee CY, Baek Y, Lee H (2019) Tedeval: a fair evaluation metric for scene text detectors. In: 2019 international conference on document analysis and recognition workshops (ICDARW), vol 7. IEEE, pp 14–17
Lei Z, Zhao S, Song H, Shen J (2018) Scene text recognition using residual convolutional recurrent neural network. Mach Vis Appl 29(5):861–871
Li H, Doermann D, Kia O (2000) Automatic text detection and tracking in digital video. IEEE Trans Image Process 9(1):147–156
Li H, Wang P, Shen C (2017) Towards end-to-end text spotting with convolutional recurrent neural networks. In: Proceedings of the IEEE international conference on computer vision, pp 5238–5246
Li H, Wang P, Shen C, Zhang G (2019a) Show, attend and read: a simple and strong baseline for irregular text recognition. In: Proceedings of the AAAI conference on artificial intelligence, vol 33, pp 8610–8617
Li K, Zhang Y, Li K, Li Y, Fu Y (2019b) Visual semantic reasoning for image-text matching. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 4654–4662
Liao M, Zhang J, Wan Z, Xie F, Liang J, Lyu P, Yao C, Bai X (2019) Scene text recognition from two-dimensional perspective. In: Proceedings of the AAAI conference on artificial intelligence, vol 33, pp 8714–8721
Lim JJ, Zitnick CL, Dollár P (2013) Sketch tokens: A learned mid-level representation for contour and object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3158–3165
Lin G, Milan A, Shen C, Reid I (2017) Refinenet: multi-path refinement networks for high-resolution semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1925–1934
Lin H, Yang P, Zhang F (2020) Review of scene text detection and recognition. Arch Comput Methods Eng 27(2):433–454
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016a) Ssd: single shot multibox detector. In: European conference on computer vision. Springer, New York, pp 21–37
Liu W, Chen C, Wong K-YK, Su Z, Han J (2016b) Star-net: a spatial attention residue network for scene text recognition. In: BMVC, vol 2, p 7
Liu Z, Li Y, Ren F, Goh WL, Yu H (2018b) Squeezedtext: a real-time scene text recognition by binary convolutional encoder-decoder network. In: Thirty-second AAAI conference on artificial intelligence
Liu X, Meng G, Pan C (2019) Scene text detection and recognition with advances in deep learning: a survey. Int J Doc Anal Recogn (IJDAR) 22(2):143–162
Liu Y, He T, Chen H, Wang X, Luo C, Zhang S, Shen C, Jin L (2021) Exploring the capacity of an orderless box discretization network for multi-orientation scene text detection. Int J Comput Vis 129(6):1972–1992
Long S, He X, Yao C (2021) Scene text detection and recognition: the deep learning era. Int J Comput Vis 129(1):161–184
Long M, Cao Y, Wang J, Jordan M (2015) Learning transferable features with deep adaptation networks. In: International conference on machine learning. PMLR, pp 97–105
Long S, Ruan J, Zhang W, He X, Wu W, Yao C (2018) Textsnake: a flexible representation for detecting text of arbitrary shapes. In: Proceedings of the European conference on computer vision (ECCV), pp 20–36
Lu S, Su B, Tan CL (2010) Document image binarization using background estimation and stroke edges. Int J Doc Anal Recogn (IJDAR) 13(4):303–314
Lu L, Yi Y, Huang F, Wang K, Wang Q (2019) Integrating local CNN and global CNN for script identification in natural scene images. IEEE Access 7:52669–52679
Lucas SM (2005) ICDAR 2005 text locating competition results. In: Eighth international conference on document analysis and recognition (ICDAR’05), pp 80–84
Lucas SM, Panaretos A, Sosa L, Tang A, Wong S, Young R, Ashida K, Nagai H, Okamoto M, Yamamoto H et al (2005) ICDAR 2003 robust reading competitions: entries, results, and future directions. Int J Doc Anal Recogn (IJDAR) 7(2–3):105–122
Luo C, Jin L, Sun Z (2019) Moran: a multi-object rectified attention network for scene text recognition. Pattern Recogn 90:109–118
Lyu MR, Song J, Cai M (2005) A comprehensive method for multilingual video text detection, localization, and extraction. IEEE Trans Circuits Syst Video Technol 15(2):243–255
Lyu P, Liao M, Yao C, Wu W, Bai X (2018a) Mask textspotter: an end-to-end trainable neural network for spotting text with arbitrary shapes. In: Proceedings of the European conference on computer vision (ECCV), pp 67–83
Lyu P, Yao C, Wu W, Yan S, Bai X (2018b) Multi-oriented scene text detection via corner localization and region segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7553–7563
Ma J, Shao W, Ye H, Wang L, Wang H, Zheng Y, Xue X (2018) Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans Multimed 20(11):3111–3122
Ma C, Sun L, Zhong Z, Huo Q (2021a) Relatext: exploiting visual relationships for arbitrary-shaped scene text detection with graph convolutional networks. Pattern Recogn 111:107684
Ma M, Wang Q-F, Huang S, Huang S, Goulermas Y, Huang K (2021b) Residual attention-based multi-scale script identification in scene text images. Neurocomputing 421:222–233
Mafla A, Dey S, Biten AF, Gomez L, Karatzas D (2021) Multi-modal reasoning graph for scene-text based fine-grained image classification and retrieval. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 4023–4033
Mahajan S, Rani R (2021) Text detection and localization in scene images: a broad review. Artif Intell Rev 54(6):4317–4377
Mathew M, Jain M, Jawahar C (2017) Benchmarking scene text recognition in devanagari, telugu and malayalam. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), vol 7. IEEE, pp 42–46
Mei J, Dai L, Shi B, Bai X (2016) Scene text script identification with convolutional recurrent neural networks. In: 2016 23rd international conference on pattern recognition (ICPR). IEEE, pp 4053–4058
Mishra A, Alahari K, Jawahar C (2012a) Scene text recognition using higher order language priors. In: BMVC-British Machine Vision Conference. BMVA
Mishra A, Alahari K, Jawahar C (2012b) Top-down and bottom-up cues for scene text recognition. In: 2012 IEEE conference on computer vision and pattern recognition, pp 2687–2694
Nagaoka Y, Miyazaki T, Sugaya Y, Omachi S (2021) Text detection using multi-stage region proposal network sensitive to text scale. Sensors 21(4):1232
Naiemi F, Ghods V, Khalesi H (2021) A novel pipeline framework for multi oriented scene text image detection and recognition. Expert Syst Appl 170:114549
Nayef N, Yin F, Bizid I, Choi H, Feng Y, Karatzas D, Luo Z, Pal U, Rigaud C, Chazalon J, et al (2017) ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), vol 1. IEEE, pp 1454–1459
Nayef N, Patel Y, Busta M, Chowdhury PN, Karatzas D, Khlif W, Matas J, Pal U, Burie J-C, Liu C-L, et al (2019) ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition-rrc-mlt-2019. In: 2019 international conference on document analysis and recognition (ICDAR). IEEE, pp 1582–1587
Neumann L, Matas J (2010) A method for text localization and recognition in real-world images. In: Asian conference on computer vision. Springer, New York, pp 770–783
Neumann L, Matas J (2012) Real-time scene text localization and recognition. In: 2012 IEEE conference on computer vision and pattern recognition. IEEE, pp 3538–3545
Neumann L, Matas J (2013) Scene text localization and recognition with oriented stroke detection. In: Proceedings of the IEEE international conference on computer vision, pp 97–104
Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 9(1):62–66
Pan Y-F, Hou X, Liu C-L (2009) Text localization in natural scene images based on conditional random field. In: 2009 10th international conference on document analysis and recognition. IEEE, pp 6–10
Pan Y-F, Liu C-L, Hou X (2010a) Fast scene text localization by learning-based filtering and verification. In: 2010 IEEE international conference on image processing. IEEE, pp 2269–2272
go back to reference Pan Y-F, Hou X, Liu C-L (2010b) A hybrid approach to detect and localize texts in natural scene images. IEEE Trans Image Process 20(3):800–813MathSciNetMATH Pan Y-F, Hou X, Liu C-L (2010b) A hybrid approach to detect and localize texts in natural scene images. IEEE Trans Image Process 20(3):800–813MathSciNetMATH
go back to reference Pandey D, Pandey BK, Wairya S (2021) Hybrid deep neural network with adaptive galactic swarm optimization for text extraction from scene images. Soft Comput 25(2):1563–1580 Pandey D, Pandey BK, Wairya S (2021) Hybrid deep neural network with adaptive galactic swarm optimization for text extraction from scene images. Soft Comput 25(2):1563–1580
go back to reference Pastor-Pellicer J, España-Boquera S, Zamora-Martínez F, Afzal MZ, Castro-Bleda MJ (2015) Insights on the use of convolutional neural networks for document image binarization. In: International work-conference on artificial neural networks. Springer, New York, pp 115–126 Pastor-Pellicer J, España-Boquera S, Zamora-Martínez F, Afzal MZ, Castro-Bleda MJ (2015) Insights on the use of convolutional neural networks for document image binarization. In: International work-conference on artificial neural networks. Springer, New York, pp 115–126
go back to reference Paul S, Saha S, Basu S, Saha PK, Nasipuri M (2019) Text localization in camera captured images using fuzzy distance transform based adaptive stroke filter. Multimed Tools Appl 78(13):18017–18036 Paul S, Saha S, Basu S, Saha PK, Nasipuri M (2019) Text localization in camera captured images using fuzzy distance transform based adaptive stroke filter. Multimed Tools Appl 78(13):18017–18036
go back to reference Pei Z, Cao Z, Long M, Wang J (2018) Multi-adversarial domain adaptation. In: Thirty-second AAAI conference on artificial intelligence Pei Z, Cao Z, Long M, Wang J (2018) Multi-adversarial domain adaptation. In: Thirty-second AAAI conference on artificial intelligence
go back to reference Peng X, Cao H, Natarajan P (2017) Using convolutional encoder-decoder for document image binarization. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), vol 1. IEEE, pp 708–713 Peng X, Cao H, Natarajan P (2017) Using convolutional encoder-decoder for document image binarization. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), vol 1. IEEE, pp 708–713
go back to reference Phan TQ, Shivakumara P, Ding Z, Lu S, Tan CL (2011) Video script identification based on text lines. In: 2011 international conference on document analysis and recognition. IEEE, pp 1240–1244 Phan TQ, Shivakumara P, Ding Z, Lu S, Tan CL (2011) Video script identification based on text lines. In: 2011 international conference on document analysis and recognition. IEEE, pp 1240–1244
Phan TQ, Shivakumara P, Tan CL (2012) Detecting text in the real world. In: Proceedings of the 20th ACM international conference on multimedia, pp 765–768
Phan TQ, Shivakumara P, Tian S, Tan CL (2013) Recognizing text with perspective distortion in natural scenes. In: Proceedings of the IEEE international conference on computer vision, pp 569–576
Pratikakis I, Gatos B, Ntirogiannis K (2013) Icdar 2013 document image binarization contest (dibco 2013). In: 2013 12th international conference on document analysis and recognition. IEEE, pp 1471–1476
Qin X, Jiang J, Yuan C-A, Qiao S, Fan W (2020) Arbitrary shape natural scene text detection method based on soft attention mechanism and dilated convolution. IEEE Access 8:122685–122694
Raghunandan K, Shivakumara P, Roy S, Kumar GH, Pal U, Lu T (2018) Multi-script-oriented text detection and recognition in video/scene/born digital images. IEEE Trans Circuits Syst Video Technol 29(4):1145–1162
Rainarli E et al (2021) A decade: review of scene text detection methods. Comput Sci Rev 42:100434
Raisi Z, Naiel MA, Fieguth P, Wardell S, Zelek J (2020) 2d positional embedding-based transformer for scene text recognition. J Comput Vis Imaging Syst 6(1):1–4
Raisi Z, Naiel MA, Younes G, Wardell S, Zelek JS (2021) Transformer-based text detection in the wild. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3162–3171
Rashmi V, Nayak SN (2018) A hybrid approach to localize text in natural scene images. Int J Eng Appl Sci Technol 3(1):53–60
Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263–7271
Ren X, Ramanan D (2013) Histograms of sparse codes for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3246–3253
Risnumawan A, Shivakumara P, Chan CS, Tan CL (2014) A robust arbitrary text detection system for natural scene images. Expert Syst Appl 41(18):8027–8048
Risnumawan A, Sulistijono IA, Abawajy J (2016) Text detection in low resolution scene images using convolutional neural network. In: International conference on soft computing and data mining. Springer, New York, pp 366–375
Sajid U, Chow M, Zhang J, Kim T, Wang G (2021) Parallel scale-wise attention network for effective scene text recognition. arXiv:2104.12076
Selvam P, Koilraj JAS, Romero CAT, Alharbi M, Mehbodniya A, Webber JL, Sengan S (2022) A transformer-based framework for scene text recognition. IEEE Access 10:100895–100910
Sengupta P, Mollah AF (2021) Scene character recognition with morphological filtering and hog features. In: Soft computing techniques and applications. Springer, New York, pp 1–9
Sermanet P, Chintala S, LeCun Y (2012) Convolutional neural networks applied to house numbers digit classification. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012). IEEE, pp 3288–3291
Shahab A, Shafait F, Dengel A (2011) Icdar 2011 robust reading competition challenge 2: Reading text in scene images. In: 2011 international conference on document analysis and recognition, pp 1491–1496
Sharma N, Mandal R, Sharma R, Pal U, Blumenstein M (2015) Icdar2015 competition on video script identification (cvsi 2015). In: 2015 13th international conference on document analysis and recognition (ICDAR). IEEE, pp 1196–1200
Sheng F, Chen Z, Xu B (2019) Nrtr: a no-recurrence sequence-to-sequence model for scene text recognition. In: 2019 international conference on document analysis and recognition (ICDAR). IEEE, pp 781–786
Shi C, Xiao B, Wang C, Zhang Y (2012) Graph-based background suppression for scene text detection. In: 2012 10th IAPR international workshop on document analysis systems. IEEE, pp 210–214
Shi C, Wang C, Xiao B, Zhang Y, Gao S, Zhang Z (2013) Scene text recognition using part-based tree-structured character detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2961–2968
Shi B, Yao C, Zhang C, Guo X, Huang F, Bai X (2015) Automatic script identification in the wild. In: 2015 13th international conference on document analysis and recognition (ICDAR). IEEE, pp 531–535
Shi B, Bai X, Yao C (2016a) Script identification in the wild via discriminative convolutional neural network. Pattern Recogn 52:448–458
Shi B, Bai X, Yao C (2016b) An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans Pattern Anal Mach Intell 39(11):2298–2304
Shi B, Wang X, Lyu P, Yao C, Bai X (2016c) Robust scene text recognition with automatic rectification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4168–4176
Shi B, Bai X, Belongie S (2017a) Detecting oriented text in natural images by linking segments. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2550–2558
Shi B, Yao C, Liao M, Yang M, Xu P, Cui L, Belongie S, Lu S, Bai X (2017b) Icdar2017 competition on reading Chinese text in the wild (rctw-17). In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), vol 1. IEEE, pp 1429–1434
Shi B, Yang M, Wang X, Lyu P, Yao C, Bai X (2018) Aster: an attentional scene text recognizer with flexible rectification. IEEE Trans Pattern Anal Mach Intell 41(9):2035–2048
Shinde A, Patil M (2021) Street view text detection methods. In: 2021 international conference on artificial intelligence and smart systems (ICAIS). IEEE, pp 961–965
Shivakumara P, Phan TQ, Tan CL (2010) A Laplacian approach to multi-oriented text detection in video. IEEE Trans Pattern Anal Mach Intell 33(2):412–419
Shivakumara P, Sreedhar RP, Phan TQ, Lu S, Tan CL (2012) Multioriented video scene text detection through Bayesian classification and boundary growing. IEEE Trans Circuits Syst Video Technol 22(8):1227–1235
Shivakumara P, Yuan Z, Zhao D, Lu T, Tan CL (2015) New gradient-spatial-structural features for video script identification. Comput Vis Image Underst 130:35–53
Simanjuntak GD, Nugroho H (2021) Scene text detection with quadtree-based candidate text regions and convolutional neural network. Int J Electr Eng Inf 13(1):152–162
Singh AK, Mishra A, Dabral P, Jawahar C (2016) A simple and effective solution for script identification in the wild. In: 2016 12th IAPR workshop on document analysis systems (DAS). IEEE, pp 428–433
Soni R, Kumar B, Chand S (2019) Text detection and localization in natural scene images based on text awareness score. Appl Intell 49(4):1376–1405
Sravani M, Maheswararao A, Murthy MK (2021) Robust detection of video text using an efficient hybrid method via key frame extraction and text localization. Multimed Tools Appl 80(6):9671–9686
Sriman B, Schomaker L (2019) Multi-script text versus non-text classification of regions in scene images. J Vis Commun Image Represent 62:23–42
Su B, Lu S (2014) Accurate scene text recognition based on recurrent neural network. In: Asian conference on computer vision. Springer, New York, pp 35–48
Su Y-M, Peng H-W, Huang K-W, Yang C-S (2019) Image processing technology for text recognition. In: 2019 international conference on technologies and applications of artificial intelligence (TAAI). IEEE, pp 1–5
Sun L, Huo Q, Jia W, Chen K (2015) A robust approach for text detection from natural scene images. Pattern Recogn 48(9):2906–2920
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9
Tang J, Yang Z, Wang Y, Zheng Q, Xu Y, Bai X (2019) Seglink++: detecting dense and arbitrary-shaped scene text by instance-aware component grouping. Pattern Recogn 96:106954
Tao Y, Jia Z, Ma R, Xu S (2021) Trig: transformer-based text recognizer with initial embedding guidance. Electronics 10(22):2780
Tian Z, Shen C, Chen H, He T (2019) Fcos: fully convolutional one-stage object detection. In: Proceedings of the IEEE/cvf international conference on computer vision, pp 9627–9636
Tounsi M, Moalla I, Lebourgeois F, Alimi AM (2017) Cnn based transfer learning for scene script identification. In: International conference on neural information processing. Springer, New York, pp 702–711
Turki H, Halima MB, Alimi AM (2016) Text detection in natural scene images using two masks filtering. In: 2016 IEEE/ACS 13th international conference of computer systems and applications (AICCSA). IEEE, pp 1–6
Turki H, Halima MB, Alimi AM (2017) A hybrid method of natural scene text detection using msers masks in hsv space color. In: Ninth international conference on machine vision (ICMV 2016), vol 10341. International Society for Optics and Photonics, p 1034111
Tzeng E, Hoffman J, Saenko K, Darrell T (2017) Adversarial discriminative domain adaptation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7167–7176
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30:1–10
Veit A, Matera T, Neumann L, Matas J, Belongie S (2016) Coco-text: dataset and benchmark for text detection and recognition in natural images. arXiv:1601.07140
Verma M, Sood N, Roy PP, Raman B (2017) Script identification in natural scene images: a dataset and texture-feature based performance evaluation. In: Proceedings of international conference on computer vision and image processing. Springer, New York, pp 309–319
Wang K, Belongie S (2010) Word spotting in the wild. In: European conference on computer vision. Springer, New York, pp 591–604
Wang J, Hu X (2017) Gated recurrent convolution neural network for ocr. Adv Neural Inf Process Syst 30:1–10
Wang K, Babenko B, Belongie S (2011) End-to-end scene text recognition. In: 2011 international conference on computer vision. IEEE, pp 1457–1464
Wang T, Wu DJ, Coates A, Ng AY (2012) End-to-end text recognition with convolutional neural networks. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012). IEEE, pp 3304–3308
Wang X, Wang B, Bai X, Liu W, Tu Z (2013) Max-margin multiple-instance dictionary learning. In: International conference on machine learning, pp 846–854
Wang P, Chen P, Yuan Y, Liu D, Huang Z, Hou X, Cottrell G (2018) Understanding convolution for semantic segmentation. In: 2018 IEEE winter conference on applications of computer vision (WACV). IEEE, pp 1451–1460
Wang W, Xie E, Li X, Hou W, Lu T, Yu G, Shao S (2019) Shape robust text detection with progressive scale expansion network. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9336–9345
Wang S, Liu Y, He Z, Wang Y, Tang Z (2020a) A quadrilateral scene text detector with two-stage network architecture. Pattern Recogn 102:107230
Wang T, Zhu Y, Jin L, Luo C, Chen X, Wu Y, Wang Q, Cai M (2020b) Decoupled attention network for text recognition. In: Proceedings of the AAAI conference on artificial intelligence, vol 34, pp 12216–12224
Wang X, Zheng S, Zhang C, Li R, Gui L (2021a) R-yolo: a real-time text detector for natural scenes with arbitrary rotation. Sensors 21(3):888
Wang P, Li H, Shen C (2021b) Towards end-to-end text spotting in natural scenes. IEEE Trans Pattern Anal Mach Intell
Wojna Z, Gorban AN, Lee D-S, Murphy K, Yu Q, Li Y, Ibarz J (2017) Attention-based extraction of structured information from street view imagery. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), vol 1. IEEE, pp 844–850
Wolf C, Doermann D (2002) Binarization of low quality text using a markov random field model. In: Object recognition supported by user interaction for service robots, vol 3. IEEE, pp 160–163
Wolf C, Jolion J-M (2006) Object count/area graphs for the evaluation of object detection and segmentation algorithms. IJDAR 8(4):280–296
Wu H, Zou B, Zhao Y-Q, Chen Z, Zhu C, Guo J (2016) Natural scene text detection by multi-scale adaptive color clustering and non-text filtering. Neurocomputing 214:1011–1025
Wu F, Souza A, Zhang T, Fifty C, Yu T, Weinberger K (2019a) Simplifying graph convolutional networks. In: International conference on machine learning. PMLR, pp 6861–6871
Wu H, Zhang J, Huang K, Liang K, Yu Y (2019b) Fastfcn: rethinking dilated convolution in the backbone for semantic segmentation. arXiv:1903.11816
Xie S, Tu Z (2015) Holistically-nested edge detection. In: Proceedings of the IEEE international conference on computer vision, pp 1395–1403
Xu Y, Wang Y, Zhou W, Wang Y, Yang Z, Bai X (2019a) Textfield: learning a deep direction field for irregular scene text detection. IEEE Trans Image Process 28(11):5566–5579
Xu H, Su X, Liu T, Guo P, Gao G, Bao F (2019b) A natural scene text extraction approach based on generative adversarial learning. In: International conference on neural information processing. Springer, New York, pp 65–73
Yang J, Yu K, Gong Y, Huang T (2009) Linear spatial pyramid matching using sparse coding for image classification. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE, pp 1794–1801
Yang X, He D, Zhou Z, Kifer D, Giles CL (2017) Learning to read irregular text with attention mechanisms. In: IJCAI, vol 1, p 3
Yang B, Ma AJ, Yuen PC (2018) Learning domain-shared group-sparse representation for unsupervised domain adaptation. Pattern Recogn 81:615–632
Yang M, Guan Y, Liao M, He X, Bian K, Bai S, Yao C, Bai X (2019) Symmetry-constrained rectification network for scene text recognition. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9147–9156
Yao C, Bai X, Liu W, Ma Y, Tu Z (2012) Detecting texts of arbitrary orientations in natural images. In: 2012 IEEE conference on computer vision and pattern recognition. IEEE, pp 1083–1090
Yao C, Bai X, Liu W (2014a) A unified framework for multioriented text detection and recognition. IEEE Trans Image Process 23(11):4737–4749
Yao C, Bai X, Shi B, Liu W (2014b) Strokelets: a learned multi-scale representation for scene text recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4042–4049
Yao C, Bai X, Shi B, Liu W (2014c) Strokelets: a learned multi-scale representation for scene text recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4042–4049
Yao C, Wu J, Zhou X, Zhang C, Zhou S, Cao Z, Yin Q (2015) Incidental scene text understanding: Recent progresses on icdar 2015 robust reading competition challenge 4. arXiv:1511.09207
Yi C, Tian Y (2011) Text string detection from natural scenes by structure-based partition and grouping. IEEE Trans Image Process 20(9):2594–2605
go back to reference Yi C, Tian Y (2013) Text extraction from scene images by character appearance and structure modeling. Comput Vis Image Underst 117(2):182–194 Yi C, Tian Y (2013) Text extraction from scene images by character appearance and structure modeling. Comput Vis Image Underst 117(2):182–194
go back to reference Yildirim G, Achanta R, Süsstrunk S (2013) Text recognition in natural images using multiclass hough forests. In: Proceedings of the 8th international conference on computer vision theory and applications, vol 1, pp 737–741 Yildirim G, Achanta R, Süsstrunk S (2013) Text recognition in natural images using multiclass hough forests. In: Proceedings of the 8th international conference on computer vision theory and applications, vol 1, pp 737–741
Yin X, Yin X-C, Hao H-W, Iqbal K (2012) Effective text localization in natural scene images with MSER, geometry-based grouping and AdaBoost. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012). IEEE, pp 725–728
Yin X-C, Yin X, Huang K, Hao H-W (2013) Robust text detection in natural scene images. IEEE Trans Pattern Anal Mach Intell 36(5):970–983
Yin X-C, Pei W-Y, Zhang J, Hao H-W (2015) Multi-orientation scene text detection with adaptive clustering. IEEE Trans Pattern Anal Mach Intell 37(9):1930–1937
Yu F, Koltun V, Funkhouser T (2017) Dilated residual networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 472–480
Yu D, Li X, Zhang C, Liu T, Han J, Liu J, Ding E (2020) Towards accurate scene text recognition with semantic reasoning networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12113–12122
Zdenek J, Nakayama H (2017) Bag of local convolutional triplets for script identification in scene text. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), vol 1. IEEE, pp 369–375
Zhan F, Lu S (2019) ESIR: end-to-end scene text recognition via iterative image rectification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2059–2068
Zhang C, Yao C, Shi B, Bai X (2015) Automatic discrimination of text and non-text natural images. In: 2015 13th international conference on document analysis and recognition (ICDAR). IEEE, pp 886–890
Zhang S, Liu Y, Jin L, Luo C (2018) Feature enhancement network: a refined scene text detector. In: Proceedings of the AAAI conference on artificial intelligence, vol 32
Zhang Y, Nie S, Liu W, Xu X, Zhang D, Shen HT (2019) Sequence-to-sequence domain adaptation network for robust text image recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2740–2749
Zhang S-X, Zhu X, Yang C, Wang H, Yin X-C (2021a) Adaptive boundary proposal network for arbitrary shape text detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 1305–1314
Zhang M, Ma M, Wang P (2021b) Scene text recognition with cascade attention network. In: Proceedings of the 2021 international conference on multimedia retrieval, pp 385–393
Zhao D, Shivakumara P, Lu S, Tan CL (2012) New spatial-gradient-features for video script identification. In: 2012 10th IAPR international workshop on document analysis systems. IEEE, pp 38–42
Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2881–2890
Zharikov I, Nikitin P, Vasiliev I, Dokholyan V (2020) DDI-100: dataset for text detection and recognition. In: Proceedings of the 2020 4th international symposium on computer science and intelligent control, pp 1–5
Zhou X, Yao C, Wen H, Wang Y, Zhou S, He W, Liang J (2017a) EAST: an efficient and accurate scene text detector. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5551–5560
Zhuo J, Wang S, Zhang W, Huang Q (2017b) Deep unsupervised convolutional domain adaptation. In: Proceedings of the 25th ACM international conference on multimedia, pp 261–269
Zhu Y, Du J (2021) TextMountain: accurate scene text detection via instance segmentation. Pattern Recogn 110:107336
Metadata
Title: Scene text understanding: recapitulating the past decade
Authors: Mridul Ghosh, Himadri Mukherjee, Sk Md Obaidullah, Xiao-Zhi Gao, Kaushik Roy
Publication date: 18-06-2023
Publisher: Springer Netherlands
Published in: Artificial Intelligence Review / Issue 12/2023
Print ISSN: 0269-2821
Electronic ISSN: 1573-7462
DOI: https://doi.org/10.1007/s10462-023-10530-3
