Published in: International Journal of Computer Vision 1/2021

27-08-2020

Scene Text Detection and Recognition: The Deep Learning Era

Authors: Shangbang Long, Xin He, Cong Yao


Abstract

With the rise and development of deep learning, computer vision has been tremendously transformed and reshaped. As an important research area in computer vision, scene text detection and recognition has inevitably been influenced by this wave of revolution, consequently entering the era of deep learning. In recent years, the community has witnessed substantial advances in mindset, methodology, and performance. This survey aims to summarize and analyze the major changes and significant progress in scene text detection and recognition in the deep learning era. In this article, we aim to: (1) introduce new insights and ideas; (2) highlight recent techniques and benchmarks; (3) look ahead to future trends. Specifically, we emphasize the dramatic differences brought by deep learning and the remaining grand challenges. We expect this review to serve as a reference for researchers in this field. Related resources are also collected in our GitHub repository (https://github.com/Jyouhou/SceneTextPapers).


Literature
go back to reference Almazán, J., Gordo, A., Fornés, A., & Valveny, E. (2014). Word spotting and recognition with embedded attributes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(12), 2552–2566.CrossRef Almazán, J., Gordo, A., Fornés, A., & Valveny, E. (2014). Word spotting and recognition with embedded attributes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(12), 2552–2566.CrossRef
go back to reference Arbelaez, P., Maire, M., Fowlkes, C., & Malik, J. (2011). Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5), 898–916.CrossRef Arbelaez, P., Maire, M., Fowlkes, C., & Malik, J. (2011). Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5), 898–916.CrossRef
go back to reference Baek, J., Kim, G., Lee, J., Park, S., Han, D., Yun, S., et al. (2019a). What is wrong with scene text recognition model comparisons? Dataset and model analysis. In Proceedings of the IEEE international conference on computer vision (pp. 4715–4723). Baek, J., Kim, G., Lee, J., Park, S., Han, D., Yun, S., et al. (2019a). What is wrong with scene text recognition model comparisons? Dataset and model analysis. In Proceedings of the IEEE international conference on computer vision (pp. 4715–4723).
go back to reference Baek, Y., Lee, B., Han, D., Yun, S., & Lee, H. (2019b). Character region awareness for text detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 9365–9374). Baek, Y., Lee, B., Han, D., Yun, S., & Lee, H. (2019b). Character region awareness for text detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 9365–9374).
go back to reference Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. In ICLR 2015. Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. In ICLR 2015.
go back to reference Bai, F., Cheng, Z., Niu, Y., Pu, S., & Zhou, S. (2018). Edit probability for scene text recognition. In CVPR 2018. Bai, F., Cheng, Z., Niu, Y., Pu, S., & Zhou, S. (2018). Edit probability for scene text recognition. In CVPR 2018.
go back to reference Bartz, C., Yang, H., & Meinel, C. (2017). See: Towards semi-supervised end-to-end scene text recognition. arXiv preprint arXiv:1712.05404. Bartz, C., Yang, H., & Meinel, C. (2017). See: Towards semi-supervised end-to-end scene text recognition. arXiv preprint arXiv:​1712.​05404.
go back to reference Bissacco, A., Cummins, M., Netzer, Y., & Neven, H. (2013). Photoocr: Reading text in uncontrolled conditions. In Proceedings of the IEEE international conference on computer vision (pp. 785–792). Bissacco, A., Cummins, M., Netzer, Y., & Neven, H. (2013). Photoocr: Reading text in uncontrolled conditions. In Proceedings of the IEEE international conference on computer vision (pp. 785–792).
go back to reference Borisyuk, F., Gordo, A., & Sivakumar, V. (2018). Rosetta: Large scale system for text detection and recognition in images. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 71–79). ACM. Borisyuk, F., Gordo, A., & Sivakumar, V. (2018). Rosetta: Large scale system for text detection and recognition in images. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 71–79). ACM.
go back to reference Busta, M., Neumann, L., & Matas, J. (2015). Fastext: Efficient unconstrained scene text detector. In Proceedings of the IEEE international conference on computer vision (ICCV) (pp. 1206–1214). Busta, M., Neumann, L., & Matas, J. (2015). Fastext: Efficient unconstrained scene text detector. In Proceedings of the IEEE international conference on computer vision (ICCV) (pp. 1206–1214).
go back to reference Busta, M., Neumann, L., & Matas, J. (2017). Deep textspotter: An end-to-end trainable scene text localization and recognition framework. In Proceedings of ICCV. Busta, M., Neumann, L., & Matas, J. (2017). Deep textspotter: An end-to-end trainable scene text localization and recognition framework. In Proceedings of ICCV.
go back to reference Chen, X., Yang, J., Zhang, J., & Waibel, A. (2004). Automatic detection and recognition of signs from natural scenes. IEEE Transactions on Image Processing, 13(1), 87–99.CrossRef Chen, X., Yang, J., Zhang, J., & Waibel, A. (2004). Automatic detection and recognition of signs from natural scenes. IEEE Transactions on Image Processing, 13(1), 87–99.CrossRef
go back to reference Cheng, Z., Bai, F., Xu, Y., Zheng, G., Pu, S., & Zhou, S. (2017a). Focusing attention: Towards accurate text recognition in natural images. In 2017 IEEE international conference on computer vision (ICCV) (pp. 5086–5094). IEEE. Cheng, Z., Bai, F., Xu, Y., Zheng, G., Pu, S., & Zhou, S. (2017a). Focusing attention: Towards accurate text recognition in natural images. In 2017 IEEE international conference on computer vision (ICCV) (pp. 5086–5094). IEEE.
go back to reference Cheng, Z., Liu, X., Bai, F., Niu, Y., Pu, S., & Zhou, S. (2017b). Arbitrarily-oriented text recognition. In CVPR2018. Cheng, Z., Liu, X., Bai, F., Niu, Y., Pu, S., & Zhou, S. (2017b). Arbitrarily-oriented text recognition. In CVPR2018.
go back to reference Ch’ng, C.K., & Chan, C. S. (2017). Total-text: A comprehensive dataset for scene text detection and recognition. In 2017 14th IAPR international conference on document analysis and recognition (ICDAR) (Vol. 1, pp. 935–942). IEEE. Ch’ng, C.K., & Chan, C. S. (2017). Total-text: A comprehensive dataset for scene text detection and recognition. In 2017 14th IAPR international conference on document analysis and recognition (ICDAR) (Vol. 1, pp. 935–942). IEEE.
go back to reference Chowdhury, M. A., & Deb, K. (2013). Extracting and segmenting container name from container images. International Journal of Computer Applications, 74(19), 18–22.CrossRef Chowdhury, M. A., & Deb, K. (2013). Extracting and segmenting container name from container images. International Journal of Computer Applications, 74(19), 18–22.CrossRef
go back to reference Coates, A., Carpenter, B., Case, C., Satheesh, S., Suresh, B., Wang, T., et al. (2011). Text detection and character recognition in scene images with unsupervised feature learning. In 2011 international conference on document analysis and recognition (ICDAR) (pp. 440–445). IEEE. Coates, A., Carpenter, B., Case, C., Satheesh, S., Suresh, B., Wang, T., et al. (2011). Text detection and character recognition in scene images with unsupervised feature learning. In 2011 international conference on document analysis and recognition (ICDAR) (pp. 440–445). IEEE.
go back to reference Dai, Y., Huang, Z., Gao, Y., & Chen, K. (2017). Fused text segmentation networks for multi-oriented scene text detection. arXiv preprint arXiv:1709.03272. Dai, Y., Huang, Z., Gao, Y., & Chen, K. (2017). Fused text segmentation networks for multi-oriented scene text detection. arXiv preprint arXiv:​1709.​03272.
go back to reference Dalal, N., & Triggs, B., (2005). Histograms of oriented gradients for human detection. In IEEE computer society conference on computer vision and pattern recognition (CVPR) (Vol. 1, pp. 886–893). IEEE. Dalal, N., & Triggs, B., (2005). Histograms of oriented gradients for human detection. In IEEE computer society conference on computer vision and pattern recognition (CVPR) (Vol. 1, pp. 886–893). IEEE.
go back to reference Deng, D., Liu, H., Li, X., & Cai, D. (2018). Pixellink: Detecting scene text via instance segmentation. Proceedings of AAA, I, 2018. Deng, D., Liu, H., Li, X., & Cai, D. (2018). Pixellink: Detecting scene text via instance segmentation. Proceedings of AAA, I, 2018.
go back to reference DeSouza, G. N., & Kak, A. C. (2002). Vision for mobile robot navigation: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(2), 237–267.CrossRef DeSouza, G. N., & Kak, A. C. (2002). Vision for mobile robot navigation: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(2), 237–267.CrossRef
go back to reference Dollár, P., Appel, R., Belongie, S., & Perona, P. (2014). Fast feature pyramids for object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(8), 1532–1545.CrossRef Dollár, P., Appel, R., Belongie, S., & Perona, P. (2014). Fast feature pyramids for object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(8), 1532–1545.CrossRef
go back to reference Dvorin, Y., & Havosha, U. E. (2009). Method and device for instant translation, June 4. US Patent App. 11/998,931. Dvorin, Y., & Havosha, U. E. (2009). Method and device for instant translation, June 4. US Patent App. 11/998,931.
go back to reference Epshtein, B., Ofek, E., & Wexler, Y. (2010). Detecting text in natural scenes with stroke width transform. In 2010 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2963–2970). IEEE. Epshtein, B., Ofek, E., & Wexler, Y. (2010). Detecting text in natural scenes with stroke width transform. In 2010 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2963–2970). IEEE.
go back to reference Everingham, M., Eslami, S. A., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2015). The pascal visual object classes challenge: A retrospective. International Journal of Computer Vision, 111(1), 98–136.CrossRef Everingham, M., Eslami, S. A., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2015). The pascal visual object classes challenge: A retrospective. International Journal of Computer Vision, 111(1), 98–136.CrossRef
go back to reference Felzenszwalb, P. F., & Huttenlocher, D. P. (2005). Pictorial structures for object recognition. International Journal of Computer Vision, 61(1), 55–79.CrossRef Felzenszwalb, P. F., & Huttenlocher, D. P. (2005). Pictorial structures for object recognition. International Journal of Computer Vision, 61(1), 55–79.CrossRef
go back to reference Fu, C.-Y., Liu, W., Ranga, A., Tyagi, A., & Berg, A. C. (2017). DSSD: Deconvolutional single shot detector. arXiv preprint arXiv:1701.06659. Fu, C.-Y., Liu, W., Ranga, A., Tyagi, A., & Berg, A. C. (2017). DSSD: Deconvolutional single shot detector. arXiv preprint arXiv:​1701.​06659.
go back to reference Gao, Y., Chen, Y., Wang, J., & Lu, H. (2017). Reading scene text with attention convolutional sequence modeling. arXiv preprint arXiv:1709.04303. Gao, Y., Chen, Y., Wang, J., & Lu, H. (2017). Reading scene text with attention convolutional sequence modeling. arXiv preprint arXiv:​1709.​04303.
go back to reference Girshick, R. (2015). Fast R-CNN. In The IEEE international conference on computer vision (ICCV). Girshick, R. (2015). Fast R-CNN. In The IEEE international conference on computer vision (ICCV).
go back to reference Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 580–587). Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 580–587).
go back to reference Goldberg, A. V. (1997). An efficient implementation of a scaling minimum-cost flow algorithm. Journal of Algorithms, 22(1), 1–29.MathSciNetCrossRef Goldberg, A. V. (1997). An efficient implementation of a scaling minimum-cost flow algorithm. Journal of Algorithms, 22(1), 1–29.MathSciNetCrossRef
go back to reference Gordo, A. (2015). Supervised mid-level features for word image representation. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2956–2964). Gordo, A. (2015). Supervised mid-level features for word image representation. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2956–2964).
go back to reference Graves, A., Fernández, S., Gomez, F., & Schmidhuber, J. (2006). Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd international conference on machine learning (pp. 369–376). ACM. Graves, A., Fernández, S., Gomez, F., & Schmidhuber, J. (2006). Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd international conference on machine learning (pp. 369–376). ACM.
go back to reference Graves, A., Liwicki, M., Bunke, H., Schmidhuber, J., & Fernández, S. (2008). Unconstrained on-line handwriting recognition with recurrent neural networks. In Advances in neural information processing systems (pp. 577–584). Graves, A., Liwicki, M., Bunke, H., Schmidhuber, J., & Fernández, S. (2008). Unconstrained on-line handwriting recognition with recurrent neural networks. In Advances in neural information processing systems (pp. 577–584).
go back to reference Gupta, A., Vedaldi, A., & Zisserman, A. (2016). Synthetic data for text localisation in natural images. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2315–2324). Gupta, A., Vedaldi, A., & Zisserman, A. (2016). Synthetic data for text localisation in natural images. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2315–2324).
go back to reference Ham, Y. K., Kang, M. S., Chung, H. K., Park, R.-H., & Park, G. T. (1995). Recognition of raised characters for automatic classification of rubber tires. Optical Engineering, 34(1), 102–110.CrossRef Ham, Y. K., Kang, M. S., Chung, H. K., Park, R.-H., & Park, G. T. (1995). Recognition of raised characters for automatic classification of rubber tires. Optical Engineering, 34(1), 102–110.CrossRef
go back to reference Han, J., Zhang, D., Cheng, G., Liu, N., & Xu, D. (2018). Advanced deep-learning techniques for salient and category-specific object detection: A survey. IEEE Signal Processing Magazine, 35(1), 84–100.CrossRef Han, J., Zhang, D., Cheng, G., Liu, N., & Xu, D. (2018). Advanced deep-learning techniques for salient and category-specific object detection: A survey. IEEE Signal Processing Magazine, 35(1), 84–100.CrossRef
go back to reference He, D., Yang, X., Liang, C., Zhou, Z., Ororbia, A. G., Kifer, D., & Giles, C. L. (2017a). Multi-scale FCN with cascaded instance aware segmentation for arbitrary oriented word spotting in the wild. In 2017 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 474–483). IEEE. He, D., Yang, X., Liang, C., Zhou, Z., Ororbia, A. G., Kifer, D., & Giles, C. L. (2017a). Multi-scale FCN with cascaded instance aware segmentation for arbitrary oriented word spotting in the wild. In 2017 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 474–483). IEEE.
go back to reference He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017b). Mask R-CNN. In 2017 IEEE international conference on computer vision (ICCV) (pp. 2980–2988). IEEE. He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017b). Mask R-CNN. In 2017 IEEE international conference on computer vision (ICCV) (pp. 2980–2988). IEEE.
go back to reference He, P., Huang, W., He, T., Zhu, Q., Qiao, Y., & Li, X. (2017c). Single shot text detector with regional attention. In The IEEE international conference on computer vision (ICCV). He, P., Huang, W., He, T., Zhu, Q., Qiao, Y., & Li, X. (2017c). Single shot text detector with regional attention. In The IEEE international conference on computer vision (ICCV).
go back to reference He, P., Huang, W., Qiao, Y., Loy, C. C., & Tang, X. (2016). Reading scene text in deep convolutional sequences. In Thirtieth AAAI conference on artificial intelligence. He, P., Huang, W., Qiao, Y., Loy, C. C., & Tang, X. (2016). Reading scene text in deep convolutional sequences. In Thirtieth AAAI conference on artificial intelligence.
go back to reference He, T., Tian, Z., Huang, W., Shen, C., Qiao, Y., & Sun, C. (2018). An end-to-end textspotter with explicit alignment and attention. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 5020–5029). He, T., Tian, Z., Huang, W., Shen, C., Qiao, Y., & Sun, C. (2018). An end-to-end textspotter with explicit alignment and attention. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 5020–5029).
go back to reference He, W., Zhang, X.-Y., Yin, F., & Liu, C.-L. (2017d). Deep direct regression for multi-oriented scene text detection. In The IEEE international conference on computer vision (ICCV). He, W., Zhang, X.-Y., Yin, F., & Liu, C.-L. (2017d). Deep direct regression for multi-oriented scene text detection. In The IEEE international conference on computer vision (ICCV).
go back to reference He, Z., Liu, J., Ma, H., & Li, P. (2005). A new automatic extraction method of container identity codes. IEEE Transactions on Intelligent Transportation Systems, 6(1), 72–78.CrossRef He, Z., Liu, J., Ma, H., & Li, P. (2005). A new automatic extraction method of container identity codes. IEEE Transactions on Intelligent Transportation Systems, 6(1), 72–78.CrossRef
go back to reference Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.CrossRef Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.CrossRef
go back to reference Hu, H., Zhang, C., Luo, Y., Wang, Y., Han, J., & Ding, E. (2017). Wordsup: Exploiting word annotations for character based text detection. In Proceedings of the IEEE international conference on computer vision. 2017. Hu, H., Zhang, C., Luo, Y., Wang, Y., Han, J., & Ding, E. (2017). Wordsup: Exploiting word annotations for character based text detection. In Proceedings of the IEEE international conference on computer vision. 2017.
go back to reference Huang, W., Lin, Z., Yang, J., & Wang, J. (2013). Text localization in natural images using stroke feature transform and text covariance descriptors. In Proceedings of the IEEE international conference on computer vision (pp. 1241–1248). Huang, W., Lin, Z., Yang, J., & Wang, J. (2013). Text localization in natural images using stroke feature transform and text covariance descriptors. In Proceedings of the IEEE international conference on computer vision (pp. 1241–1248).
go back to reference Huang, W., Qiao, Y., & Tang, X. (2014). Robust scene text detection with convolution neural network induced MSER trees. In European conference on computer vision (pp. 497–511). Springer. Huang, W., Qiao, Y., & Tang, X. (2014). Robust scene text detection with convolution neural network induced MSER trees. In European conference on computer vision (pp. 497–511). Springer.
go back to reference Jaderberg, M., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014a). Deep structured output learning for unconstrained text recognition. In ICLR2015. Jaderberg, M., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014a). Deep structured output learning for unconstrained text recognition. In ICLR2015.
go back to reference Jaderberg, M., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014b). Synthetic data and artificial neural networks for natural scene text recognition. arXiv preprint arXiv:1406.2227. Jaderberg, M., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014b). Synthetic data and artificial neural networks for natural scene text recognition. arXiv preprint arXiv:​1406.​2227.
go back to reference Jaderberg, M., Simonyan, K., Vedaldi, A., & Zisserman, A. (2016). Reading text in the wild with convolutional neural networks. International Journal of Computer Vision, 116(1), 1–20.MathSciNetCrossRef Jaderberg, M., Simonyan, K., Vedaldi, A., & Zisserman, A. (2016). Reading text in the wild with convolutional neural networks. International Journal of Computer Vision, 116(1), 1–20.MathSciNetCrossRef
go back to reference Jaderberg, M., Simonyan, K., Zisserman, A. et al. (2015). Spatial transformer networks. In Advances in neural information processing systems (pp. 2017–2025). Jaderberg, M., Simonyan, K., Zisserman, A. et al. (2015). Spatial transformer networks. In Advances in neural information processing systems (pp. 2017–2025).
go back to reference Jaderberg, M., Vedaldi, A., & Zisserman, A. (2014c). Deep features for text spotting. In In Proceedings of European conference on computer vision (ECCV) (pp. 512–528). Springer. Jaderberg, M., Vedaldi, A., & Zisserman, A. (2014c). Deep features for text spotting. In In Proceedings of European conference on computer vision (ECCV) (pp. 512–528). Springer.
go back to reference Jain, A. K., & Yu, B. (1998). Automatic text location in images and video frames. Pattern Recognition, 31(12), 2055–2076.CrossRef Jain, A. K., & Yu, B. (1998). Automatic text location in images and video frames. Pattern Recognition, 31(12), 2055–2076.CrossRef
go back to reference Jiang, Y., Zhu, X., Wang, X., Yang, S., Li, W., Wang, H., Fu, P., & Luo, Z. (2017). R2CNN: rotational region CNN for orientation robust scene text detection. arXiv preprint arXiv:1706.09579. Jiang, Y., Zhu, X., Wang, X., Yang, S., Li, W., Wang, H., Fu, P., & Luo, Z. (2017). R2CNN: rotational region CNN for orientation robust scene text detection. arXiv preprint arXiv:​1706.​09579.
go back to reference Jung, K., Kim, K. I., & Jain, A. K. (2004). Text information extraction in images and video: A survey. Pattern Recognition, 37(5), 977–997.CrossRef Jung, K., Kim, K. I., & Jain, A. K. (2004). Text information extraction in images and video: A survey. Pattern Recognition, 37(5), 977–997.CrossRef
go back to reference Kang, L., Li, Y., & Doermann, D. (2014). Orientation robust text line detection in natural images. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 4034–4041). Kang, L., Li, Y., & Doermann, D. (2014). Orientation robust text line detection in natural images. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 4034–4041).
go back to reference Karatzas, D., & Antonacopoulos, A. (2004). Text extraction from web images based on a split-and-merge segmentation method using colour perception. In Proceedings of the 17th international conference on pattern recognition, 2004. ICPR 2004 (Vol. 2, pp. 634–637). IEEE. Karatzas, D., & Antonacopoulos, A. (2004). Text extraction from web images based on a split-and-merge segmentation method using colour perception. In Proceedings of the 17th international conference on pattern recognition, 2004. ICPR 2004 (Vol. 2, pp. 634–637). IEEE.
go back to reference Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., et al. (2015). ICDAR 2015 competition on robust reading. In 2015 13th international conference on document analysis and recognition (ICDAR) (pp. 1156–1160). IEEE. Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., et al. (2015). ICDAR 2015 competition on robust reading. In 2015 13th international conference on document analysis and recognition (ICDAR) (pp. 1156–1160). IEEE.
go back to reference Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L. G. I., Mestre, S. R., et al. (2013). ICDAR 2013 robust reading competition. In 2013 12th international conference on document analysis and recognition (ICDAR) (pp. 1484–1493). IEEE. Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L. G. I., Mestre, S. R., et al. (2013). ICDAR 2013 robust reading competition. In 2013 12th international conference on document analysis and recognition (ICDAR) (pp. 1484–1493). IEEE.
go back to reference Kipf, T. N., & Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Kipf, T. N., & Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:​1609.​02907.
go back to reference Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105). Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105).
go back to reference Lee, C.-Y., & Osindero, S. (2016). Recursive recurrent nets with attention modeling for OCR in the wild. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2231–2239). Lee, C.-Y., & Osindero, S. (2016). Recursive recurrent nets with attention modeling for OCR in the wild. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2231–2239).
go back to reference Lee, J.-J, Lee, P.-H., Lee, S.-W., Yuille, A., & Koch, C. (2011). Adaboost for text detection in natural scene. In 2011 international conference on document analysis and recognition (ICDAR) (pp. 429–434). IEEE. Lee, J.-J, Lee, P.-H., Lee, S.-W., Yuille, A., & Koch, C. (2011). Adaboost for text detection in natural scene. In 2011 international conference on document analysis and recognition (ICDAR) (pp. 429–434). IEEE.
go back to reference Lee, S., & Kim, J. H. (2013). Integrating multiple character proposals for robust scene text extraction. Image and Vision Computing, 31(11), 823–840.CrossRef Lee, S., & Kim, J. H. (2013). Integrating multiple character proposals for robust scene text extraction. Image and Vision Computing, 31(11), 823–840.CrossRef
go back to reference Li, H., Wang, P., & Shen, C. (2017a). Towards end-to-end text spotting with convolutional recurrent neural networks. In The IEEE international conference on computer vision (ICCV). Li, H., Wang, P., & Shen, C. (2017a). Towards end-to-end text spotting with convolutional recurrent neural networks. In The IEEE international conference on computer vision (ICCV).
go back to reference Li, H., Wang, P., Shen, C., & Zhang, G. (2019). Show, attend and read: A simple and strong baseline for irregular text recognition. In AAAI. Li, H., Wang, P., Shen, C., & Zhang, G. (2019). Show, attend and read: A simple and strong baseline for irregular text recognition. In AAAI.
go back to reference Li, R., En, M., Li, J., & Zhang, H. (2017b). Weakly supervised text attention network for generating text proposals in scene images. In 2017 14th IAPR international conference on document analysis and recognition (ICDAR) (Vol. 1, pp. 324–330). IEEE. Li, R., En, M., Li, J., & Zhang, H. (2017b). Weakly supervised text attention network for generating text proposals in scene images. In 2017 14th IAPR international conference on document analysis and recognition (ICDAR) (Vol. 1, pp. 324–330). IEEE.
Liao, M., Shi, B., & Bai, X. (2018a). Textboxes++: A single-shot oriented scene text detector. IEEE Transactions on Image Processing, 27(8), 3676–3690.
Liao, M., Shi, B., Bai, X., Wang, X., & Liu, W. (2017). Textboxes: A fast text detector with a single deep neural network. In AAAI (pp. 4161–4167).
Liao, M., Song, B., He, M., Long, S., Yao, C., & Bai, X. (2019a). Synthtext3d: Synthesizing scene text images from 3d virtual worlds. arXiv preprint arXiv:1907.06007.
Liao, M., Zhang, J., Wan, Z., Xie, F., Liang, J., Lyu, P., Yao, C., & Bai, X. (2019b). Scene text recognition from two-dimensional perspective. In AAAI.
Liao, M., Zhu, Z., Shi, B., Xia, G.-S., & Bai, X. (2018b). Rotation-sensitive regression for oriented scene text detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 5909–5918).
Liu, F., Shen, C., & Lin, G. (2015). Deep convolutional neural fields for depth estimation from a single image. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 5162–5170).
Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., & Pietikäinen, M. (2018a). Deep learning for generic object detection: A survey. arXiv preprint arXiv:1809.02165.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., & Berg, A. C. (2016a). SSD: Single shot multibox detector. In Proceedings of European conference on computer vision (ECCV) (pp. 21–37). Springer.
Liu, W., Chen, C., & Wong, K. (2018b). Char-net: A character-aware neural network for distorted scene text recognition. In AAAI conference on artificial intelligence, New Orleans, Louisiana, USA.
Liu, W., Chen, C., Wong, K.-Y. K., Su, Z., & Han, J. (2016b). Star-net: A spatial attention residue network for scene text recognition. In BMVC (Vol. 2, p. 7).
Liu, X. (1975). Old book of tang. Beijing: Zhonghua Book Company.
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., & Yan, J. (2018c). FOTS: Fast oriented text spotting with a unified network. In CVPR2018.
Liu, X., & Samarabandu, J. (2005a). An edge-based text region extraction algorithm for indoor mobile robot navigation. In 2005 IEEE international conference mechatronics and automation (Vol. 2, pp. 701–706). IEEE.
Liu, X., & Samarabandu, J. K. (2005b). A simple and fast text localization algorithm for indoor mobile robot navigation. In Image processing: Algorithms and systems IV (Vol. 5672, pp. 139–151). International Society for Optics and Photonics.
Liu, Y., & Jin, L. (2017). Deep matching prior network: Toward tighter multi-oriented text detection.
Liu, Y., Jin, L., Xie, Z., Luo, C., Zhang, S., & Xie, L. (2019). Tightness-aware evaluation protocol for scene text detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 9612–9620).
Liu, Y., Jin, L., Zhang, S., & Zhang, S. (2017). Detecting curve text in the wild: New dataset and new solution. arXiv preprint arXiv:1712.02170.
Liu, Z., Li, Y., Ren, F., Yu, H., & Goh, W. (2018d). Squeezedtext: A real-time scene text recognition by binary convolutional encoder–decoder network. In AAAI.
Liu, Z., Lin, G., Yang, S., Feng, J., Lin, W., & Goh, W. L. (2018e). Learning Markov clustering networks for scene text detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 6936–6944).
Long, S., Guan, Y., Bian, K., & Yao, C. (2020). A new perspective for flexible feature gathering in scene text recognition via character anchor pooling. In ICASSP 2020 - 2020 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 2458–2462). IEEE.
Long, S., Guan, Y., Wang, B., Bian, K., & Yao, C. (2019). Alchemy: Techniques for rectification based irregular scene text recognition. arXiv preprint arXiv:1908.11834.
Long, S., Ruan, J., Zhang, W., He, X., Wu, W., & Yao, C. (2018). Textsnake: A flexible representation for detecting text of arbitrary shapes. In Proceedings of European conference on computer vision (ECCV).
Long, S., & Yao, C. (2020). Unrealtext: Synthesizing realistic scene text images from the unreal world. arXiv preprint arXiv:2003.10608.
Lyu, P., Liao, M., Yao, C., Wu, W., & Bai, X. (2018a). Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In Proceedings of European conference on computer vision (ECCV).
Lyu, P., Yao, C., Wu, W., Yan, S., & Bai, X. (2018b). Multi-oriented scene text detection via corner localization and region segmentation. In 2018 IEEE conference on computer vision and pattern recognition (CVPR).
Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., et al. (2018). Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia, 20, 3111–3122.
Mammeri, A., Boukerche, A., et al. (2016). MSER-based text detection and communication algorithm for autonomous vehicles. In 2016 IEEE symposium on computers and communication (ISCC) (pp. 1218–1223). IEEE.
Mammeri, A., Khiari, E.-H., & Boukerche, A. (2014). Road-sign text recognition architecture for intelligent transportation systems. In 2014 IEEE 80th vehicular technology conference (VTC Fall) (pp. 1–5). IEEE.
Mishra, A., Alahari, K., & Jawahar, C. (2011). An MRF model for binarization of natural scene text. In ICDAR-international conference on document analysis and recognition. IEEE.
Mishra, A., Alahari, K., & Jawahar, C. (2012). Scene text recognition using higher order language priors. In BMVC-British machine vision conference. BMVA.
Neumann, L., & Matas, J. (2010). A method for text localization and recognition in real-world images. In Asian conference on computer vision (pp. 770–783). Springer.
Neumann, L., & Matas, J. (2012). Real-time scene text localization and recognition. In 2012 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3538–3545). IEEE.
Neumann, L., & Matas, J. (2013). On combining multiple segmentations in scene text recognition. In 2013 12th international conference on document analysis and recognition (ICDAR) (pp. 523–527). IEEE.
Nomura, S., Yamanaka, K., Katai, O., Kawakami, H., & Shiose, T. (2005). A novel adaptive morphological approach for degraded character image segmentation. Pattern Recognition, 38(11), 1961–1975.
Parkinson, C., Jacobsen, J. J., Ferguson, D. B., & Pombo, S. A. (2016). Instant translation system, Nov. 29. US Patent 9,507,772.
Qin, S., Bissacco, A., Raptis, M., Fujii, Y., & Xiao, Y. (2019). Towards unconstrained end-to-end text spotting. In Proceedings of the IEEE international conference on computer vision (pp. 4704–4714).
Qiu, W., Zhong, F., Zhang, Y., Qiao, S., Xiao, Z., Kim, T. S., & Wang, Y. (2017). Unrealcv: Virtual worlds for computer vision. In Proceedings of the 25th ACM international conference on multimedia (pp. 1221–1224). ACM.
Phan, T. Q., Shivakumara, P., Tian, S., & Tan, C. L. (2013). Recognizing text with perspective distortion in natural scenes. In Proceedings of the IEEE international conference on computer vision (ICCV) (pp. 569–576).
Redmon, J., & Farhadi, A. (2017). Yolo9000: Better, faster, stronger. arXiv preprint.
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 779–788).
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems (pp. 91–99).
Rodriguez-Serrano, J. A., Gordo, A., & Perronnin, F. (2015). Label embedding: A frugal baseline for text recognition. International Journal of Computer Vision, 113(3), 193–207.
Rodriguez-Serrano, J. A., Perronnin, F., & Meylan, F. (2013). Label embedding for text recognition. In Proceedings of the British machine vision conference. Citeseer.
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. Berlin: Springer.
Roy, P. P., Pal, U., Llados, J., & Delalandre, M. (2009). Multi-oriented and multi-sized touching character segmentation using dynamic programming. In 10th international conference on document analysis and recognition, 2009. IEEE.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.
Schroth, G., Hilsenbeck, S., Huitl, R., Schweiger, F., & Steinbach, E. (2011). Exploiting text-related features for content-based image retrieval. In 2011 IEEE international symposium on multimedia (pp. 77–84). IEEE.
Schulz, R., Talbot, B., Lam, O., Dayoub, F., Corke, P., Upcroft, B., & Wyeth, G. (2015). Robot navigation using human cues: A robot navigation system for symbolic goal-directed exploration. In Proceedings of the 2015 IEEE international conference on robotics and automation (ICRA 2015) (pp. 1100–1105). IEEE.
Sheshadri, K., & Divvala, S. K. (2012). Exemplar driven character recognition in the wild. In BMVC (pp. 1–10).
Shi, B., Bai, X., & Belongie, S. (2017a). Detecting oriented text in natural images by linking segments. In The IEEE conference on computer vision and pattern recognition (CVPR).
Shi, B., Bai, X., & Yao, C. (2017b). An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(11), 2298–2304.
Shi, B., Wang, X., Lyu, P., Yao, C., & Bai, X. (2016). Robust scene text recognition with automatic rectification. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 4168–4176).
Shi, B., Yang, M., Wang, X., Lyu, P., Bai, X., & Yao, C. (2018). Aster: An attentional scene text recognizer with flexible rectification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(11), 855–868.
Shi, C., Wang, C., Xiao, B., Zhang, Y., Gao, S., & Zhang, Z. (2013). Scene text recognition using part-based tree-structured character detection. In 2013 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2961–2968). IEEE.
Shivakumara, P., Bhowmick, S., Su, B., Tan, C. L., & Pal, U. (2011). A new gradient based character segmentation method for video text recognition. In 2011 international conference on document analysis and recognition (ICDAR). IEEE.
Su, B., & Lu, S. (2014). Accurate scene text recognition based on recurrent neural network. In Asian conference on computer vision (pp. 35–48). Springer.
Sun, Y., Liu, J., Liu, W., Han, J., Ding, E., & Liu, J. (2019). Chinese street view text: Large-scale Chinese text reading with partially supervised learning. In Proceedings of the IEEE international conference on computer vision (pp. 9086–9095).
Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in neural information processing systems (pp. 3104–3112).
Tian, S., Pan, Y., Huang, C., Lu, S., Yu, K., & Tan, C. L. (2015). Text flow: A unified text detection system in natural scene images. In Proceedings of the IEEE international conference on computer vision (pp. 4651–4659).
Tian, S., Lu, S., & Li, C. (2017). Wetext: Scene text detection under weak supervision. In Proceedings of ICCV.
Tian, Z., Huang, W., He, T., He, P., & Qiao, Y. (2016). Detecting text in natural image with connectionist text proposal network. In Proceedings of European conference on computer vision (ECCV) (pp. 56–72). Springer.
Tian, Z., Shu, M., Lyu, P., Li, R., Zhou, C., Shen, X., & Jia, J. (2019). Learning shape-aware embedding for scene text detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4234–4243).
Tsai, S. S., Chen, H., Chen, D., Schroth, G., Grzeszczuk, R., & Girod, B. (2011). Mobile visual search on printed documents using text and low bit-rate features. In 18th IEEE international conference on image processing (ICIP) (pp. 2601–2604). IEEE.
Tu, Z., Ma, Y., Liu, W., Bai, X., & Yao, C. (2012). Detecting texts of arbitrary orientations in natural images. In 2012 IEEE conference on computer vision and pattern recognition (pp. 1083–1090). IEEE.
Uchida, S. (2014). Text localization and recognition in images and video. In Handbook of document image processing and recognition (pp. 843–883). Springer.
Wachenfeld, S., Klein, H.-U., & Jiang, X. (2006). Recognition of screen-rendered text. In 18th international conference on pattern recognition, 2006. ICPR 2006 (Vol. 2, pp. 1086–1089). IEEE.
Wakahara, T., & Kita, K. (2011). Binarization of color character strings in scene images using k-means clustering and support vector machines. In 2011 international conference on document analysis and recognition (ICDAR) (pp. 274–278). IEEE.
Wang, C., Yin, F., & Liu, C.-L. (2017). Scene text detection with novel superpixel based character candidate extraction. In 2017 14th IAPR international conference on document analysis and recognition (ICDAR) (Vol. 1, pp. 929–934). IEEE.
Wang, F., Zhao, L., Li, X., Wang, X., & Tao, D. (2018). Geometry-aware scene text detection with instance transformation network. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1381–1389).
Wang, K., Babenko, B., & Belongie, S. (2011). End-to-end scene text recognition. In 2011 IEEE international conference on computer vision (ICCV) (pp. 1457–1464). IEEE.
Wang, T., Wu, D. J., Coates, A., & Ng, A. Y. (2012). End-to-end text recognition with convolutional neural networks. In 2012 21st international conference on pattern recognition (ICPR) (pp. 3304–3308). IEEE.
Wang, W., Xie, E., Li, X., Hou, W., Lu, T., Yu, G., & Shao, S. (2019a). Shape robust text detection with progressive scale expansion network. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR).
Wang, X., Jiang, Y., Luo, Z., Liu, C.-L., Choi, H., & Kim, S. (2019b). Arbitrary shape scene text detection with adaptive text region representation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 6449–6458).
Weinman, J., Learned-Miller, E., & Hanson, A. (2007). Fast lexicon-based scene text recognition with sparse belief propagation. In ICDAR (pp. 979–983). IEEE.
Wolf, C., & Jolion, J.-M. (2006). Object count/area graphs for the evaluation of object detection and segmentation algorithms. International Journal of Document Analysis and Recognition (IJDAR), 8(4), 280–296.
Wu, L., Zhang, C., Liu, J., Han, J., Liu, J., Ding, E., & Bai, X. (2019). Editing text in the wild. In Proceedings of the 27th ACM international conference on multimedia (pp. 1500–1508).
Wu, Y., & Natarajan, P. (2017). Self-organized text detection with minimal post-processing via border learning. In Proceedings of the IEEE conference on CVPR (pp. 5000–5009).
Xia, Y., Tian, F., Wu, L., Lin, J., Qin, T., Yu, N., & Liu, T.-Y. (2017). Deliberation networks: Sequence generation beyond one-pass decoding. In Advances in neural information processing systems (pp. 1784–1794).
Xing, L., Tian, Z., Huang, W., & Scott, M. R. (2019). Convolutional character networks. In Proceedings of the IEEE international conference on computer vision (pp. 9126–9136).
Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., et al. (2015). Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning (pp. 2048–2057).
Xue, C., Lu, S., & Zhan, F. (2018). Accurate scene text detection through border semantics awareness and bootstrapping. In Proceedings of European conference on computer vision (ECCV).
Yang, M., Guan, Y., Liao, M., He, X., Bian, K., Bai, S., et al. (2019). Symmetry-constrained rectification network for scene text recognition. In Proceedings of the IEEE international conference on computer vision (pp. 9147–9156).
Yang, X., He, D., Zhou, Z., Kifer, D., & Giles, C. L. (2017). Learning to read irregular text with attention mechanisms. In Proceedings of the twenty-sixth international joint conference on artificial intelligence, IJCAI-17 (pp. 3280–3286).
Yao, C., Bai, X., Shi, B., & Liu, W. (2014). Strokelets: A learned multi-scale representation for scene text recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 4042–4049).
Yao, C., Bai, X., Sang, N., Zhou, X., Zhou, S., & Cao, Z. (2016). Scene text detection via holistic, multi-channel prediction. arXiv preprint arXiv:1606.09002.
Ye, Q., & Doermann, D. (2015). Text detection and recognition in imagery: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(7), 1480–1500.
Ye, Q., Gao, W., Wang, W., & Zeng, W. (2003). A robust text detection algorithm in images and video frames. In IEEE ICICS-PCM (pp. 802–806).
Yi, C., & Tian, Y. (2011). Text string detection from natural scenes by structure-based partition and grouping. IEEE Transactions on Image Processing, 20(9), 2594–2605.
go back to reference Yin, F., Wu, Y.-C, Zhang, X.-Y., & Liu, C.-L. (2017). Scene text recognition with sliding convolutional character models. arXiv preprint arXiv:1709.01727. Yin, F., Wu, Y.-C, Zhang, X.-Y., & Liu, C.-L. (2017). Scene text recognition with sliding convolutional character models. arXiv preprint arXiv:​1709.​01727.
go back to reference Yin, X.-C., Yin, X., Huang, K., & Hao, H.-W. (2014). Robust text detection in natural scene images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5), 970–983.CrossRef Yin, X.-C., Yin, X., Huang, K., & Hao, H.-W. (2014). Robust text detection in natural scene images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5), 970–983.CrossRef
go back to reference Yin, X.-C., Zuo, Z.-Y., Tian, S., & Liu, C.-L. (2016). Text detection, tracking and recognition in video: A comprehensive survey. IEEE Transactions on Image Processing, 25(6), 2752–2773.MathSciNetCrossRef Yin, X.-C., Zuo, Z.-Y., Tian, S., & Liu, C.-L. (2016). Text detection, tracking and recognition in video: A comprehensive survey. IEEE Transactions on Image Processing, 25(6), 2752–2773.MathSciNetCrossRef
go back to reference Yu, D., Li, X., Zhang, C., Han, J., Liu, J., & Ding, E. (2020). Towards accurate scene text recognition with semantic reasoning networks. arXiv preprint arXiv:2003.12294. Yu, D., Li, X., Zhang, C., Han, J., Liu, J., & Ding, E. (2020). Towards accurate scene text recognition with semantic reasoning networks. arXiv preprint arXiv:​2003.​12294.
go back to reference Zhan, F., & Lu, S. (2019). ESIR: End-to-end scene text recognition via iterative image rectification. In Proceedings of the IEEE conference on computer vision and pattern recognition. Zhan, F., & Lu, S. (2019). ESIR: End-to-end scene text recognition via iterative image rectification. In Proceedings of the IEEE conference on computer vision and pattern recognition.
go back to reference Zhan, F., Lu, S., & Xue, C. (2018). Verisimilar image synthesis for accurate detection and recognition of texts in scenes. Zhan, F., Lu, S., & Xue, C. (2018). Verisimilar image synthesis for accurate detection and recognition of texts in scenes.
go back to reference Zhang, C., Liang, B., Huang, Z., En, M., Han, J., Ding, E., & Ding, X. (2019). Look more than once: An accurate detector for text of arbitrary shapes. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). Zhang, C., Liang, B., Huang, Z., En, M., Han, J., Ding, E., & Ding, X. (2019). Look more than once: An accurate detector for text of arbitrary shapes. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR).
go back to reference Zhang, D., & Chang, S.-F. (2003). A Bayesian framework for fusing multiple word knowledge models in videotext recognition. In Computer vision and pattern recognition, 2003. IEEE. Zhang, D., & Chang, S.-F. (2003). A Bayesian framework for fusing multiple word knowledge models in videotext recognition. In Computer vision and pattern recognition, 2003. IEEE.
go back to reference Zhang, S., Liu, Y., Jin, L., & Luo, C. (2018). Feature enhancement network: A refined scene text detector. In Proceedings of AAAI, 2018. Zhang, S., Liu, Y., Jin, L., & Luo, C. (2018). Feature enhancement network: A refined scene text detector. In Proceedings of AAAI, 2018.
go back to reference Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., & Yin, X.-C. (2020). Deep relational reasoning graph network for arbitrary shape text detection. arXiv preprint arXiv:2003.07493. Zhang, S.-X., Zhu, X., Hou, J.-B., Liu, C., Yang, C., Wang, H., & Yin, X.-C. (2020). Deep relational reasoning graph network for arbitrary shape text detection. arXiv preprint arXiv:​2003.​07493.
go back to reference Zhang, Z., Zhang, C., Shen, W., Yao, C., Liu, W., & Bai, X. (2016). Multi-oriented text detection with fully convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). Zhang, Z., Zhang, C., Shen, W., Yao, C., Liu, W., & Bai, X. (2016). Multi-oriented text detection with fully convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR).
go back to reference Zhiwei, Z., Linlin, L., & Lim, T. C. (2010). Edge based binarization for video text images. In 2010 20th international conference on pattern recognition (ICPR) (pp. 133–136). IEEE. Zhiwei, Z., Linlin, L., & Lim, T. C. (2010). Edge based binarization for video text images. In 2010 20th international conference on pattern recognition (ICPR) (pp. 133–136). IEEE.
go back to reference Zhou, X., Yao, C., Wen, H., Wang, Y., Zhou, S., He, W., & Liang, J. (2017). EAST: An efficient and accurate scene text detector. In The IEEE conference on computer vision and pattern recognition (CVPR). Zhou, X., Yao, C., Wen, H., Wang, Y., Zhou, S., He, W., & Liang, J. (2017). EAST: An efficient and accurate scene text detector. In The IEEE conference on computer vision and pattern recognition (CVPR).
go back to reference Zhu, Y., Yao, C., & Bai, X. (2016). Scene text detection and recognition: Recent advances and future trends. Frontiers of Computer Science, 10(1), 19–36.CrossRef Zhu, Y., Yao, C., & Bai, X. (2016). Scene text detection and recognition: Recent advances and future trends. Frontiers of Computer Science, 10(1), 19–36.CrossRef
go back to reference Zitnick, C. L., & Dollár, P. (2014). Edge boxes: Locating object proposals from edges. In Proceedings of European conference on computer vision (ECCV) (pp. 391–405). Springer. Zitnick, C. L., & Dollár, P. (2014). Edge boxes: Locating object proposals from edges. In Proceedings of European conference on computer vision (ECCV) (pp. 391–405). Springer.
Metadata
Title: Scene Text Detection and Recognition: The Deep Learning Era
Authors: Shangbang Long, Xin He, Cong Yao
Publication date: 27-08-2020
Publisher: Springer US
Published in: International Journal of Computer Vision | Issue 1/2021
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-020-01369-0
