2018 | Original Paper | Book Chapter

TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

Authors: Shangbang Long, Jiaqiang Ruan, Wenjie Zhang, Xin He, Wenhao Wu, Cong Yao

Published in: Computer Vision – ECCV 2018

Publisher: Springer International Publishing

Abstract

Driven by deep neural networks and large-scale datasets, scene text detection methods have progressed substantially over the past years, continuously refreshing the performance records on various standard benchmarks. However, limited by the representations (axis-aligned rectangles, rotated rectangles or quadrangles) adopted to describe text, existing methods may fall short when dealing with more free-form text instances, such as curved text, which are very common in real-world scenarios. To tackle this problem, we propose a more flexible representation for scene text, termed TextSnake, which is able to effectively represent text instances in horizontal, oriented and curved forms. In TextSnake, a text instance is described as a sequence of ordered, overlapping disks centered at symmetric axes, each of which is associated with a potentially variable radius and orientation. Such geometry attributes are estimated via a Fully Convolutional Network (FCN) model. In experiments, the text detector based on TextSnake achieves state-of-the-art or comparable performance on Total-Text and SCUT-CTW1500, two newly published benchmarks with special emphasis on curved text in natural images, as well as on the widely used datasets ICDAR 2015 and MSRA-TD500. Specifically, TextSnake outperforms the baseline on Total-Text by more than 40% in F-measure.
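
The disk-based representation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Disk` class, the helper functions, the quarter-circle center line, and all numeric values are hypothetical choices made only to show how an ordered sequence of overlapping disks, each with its own radius and orientation, can cover a curved text region.

```python
from dataclasses import dataclass
import math

import numpy as np


@dataclass
class Disk:
    """One element of the ordered disk sequence (hypothetical container)."""
    cx: float      # x coordinate of the disk center on the text axis
    cy: float      # y coordinate of the disk center on the text axis
    radius: float  # disk radius; may differ from disk to disk
    theta: float   # local orientation of the axis, in radians


def disks_along_arc(n, arc_radius, r_start, r_end):
    """Place n ordered disks on a quarter-circle axis (a toy curved text line)."""
    disks = []
    for i in range(n):
        t = i / (n - 1)
        phi = (math.pi / 2) * t                  # position along the arc
        cx = arc_radius * math.cos(phi)
        cy = arc_radius * math.sin(phi)
        radius = r_start + (r_end - r_start) * t  # variable radius (assumed linear)
        theta = phi + math.pi / 2                 # tangent direction of the arc
        disks.append(Disk(cx, cy, radius, theta))
    return disks


def rasterize(disks, height, width):
    """Return a boolean mask covering the union of all disks."""
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.zeros((height, width), dtype=bool)
    for d in disks:
        mask |= (xs - d.cx) ** 2 + (ys - d.cy) ** 2 <= d.radius ** 2
    return mask


disks = disks_along_arc(n=12, arc_radius=40.0, r_start=5.0, r_end=8.0)
mask = rasterize(disks, height=64, width=64)
```

Because neighbouring disks overlap, their union forms one connected band that follows the curve, which a single rectangle or quadrangle could only enclose loosely. In the paper, the per-pixel geometry is predicted by an FCN; here it is simply generated by hand.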

Metadata
Title
TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes
Authors
Shangbang Long
Jiaqiang Ruan
Wenjie Zhang
Xin He
Wenhao Wu
Cong Yao
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-030-01216-8_2