nach oben

Erschienen in:

2016 | OriginalPaper | Buchkapitel

Detecting Text in Natural Image with Connectionist Text Proposal Network

verfasst von : Zhi Tian, Weilin Huang, Tong He, Pan He, Yu Qiao

Erschienen in: Computer Vision – ECCV 2016

Verlag: Springer International Publishing

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

We propose a novel Connectionist Text Proposal Network (CTPN) that accurately localizes text lines in natural image. The CTPN detects a text line in a sequence of fine-scale text proposals directly in convolutional feature maps. We develop a vertical anchor mechanism that jointly predicts location and text/non-text score of each fixed-width proposal, considerably improving localization accuracy. The sequential proposals are naturally connected by a recurrent neural network, which is seamlessly incorporated into the convolutional network, resulting in an end-to-end trainable model. This allows the CTPN to explore rich context information of image, making it powerful to detect extremely ambiguous text. The CTPN works reliably on multi-scale and multi-language text without further post-processing, departing from previous bottom-up methods requiring multi-step post filtering. It achieves 0.88 and 0.61 F-measure on the ICDAR 2013 and 2015 benchmarks, surpassing recent results [8, 35] by a large margin. The CTPN is computationally efficient with 0.14 s/image, by using the very deep VGG16 model [27]. Online demo is available: http://textdet.com/.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Vorheriges Kapitel PlaNet - Photo Geolocation with Convolutional Neural Networks

Nächstes Kapitel Face Recognition Using a Unified 3D Morphable Model

Busta, M., Neumann, L., Matas, J.: FasText: efficient unconstrained scene text detector. In: IEEE International Conference on Computer Vision (ICCV) (2015)

Cheng, M., Zhang, Z., Lin, W., Torr, P.: BING: binarized normed gradients for objectness estimation at 300 fps. In: IEEE Computer Vision and Pattern Recognition (CVPR) (2014)

Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: IEEE Computer Vision and Pattern Recognition (CVPR) (2010)

Everingham, M., Gool, L.V., Williams, C.K.I., Winn, J., Zisserman, A.: The pascal visual object classes (VOC) challenge. Int. J. Comput.Vis. (IJCV) 88(2), 303–338 (2010)CrossRef

Girshick, R.: Fast R-CNN. In: IEEE International Conference on Computer Vision (ICCV)(2015)

Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Computer Vision and Pattern Recognition (CVPR) (2014)

Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional lstm and other neural network architectures. Neural Netw. 18(5), 602–610 (2005)CrossRef

Gupta, A., Vedaldi, A., Zisserman, A.: Synthetic data for text localisation in natural images. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)

He, P., Huang, W., Qiao, Y., Loy, C.C., Tang, X.: Reading scene text in deep convolutional sequences. In: The 30th AAAI Conference on Artificial Intelligence (AAAI-16) (2016)

10.

He, T., Huang, W., Qiao, Y., Yao, J.: Accurate text localization in natural image with cascaded convolutional text network (2016). arXiv:1603.09423

11.

He, T., Huang, W., Qiao, Y., Yao, J.: Text-attentional convolutional neural networks for scene text detection. IEEE Trans. Image Processing (TIP) 25, 2529–2541 (2016)CrossRefMathSciNet

12.

Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Netw. 9(8), 1735–1780 (1997)

13.

Huang, W., Lin, Z., Yang, J., Wang, J.: Text localization in natural images using stroke feature transform and text covariance descriptors. In: IEEE International Conference on Computer Vision (ICCV) (2013)

14.

Huang, W., Qiao, Y., Tang, X.: Robust scene text detection with convolutional neural networks induced mser trees. In: European Conference on Computer Vision (ECCV) (2014)

15.

Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman, A.: Reading text in the wild with convolutional neural networks. Int. J. Comput. Vis. (IJCV) 116(1), 1–20 (2016)CrossRefMathSciNet

16.

Jaderberg, M., Vedaldi, A., Zisserman, A.: Deep features for text spotting. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part IV. LNCS, vol. 8692, pp. 512–528. Springer, Heidelberg (2014)

17.

Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. In: ACM International Conference on Multimedia (ACM MM) (2014)

18.

Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., Shafait, F., Uchida, S., Valveny, E.: ICDAR 2015 competition on robust reading. In: International Conference on Document Analysis and Recognition (ICDAR)(2015)

19.

Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., i Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., de las Heras., L.P.: ICDAR 2013 robust reading competition. In: International Conference on Document Analysis and Recognition (ICDAR) (2013)

20.

Mao, J., Li, H., Zhou, W., Yan, S., Tian, Q.: Scale based region growing for scene text detection. In: ACM International Conference on Multimedia (ACM MM) (2013)

21.

Minetto, R., Thome, N., Cord, M., Fabrizio, J., Marcotegui, B.: Snoopertext: a multiresolution system for text detection in complex visual scenes. In: IEEE International Conference on Pattern Recognition (ICIP) (2010)

22.

Neumann, L., Matas, J.: Efficient scene text localization and recognition with local character refinement. In: International Conference on Document Analysis and Recognition (ICDAR) (2015)

23.

Neumann, L., Matas, J.: Real-time lexicon-free scene text localization and recognition. In: IEEE Transaction on Pattern Analysis and Machine Intelligence (TPAMI) (2015)

24.

Pan, Y., Hou, X., Liu, C.: Hybrid approach to detect and localize texts in natural scene images. IEEE Trans. Image Process. (TIP) 20, 800–813 (2011)CrossRefMathSciNet

25.

Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Neural Information Processing Systems (NIPS) (2015)

26.

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Li, F.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. (IJCV) 115(3), 211–252 (2015)CrossRefMathSciNet

27.

Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representation (ICLR) (2015)

28.

Tian, S., Pan, Y., Huang, C., Lu, S., Yu, K., Tan, C.L.: Text flow: a unified text detection system in natural scene images. In: IEEE International Conference on Computer Vision (ICCV) (2015)

29.

Wang, K., Babenko, B., Belongie, S.: End-to-end scene text recognition. In: IEEE International Conference on Computer Vision (ICCV) (2011)

30.

Wolf, C., Jolion, J.: Object count / area graphs for the evaluation of object detection and segmentation algorithms. Int. J. Doc. Anal. 8, 280–296 (2006)CrossRef

31.

Yao, C., Bai, X., Liu, W.: A unified framework for multioriented text detection and recognition. IEEE Trans. Image Process. (TIP) 23(11), 4737–4749 (2014)CrossRefMathSciNet

32.

Yin, X.C., Pei, W.Y., Zhang, J., Hao, H.W.: Multi-orientation scene text detection with adaptive clustering. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 37(9), 1930–1937 (2015)CrossRef

33.

Yin, X.C., Yin, X., Huang, K., Hao, H.W.: Robust text detection in natural scene images. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 36(4), 970–983 (2014)

34.

Zhang, Z., Shen, W., Yao, C., Bai, X.: Symmetry-based text line detection in natural scenes. In: IEEE Computer Vision and Pattern Recognition (CVPR) (2015)

35.

Zhang, Z., Zhang, C., Shen, W., Yao, C., Liu, W., Bai, X.: Multi-oriented text detection with fully convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)(2016)

Titel: Detecting Text in Natural Image with Connectionist Text Proposal Network
verfasst von: Zhi Tian
Weilin Huang
Tong He
Pan He
Yu Qiao
Verlag: Springer International Publishing
Buch: Computer Vision – ECCV 2016
Print ISBN: 978-3-319-46483-1

Electronic ISBN: 978-3-319-46484-8

Copyright-Jahr: 2016
DOI: https://doi.org/10.1007/978-3-319-46484-8_4

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"