
2016 | Original Paper | Book Chapter

Detecting Text in Natural Image with Connectionist Text Proposal Network

Authors: Zhi Tian, Weilin Huang, Tong He, Pan He, Yu Qiao

Published in: Computer Vision – ECCV 2016

Publisher: Springer International Publishing


Abstract

We propose a novel Connectionist Text Proposal Network (CTPN) that accurately localizes text lines in natural images. The CTPN detects a text line as a sequence of fine-scale text proposals directly in convolutional feature maps. We develop a vertical anchor mechanism that jointly predicts the location and the text/non-text score of each fixed-width proposal, considerably improving localization accuracy. The sequential proposals are naturally connected by a recurrent neural network, which is seamlessly incorporated into the convolutional network, resulting in an end-to-end trainable model. This allows the CTPN to explore the rich context information of the image, making it powerful enough to detect extremely ambiguous text. The CTPN works reliably on multi-scale and multi-language text without further post-processing, departing from previous bottom-up methods that require multi-step post-filtering. It achieves F-measures of 0.88 and 0.61 on the ICDAR 2013 and 2015 benchmarks, surpassing recent results [8, 35] by a large margin. The CTPN is computationally efficient, running at 0.14 s/image with the very deep VGG16 model [27]. An online demo is available at http://textdet.com/.
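The abstract describes a text line as a sequence of fixed-width proposals, each with a vertical extent and a text/non-text score. The following is a minimal sketch of how such proposals could be chained into line-level boxes. It is not the authors' procedure: the function names, the 16-pixel width, and all thresholds are hypothetical values chosen for illustration, and the greedy left-to-right chaining is a stand-in for whatever connection rule the paper actually uses.

```python
from typing import List, Tuple

# (x_left, y_top, y_bottom, score); all proposals share one fixed width.
Proposal = Tuple[float, float, float, float]
# (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]

# All constants below are illustrative assumptions, not values from the abstract.
WIDTH = 16.0         # assumed fixed proposal width in pixels
MAX_GAP = 50.0       # assumed largest horizontal gap allowed between neighbours
MIN_V_OVERLAP = 0.7  # assumed minimum vertical overlap (intersection over union)
MIN_SCORE = 0.7      # assumed text/non-text score threshold


def vertical_overlap(a: Proposal, b: Proposal) -> float:
    """Intersection over union of the vertical ranges of two proposals."""
    inter = min(a[2], b[2]) - max(a[1], b[1])
    union = max(a[2], b[2]) - min(a[1], b[1])
    return max(inter, 0.0) / union if union > 0 else 0.0


def connect_proposals(proposals: List[Proposal]) -> List[Box]:
    """Greedily chain confident proposals, left to right, into text-line boxes."""
    kept = sorted((p for p in proposals if p[3] >= MIN_SCORE), key=lambda p: p[0])
    lines: List[List[Proposal]] = []
    for p in kept:
        for line in lines:
            last = line[-1]
            gap = p[0] - (last[0] + WIDTH)
            if gap <= MAX_GAP and vertical_overlap(last, p) >= MIN_V_OVERLAP:
                line.append(p)  # extend the first compatible line
                break
        else:
            lines.append([p])  # no compatible line found: start a new one
    # Each line box spans its first to last proposal and their full vertical range.
    return [
        (g[0][0], min(p[1] for p in g), g[-1][0] + WIDTH, max(p[2] for p in g))
        for g in lines
    ]


if __name__ == "__main__":
    demo = [
        (0, 10, 30, 0.90),
        (16, 11, 31, 0.95),
        (32, 10, 30, 0.80),
        (48, 10, 30, 0.20),   # low score, dropped before chaining
        (200, 10, 30, 0.90),  # too far to the right, becomes its own line
    ]
    for box in connect_proposals(demo):
        print(box)
```

A recurrent layer, as described in the abstract, would influence the per-proposal scores and vertical extents before a grouping step like this one; the sketch only covers the grouping.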


References

1. Busta, M., Neumann, L., Matas, J.: FasText: efficient unconstrained scene text detector. In: IEEE International Conference on Computer Vision (ICCV) (2015)
2. Cheng, M., Zhang, Z., Lin, W., Torr, P.: BING: binarized normed gradients for objectness estimation at 300 fps. In: IEEE Computer Vision and Pattern Recognition (CVPR) (2014)
3. Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: IEEE Computer Vision and Pattern Recognition (CVPR) (2010)
4. Everingham, M., Gool, L.V., Williams, C.K.I., Winn, J., Zisserman, A.: The pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. (IJCV) 88(2), 303–338 (2010)
5. Girshick, R.: Fast R-CNN. In: IEEE International Conference on Computer Vision (ICCV) (2015)
6. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Computer Vision and Pattern Recognition (CVPR) (2014)
7. Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 18(5), 602–610 (2005)
8. Gupta, A., Vedaldi, A., Zisserman, A.: Synthetic data for text localisation in natural images. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
9. He, P., Huang, W., Qiao, Y., Loy, C.C., Tang, X.: Reading scene text in deep convolutional sequences. In: The 30th AAAI Conference on Artificial Intelligence (AAAI-16) (2016)
10. He, T., Huang, W., Qiao, Y., Yao, J.: Accurate text localization in natural image with cascaded convolutional text network (2016). arXiv:1603.09423
11. He, T., Huang, W., Qiao, Y., Yao, J.: Text-attentional convolutional neural networks for scene text detection. IEEE Trans. Image Process. (TIP) 25, 2529–2541 (2016)
12. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Netw. 9(8), 1735–1780 (1997)
13. Huang, W., Lin, Z., Yang, J., Wang, J.: Text localization in natural images using stroke feature transform and text covariance descriptors. In: IEEE International Conference on Computer Vision (ICCV) (2013)
14. Huang, W., Qiao, Y., Tang, X.: Robust scene text detection with convolutional neural networks induced MSER trees. In: European Conference on Computer Vision (ECCV) (2014)
15. Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman, A.: Reading text in the wild with convolutional neural networks. Int. J. Comput. Vis. (IJCV) 116(1), 1–20 (2016)
16. Jaderberg, M., Vedaldi, A., Zisserman, A.: Deep features for text spotting. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part IV. LNCS, vol. 8692, pp. 512–528. Springer, Heidelberg (2014)
17. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. In: ACM International Conference on Multimedia (ACM MM) (2014)
18. Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., Shafait, F., Uchida, S., Valveny, E.: ICDAR 2015 competition on robust reading. In: International Conference on Document Analysis and Recognition (ICDAR) (2015)
19. Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., i Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., de las Heras, L.P.: ICDAR 2013 robust reading competition. In: International Conference on Document Analysis and Recognition (ICDAR) (2013)
20. Mao, J., Li, H., Zhou, W., Yan, S., Tian, Q.: Scale based region growing for scene text detection. In: ACM International Conference on Multimedia (ACM MM) (2013)
21. Minetto, R., Thome, N., Cord, M., Fabrizio, J., Marcotegui, B.: Snoopertext: a multiresolution system for text detection in complex visual scenes. In: IEEE International Conference on Pattern Recognition (ICIP) (2010)
22. Neumann, L., Matas, J.: Efficient scene text localization and recognition with local character refinement. In: International Conference on Document Analysis and Recognition (ICDAR) (2015)
23. Neumann, L., Matas, J.: Real-time lexicon-free scene text localization and recognition. In: IEEE Transaction on Pattern Analysis and Machine Intelligence (TPAMI) (2015)
24. Pan, Y., Hou, X., Liu, C.: Hybrid approach to detect and localize texts in natural scene images. IEEE Trans. Image Process. (TIP) 20, 800–813 (2011)
25. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Neural Information Processing Systems (NIPS) (2015)
26. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Li, F.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. (IJCV) 115(3), 211–252 (2015)
27. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representation (ICLR) (2015)
28. Tian, S., Pan, Y., Huang, C., Lu, S., Yu, K., Tan, C.L.: Text flow: a unified text detection system in natural scene images. In: IEEE International Conference on Computer Vision (ICCV) (2015)
29. Wang, K., Babenko, B., Belongie, S.: End-to-end scene text recognition. In: IEEE International Conference on Computer Vision (ICCV) (2011)
30. Wolf, C., Jolion, J.: Object count / area graphs for the evaluation of object detection and segmentation algorithms. Int. J. Doc. Anal. 8, 280–296 (2006)
31. Yao, C., Bai, X., Liu, W.: A unified framework for multioriented text detection and recognition. IEEE Trans. Image Process. (TIP) 23(11), 4737–4749 (2014)
32. Yin, X.C., Pei, W.Y., Zhang, J., Hao, H.W.: Multi-orientation scene text detection with adaptive clustering. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 37(9), 1930–1937 (2015)
33. Yin, X.C., Yin, X., Huang, K., Hao, H.W.: Robust text detection in natural scene images. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 36(4), 970–983 (2014)
34. Zhang, Z., Shen, W., Yao, C., Bai, X.: Symmetry-based text line detection in natural scenes. In: IEEE Computer Vision and Pattern Recognition (CVPR) (2015)
35. Zhang, Z., Zhang, C., Shen, W., Yao, C., Liu, W., Bai, X.: Multi-oriented text detection with fully convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)

Metadata
Title
Detecting Text in Natural Image with Connectionist Text Proposal Network
Authors
Zhi Tian
Weilin Huang
Tong He
Pan He
Yu Qiao
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-46484-8_4