Published in: International Journal of Computer Vision 3/2021

24.10.2020

Residual Dual Scale Scene Text Spotting by Fusing Bottom-Up and Top-Down Processing

Authors: Wei Feng, Fei Yin, Xu-Yao Zhang, Wenhao He, Cheng-Lin Liu


Abstract

Existing methods for arbitrary-shaped text spotting fall into two categories: bottom-up methods detect and recognize local areas of text and then group them into text lines or words, while top-down methods detect text regions of interest and then apply polygon fitting and text recognition to the detected regions. In this paper, we analyze the advantages and disadvantages of the two paradigms and propose a novel text spotter that fuses bottom-up and top-down processing. To detect text of arbitrary shapes, we employ a bottom-up detector that describes text with a series of rotated squares, and design a top-down detector that represents the region of interest with a minimum enclosing rotated rectangle. The text boundary is then determined by fusing the outputs of the two detectors. To connect arbitrary-shaped text detection and recognition, we propose a differentiable operator named RoISlide, which extracts features for arbitrary text regions from whole-image feature maps. On top of the features extracted through RoISlide, a CNN- and CTC-based text recognizer makes the framework free from character-level annotations. To improve robustness against scale variance, we further propose a residual dual-scale spotting mechanism, in which two spotters work on different feature levels and the high-level spotter is based on the residuals of the low-level spotter. Our method achieves state-of-the-art performance on four English datasets and one Chinese dataset, covering both arbitrary-shaped and oriented text. We also provide extensive ablation experiments analyzing how the key components affect performance.
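The core idea behind a RoISlide-style operator can be illustrated with a short sketch: given a sequence of rotated squares along a text line, bilinearly sample a fixed-size grid from the shared feature map at each square, yielding a feature sequence suitable for a CTC-based recognizer. This is a rough NumPy illustration of the concept only; the function names, output grid size, and sampling details are assumptions, not the paper's actual implementation.

```python
import numpy as np

def roislide_features(feature_map, centers, angles, side, out_size=4):
    """Illustrative RoISlide-style sampling (not the paper's code).

    For each rotated square, given by (center, angle, side length), along
    a text line, bilinearly sample an out_size x out_size grid from the
    feature map, producing a feature sequence of shape (T, S, S, C)."""
    seq = []
    for (cx, cy), theta in zip(centers, angles):
        cos_t, sin_t = np.cos(theta), np.sin(theta)
        # Regular grid in the square's local frame, rotated into image coords.
        lin = np.linspace(-side / 2.0, side / 2.0, out_size)
        ys, xs = np.meshgrid(lin, lin, indexing="ij")
        gx = cx + xs * cos_t - ys * sin_t
        gy = cy + xs * sin_t + ys * cos_t
        seq.append(_bilinear(feature_map, gx, gy))
    return np.stack(seq)

def _bilinear(fmap, gx, gy):
    """Bilinear interpolation of fmap (H, W, C) at float coords (gx, gy)."""
    H, W, _ = fmap.shape
    x0 = np.clip(np.floor(gx).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(gy).astype(int), 0, H - 2)
    dx = np.clip(gx - x0, 0.0, 1.0)[..., None]
    dy = np.clip(gy - y0, 0.0, 1.0)[..., None]
    f00, f01 = fmap[y0, x0], fmap[y0, x0 + 1]
    f10, f11 = fmap[y0 + 1, x0], fmap[y0 + 1, x0 + 1]
    return (f00 * (1 - dx) * (1 - dy) + f01 * dx * (1 - dy)
            + f10 * (1 - dx) * dy + f11 * dx * dy)
```

Because the sampling is bilinear, the operator is differentiable with respect to the feature map, which is what allows detection and recognition to be trained jointly from word-level annotations alone.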


Metadata
Title
Residual Dual Scale Scene Text Spotting by Fusing Bottom-Up and Top-Down Processing
Authors
Wei Feng
Fei Yin
Xu-Yao Zhang
Wenhao He
Cheng-Lin Liu
Publication date
24.10.2020
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 3/2021
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-020-01388-x
