International Journal on Document Analysis and Recognition (IJDAR)

29-03-2022 | Original Paper

Scene text detection via decoupled feature pyramid networks

Authors: Min Liang, Jie-Bo Hou, Xiaobin Zhu, Chun Yang, Jingyan Qin

Published in: International Journal on Document Analysis and Recognition (IJDAR) | Issue 3/2022

Abstract

Detecting scene text of arbitrary shape is challenging, mainly because of varied aspect ratios, curvature, and scales. In this paper, we propose a novel arbitrary-shape scene text detection method based on Decoupled Feature Pyramid Networks (DFPN) and regression-based linking (RegLink). DFPN decouples the width and height of the feature maps generated by FPN to enhance the discriminability of features for varied aspect ratios. Because quadrilateral regression results cannot directly represent curved text, we propose a simple yet effective RegLink that links pixels into text instances, exploiting the fact that pixels in the same curved text share an identical target quadrilateral. RegLink thus extends rotated-rectangle text detectors to curved text. In addition, we propose a Feature Scale Module to enhance the robustness of features to varied scales. In this way, our method can effectively detect scene text of arbitrary shape. Experimental results on three publicly available, challenging datasets demonstrate the effectiveness of our method. The code and model are available at https://github.com/lmplayer/DFPN-master.
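
The linking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository above for that); it assumes a binary text mask and a per-pixel regressed quadrilateral are already available, and it assumes that two neighbouring text pixels are linked when every coordinate of their quadrilaterals differs by at most a tolerance. The function name `link_pixels_by_quad`, the 4-neighbour connectivity, and the `tol` parameter are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# A quadrilateral as 8 numbers: x1, y1, x2, y2, x3, y3, x4, y4.
Quad = Tuple[float, ...]


def link_pixels_by_quad(
    text_mask: Sequence[Sequence[bool]],
    quads: Sequence[Sequence[Quad]],
    tol: float = 2.0,
) -> List[List[int]]:
    """Group text pixels whose regressed quadrilaterals agree.

    Assumption (not from the paper): two 4-neighbouring text pixels are
    linked when all 8 quad coordinates differ by at most `tol`.
    Returns a label map with 0 for background and 1..K for instances.
    """
    h, w = len(text_mask), len(text_mask[0])
    parent = list(range(h * w))  # union-find forest over pixel indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    def agree(q1: Quad, q2: Quad) -> bool:
        return max(abs(a - b) for a, b in zip(q1, q2)) <= tol

    for y in range(h):
        for x in range(w):
            if not text_mask[y][x]:
                continue
            # Checking right and down neighbours covers every 4-neighbour pair once.
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if (ny < h and nx < w and text_mask[ny][nx]
                        and agree(quads[y][x], quads[ny][nx])):
                    union(y * w + x, ny * w + nx)

    # Relabel union-find roots as consecutive instance ids in scan order.
    labels = [[0] * w for _ in range(h)]
    root_to_label: Dict[int, int] = {}
    for y in range(h):
        for x in range(w):
            if text_mask[y][x]:
                root = find(y * w + x)
                labels[y][x] = root_to_label.setdefault(root, len(root_to_label) + 1)
    return labels
```

Under these assumptions, pixels that regress to the same quadrilateral end up in one instance even when the instance itself is curved, while touching pixels that regress to different quadrilaterals are kept apart.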

Title: Scene text detection via decoupled feature pyramid networks
Authors: Min Liang, Jie-Bo Hou, Xiaobin Zhu, Chun Yang, Jingyan Qin
Publication date: 29-03-2022
Publisher: Springer Berlin Heidelberg
Published in: International Journal on Document Analysis and Recognition (IJDAR) / Issue 3/2022
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI: https://doi.org/10.1007/s10032-022-00397-5
