
10-08-2020 | Original Paper

DetectGAN: GAN-based text detector for camera-captured document images

Authors: Jinyuan Zhao, Yanna Wang, Baihua Xiao, Cunzhao Shi, Fuxi Jia, Chunheng Wang

Published in: International Journal on Document Analysis and Recognition (IJDAR) | Issue 4/2020

Abstract

With the proliferation of camera-equipped electronic devices, camera-based text processing has attracted increasing attention. Unlike scene images, a document-image recognition system must organize its recognition results and store them in a structured document for subsequent data processing. In document images, however, the grouping of text lines depends largely on their semantic content rather than merely on the distance between characters, which causes learning confusion during training. In addition, multi-directional printed characters in document images require extra directional information to guide subsequent recognition tasks. To avoid learning confusion and obtain recognition-friendly detection results, we propose DetectGAN, a character-level text detection framework based on conditional generative adversarial networks (cGANs). The proposed framework removes position regression and the NMS step and directly recasts text detection as an image-to-image generation problem. Experimental results show that our method performs excellently on text detection in camera-captured document images and outperforms classical and state-of-the-art algorithms.
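The abstract frames detection as conditional image-to-image generation rather than box regression followed by NMS. The sketch below illustrates that idea with a minimal pix2pix-style cGAN in PyTorch: a generator maps a document image to a character-region map, and a discriminator scores (image, map) pairs. The architecture, losses, and hyper-parameters here are illustrative assumptions, not the authors' DetectGAN implementation, which is not described in the abstract.

```python
# Hypothetical sketch of a pix2pix-style conditional GAN for character-region
# prediction, loosely following the image-to-image formulation in the abstract.
# Architectures, losses and hyper-parameters are illustrative assumptions only.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Encoder-decoder mapping a document image to a character-region map."""
    def __init__(self, in_ch=3, out_ch=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, out_ch, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """PatchGAN-style critic scoring (document image, region map) pairs."""
    def __init__(self, in_ch=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 1, 4, padding=1),  # per-patch real/fake logits
        )

    def forward(self, img, mask):
        return self.net(torch.cat([img, mask], dim=1))

def train_step(G, D, opt_g, opt_d, img, gt_mask, lam=100.0):
    """One adversarial update: D separates real/generated pairs,
    G fools D while staying close (L1) to the ground-truth map."""
    bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

    # --- update discriminator ---
    fake = G(img).detach()
    d_real, d_fake = D(img, gt_mask), D(img, fake)
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # --- update generator ---
    fake = G(img)
    d_fake = D(img, fake)
    loss_g = bce(d_fake, torch.ones_like(d_fake)) + lam * l1(fake, gt_mask)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()

if __name__ == "__main__":
    G, D = Generator(), Discriminator()
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
    img = torch.rand(2, 3, 256, 256)                  # camera-captured document crops
    gt = (torch.rand(2, 1, 256, 256) > 0.5).float()   # character-region ground truth
    print(train_step(G, D, opt_g, opt_d, img, gt))
```

In such a formulation, character boxes would be recovered from the generated region map (e.g., by connected-component analysis) rather than by regressing coordinates, which is why no box regression or NMS stage is needed.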

Metadata
Title: DetectGAN: GAN-based text detector for camera-captured document images
Authors: Jinyuan Zhao, Yanna Wang, Baihua Xiao, Cunzhao Shi, Fuxi Jia, Chunheng Wang
Publication date: 10-08-2020
Publisher: Springer Berlin Heidelberg
Published in: International Journal on Document Analysis and Recognition (IJDAR), Issue 4/2020
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI: https://doi.org/10.1007/s10032-020-00358-w
