
10-08-2020 | Original Paper

DetectGAN: GAN-based text detector for camera-captured document images

Authors: Jinyuan Zhao, Yanna Wang, Baihua Xiao, Cunzhao Shi, Fuxi Jia, Chunheng Wang

Published in: International Journal on Document Analysis and Recognition (IJDAR) | Issue 4/2020

Abstract

With the proliferation of camera-equipped electronic devices, camera-based text processing has attracted increasing attention. Unlike scene images, a document-image recognition system must organize its recognition results and store them in a structured document for subsequent data processing. In document images, however, the grouping of text lines depends largely on their semantic content rather than merely on the distance between characters, which causes learning confusion during training. In addition, multi-directional printed characters in document images require extra directional information to guide subsequent recognition tasks. To avoid learning confusion and obtain recognition-friendly detection results, we propose DetectGAN, a character-level text detection framework based on conditional generative adversarial networks (cGANs). The proposed framework removes position regression and the NMS step and directly recasts text detection as an image-to-image generation problem. Experimental results show that our method performs excellently on text detection in camera-captured document images and outperforms classical and state-of-the-art algorithms.
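The abstract frames detection as conditional image-to-image generation rather than box regression followed by NMS. The sketch below illustrates that idea with a minimal pix2pix-style cGAN in PyTorch: a generator maps a document image to a character-region map, and a discriminator scores (image, map) pairs. The architecture, losses, and hyper-parameters here are illustrative assumptions, not the authors' DetectGAN implementation, which is not described in the abstract.

```python
# Hypothetical sketch of a pix2pix-style conditional GAN for character-region
# prediction, loosely following the image-to-image formulation in the abstract.
# Architectures, losses and hyper-parameters are illustrative assumptions only.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Encoder-decoder mapping a document image to a character-region map."""
    def __init__(self, in_ch=3, out_ch=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, out_ch, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """PatchGAN-style critic scoring (document image, region map) pairs."""
    def __init__(self, in_ch=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 1, 4, padding=1),  # per-patch real/fake logits
        )

    def forward(self, img, mask):
        return self.net(torch.cat([img, mask], dim=1))

def train_step(G, D, opt_g, opt_d, img, gt_mask, lam=100.0):
    """One adversarial update: D separates real/generated pairs,
    G fools D while staying close (L1) to the ground-truth map."""
    bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

    # --- update discriminator ---
    fake = G(img).detach()
    d_real, d_fake = D(img, gt_mask), D(img, fake)
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # --- update generator ---
    fake = G(img)
    d_fake = D(img, fake)
    loss_g = bce(d_fake, torch.ones_like(d_fake)) + lam * l1(fake, gt_mask)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()

if __name__ == "__main__":
    G, D = Generator(), Discriminator()
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
    img = torch.rand(2, 3, 256, 256)                  # camera-captured document crops
    gt = (torch.rand(2, 1, 256, 256) > 0.5).float()   # character-region ground truth
    print(train_step(G, D, opt_g, opt_d, img, gt))
```

In such a formulation, character boxes would be recovered from the generated region map (e.g., by connected-component analysis) rather than by regressing coordinates, which is why no box regression or NMS stage is needed.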

Metadata
Title: DetectGAN: GAN-based text detector for camera-captured document images
Authors: Jinyuan Zhao, Yanna Wang, Baihua Xiao, Cunzhao Shi, Fuxi Jia, Chunheng Wang
Publication date: 10-08-2020
Publisher: Springer Berlin Heidelberg
Published in: International Journal on Document Analysis and Recognition (IJDAR), Issue 4/2020
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI: https://doi.org/10.1007/s10032-020-00358-w
