
International Journal on Document Analysis and Recognition (IJDAR)

08-08-2019 | Special Issue Paper

Coarse-to-fine document localization in natural scene image with regional attention and recursive corner refinement

Authors: Anna Zhu, Chen Zhang, Zhi Li, Shengwu Xiong

Published in: International Journal on Document Analysis and Recognition (IJDAR) | Issue 3/2019

Abstract

Document localization is a promising step for document-based optical character recognition. The task becomes harder when documents appear in complex natural scene images. In this paper, we propose a coarse-to-fine document localization approach that detects the four corner points of a document in natural scene images. In the first stage, the four corners are roughly predicted by a deep neural network-based Joint Corner Detector (JCD) with an attention mechanism, which roughly localizes the document region via an attentional map. Key to producing accurate corner inference, the JCD module substantially suppresses background interference in the convolutional features. In the second stage, a corner-specific refiner module refines the previously predicted corners. Because the four document corners have different characteristics, the patches cropped around the predicted corners are fed into four different corner-specific CNN models, which search for the accurate corner locations recursively. Three datasets (the ICDAR 2015 SmartDoc competition 1 dataset, the SEECS-NUSF dataset and a self-collected dataset) are used to evaluate the performance of our method. The experimental results demonstrate the superiority of the proposed method in localizing documents in natural images, especially those with complex backgrounds. Our method outperforms most state-of-the-art works.
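The recursive refinement idea in the second stage can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give patch sizes, iteration counts, or the model interface, so `refine_corner`, `make_oracle`, and the `patch`, `shrink`, and `steps` parameters are all hypothetical, and the corner-specific CNN is replaced by a toy predictor that already knows the true corner.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) crop bounds in pixels

# Assumed interface: a predictor receives the crop bounds and returns the
# corner position in normalized patch coordinates (u, v), each in [0, 1].
Predictor = Callable[[Box], Point]


def refine_corner(coarse: Point, predict: Predictor, image_size: Tuple[int, int],
                  patch: float = 0.5, shrink: float = 0.5, steps: int = 4) -> Point:
    """Recursively re-estimate one corner inside progressively smaller patches.

    patch, shrink and steps are illustrative defaults, not values from the paper.
    """
    w, h = image_size
    x, y = coarse
    half_w, half_h = w * patch / 2.0, h * patch / 2.0
    for _ in range(steps):
        # Keep the current estimate inside the image so the crop is never empty.
        x = min(float(w), max(0.0, x))
        y = min(float(h), max(0.0, y))
        # Crop a patch centred on the current estimate, clipped to the image.
        x0, y0 = max(0.0, x - half_w), max(0.0, y - half_h)
        x1, y1 = min(float(w), x + half_w), min(float(h), y + half_h)
        # Ask the (corner-specific) model where the corner lies in the patch.
        u, v = predict((x0, y0, x1, y1))
        # Map the normalized prediction back to image coordinates.
        x, y = x0 + u * (x1 - x0), y0 + v * (y1 - y0)
        # Narrow the search window for the next pass.
        half_w *= shrink
        half_h *= shrink
    return x, y


def make_oracle(true_corner: Point) -> Predictor:
    """Toy stand-in for a trained CNN: reports the true corner, clamped to the patch."""
    tx, ty = true_corner

    def predict(box: Box) -> Point:
        x0, y0, x1, y1 = box
        u = min(1.0, max(0.0, (tx - x0) / (x1 - x0)))
        v = min(1.0, max(0.0, (ty - y0) / (y1 - y0)))
        return u, v

    return predict


if __name__ == "__main__":
    # Coarse stage output (100, 100); the true corner lies at (120, 80).
    print(refine_corner((100.0, 100.0), make_oracle((120.0, 80.0)), (640, 480)))
```

In a real system, one such predictor per corner (top-left, top-right, bottom-right, bottom-left) would be a trained model operating on the cropped pixels rather than on the crop bounds alone.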

Title: Coarse-to-fine document localization in natural scene image with regional attention and recursive corner refinement
Authors: Anna Zhu, Chen Zhang, Zhi Li, Shengwu Xiong
Publication date: 08-08-2019
Publisher: Springer Berlin Heidelberg
Published in: International Journal on Document Analysis and Recognition (IJDAR) / Issue 3/2019
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI: https://doi.org/10.1007/s10032-019-00341-0
