Published in: International Journal on Document Analysis and Recognition (IJDAR) 3/2019

08-08-2019 | Special Issue Paper

Coarse-to-fine document localization in natural scene image with regional attention and recursive corner refinement

Authors: Anna Zhu, Chen Zhang, Zhi Li, Shengwu Xiong


Abstract

Document localization is a promising step for document-based optical character recognition. The task becomes harder when documents appear in complex natural scene images. In this paper, we propose a coarse-to-fine document localization approach that detects the four corner points of a document in natural scene images. In the first stage, the four corners are roughly predicted by a deep neural network-based Joint Corner Detector (JCD) with an attention mechanism, which coarsely localizes the document region via an attentional map. As a key to accurate corner inference, the JCD module substantially suppresses background interference in the convolutional features. In the second stage, a corner-specific refiner module refines the previously predicted corners. Because the four document corners have different characteristics, patches cropped around the predicted corners are fed into four different corner-specific CNN models, which search for the accurate corner locations recursively. Three datasets (the ICDAR 2015 SmartDoc competition 1 dataset, the SEECS-NUSF dataset, and a self-collected dataset) are used to evaluate the performance of our method. The experimental results demonstrate the superiority of the proposed method in localizing documents in natural images, especially those with complex backgrounds. Our method outperforms most state-of-the-art works.
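
The second-stage idea (crop a patch around each coarse corner, let a corner-specific model re-estimate the corner inside that patch, then repeat on a smaller patch) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give patch sizes, a shrink schedule, or a stopping rule, so `half`, `shrink`, `min_half`, and `max_steps` are assumptions, and `brightest_pixel_refiner` is a hypothetical toy stand-in for the four corner-specific CNN models.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]          # (x, y) in pixel coordinates
Patch = List[List[float]]            # row-major grayscale crop
Refiner = Callable[[Patch], Point]   # returns (u, v) in [0, 1] inside the patch


def crop_patch(image: Sequence[Sequence[float]], center: Point,
               half: int) -> Tuple[Patch, Tuple[int, int]]:
    """Crop a square window around `center`, clipped to the image bounds.

    Returns the patch and the (x0, y0) offset of its top-left pixel.
    """
    height, width = len(image), len(image[0])
    cx = min(max(int(round(center[0])), 0), width - 1)
    cy = min(max(int(round(center[1])), 0), height - 1)
    x0, x1 = max(0, cx - half), min(width, cx + half + 1)
    y0, y1 = max(0, cy - half), min(height, cy + half + 1)
    return [list(row[x0:x1]) for row in image[y0:y1]], (x0, y0)


def refine_corner(image, corner: Point, refiner: Refiner, half: int = 8,
                  shrink: float = 0.5, min_half: int = 2,
                  max_steps: int = 5) -> Point:
    """Repeatedly re-estimate one corner on progressively smaller patches."""
    x, y = corner
    for _ in range(max_steps):
        patch, (x0, y0) = crop_patch(image, (x, y), half)
        u, v = refiner(patch)
        # Map the normalised patch coordinates back to image coordinates.
        x = x0 + u * (len(patch[0]) - 1)
        y = y0 + v * (len(patch) - 1)
        if half <= min_half:
            break
        half = max(min_half, int(half * shrink))  # assumed shrink schedule
    return (x, y)


def localize_document(image, coarse_corners: Sequence[Point],
                      refiners: Sequence[Refiner], half: int = 8) -> List[Point]:
    """Refine each coarse corner with its own refiner (one per corner)."""
    return [refine_corner(image, c, r, half=half)
            for c, r in zip(coarse_corners, refiners)]


def brightest_pixel_refiner(patch: Patch) -> Point:
    """Toy stand-in for a corner CNN: points at the brightest pixel."""
    h, w = len(patch), len(patch[0])
    _, px, py = max((patch[r][c], c, r) for r in range(h) for c in range(w))
    u = px / (w - 1) if w > 1 else 0.0
    v = py / (h - 1) if h > 1 else 0.0
    return (u, v)


# Synthetic example: four bright pixels play the role of document corners.
true_corners = [(5, 5), (34, 6), (33, 35), (6, 33)]
image = [[0.0] * 40 for _ in range(40)]
for tx, ty in true_corners:
    image[ty][tx] = 1.0

# Hypothetical first-stage (coarse) predictions, a few pixels off.
coarse = [(8.0, 8.0), (30.0, 9.0), (30.0, 32.0), (9.0, 30.0)]
refined = localize_document(image, coarse, [brightest_pixel_refiner] * 4, half=6)
print(refined)
```

Passing one refiner per corner mirrors the abstract's point that the four corners have different characteristics; in a real system each entry in `refiners` would be a separately trained model rather than the shared toy function used here.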


Metadata
Title
Coarse-to-fine document localization in natural scene image with regional attention and recursive corner refinement
Authors
Anna Zhu
Chen Zhang
Zhi Li
Shengwu Xiong
Publication date
08-08-2019
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Document Analysis and Recognition (IJDAR) / Issue 3/2019
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-019-00341-0
