2021 | OriginalPaper | Chapter

Data Synthesis for Document Layout Analysis

Authors: Lin Wan, Ju Zhou, Bailing Zhang

Published in: Learning Technologies and Systems

Publisher: Springer International Publishing

Abstract

Layout analysis plays an important role in document image processing tasks such as OCR and document understanding, and deep learning methods have achieved significant results in this area. In recent years, pre-training and transfer learning have become common practice in a variety of computer vision and natural language processing tasks. In this paper, we present an efficient data synthesis approach for pre-training deep learning models for document layout analysis. The synthesized data is automatically annotated using heuristic rules and then applied to PubLayNet pre-trained models. The models are subsequently fine-tuned with real document layout data. Three types of document elements are considered: text lines, tables, and figures/images. The experiments demonstrate that pre-training with synthesized data is very effective for transfer learning across different document domains.
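
To make the idea of automatically annotated synthetic data concrete, here is a minimal sketch, not the authors' pipeline. It assumes a toy generator that stacks blocks down a blank page; because the generator picks every position itself, the bounding boxes for the three element types named in the abstract come for free. The function name `synthesize_page`, the category ids, the page size, and the COCO-style `[x, y, w, h]` box format are all assumptions made for illustration.

```python
import random

# Hypothetical category ids for the three element types in the abstract.
CATEGORIES = {"text_line": 1, "table": 2, "figure": 3}


def synthesize_page(width=600, height=800, margin=40, gap=10, seed=0):
    """Stack blocks top to bottom and return their boxes as annotations.

    Each annotation is a dict with an id, a category_id and a bbox in
    [x, y, w, h] form. No image is rendered here; only the layout and
    its labels are produced.
    """
    rng = random.Random(seed)  # seeded so the same page can be regenerated
    content_width = width - 2 * margin
    annotations = []
    y = margin

    def add(kind, x, top, w, h):
        annotations.append({
            "id": len(annotations) + 1,
            "category_id": CATEGORIES[kind],
            "bbox": [x, top, w, h],
        })

    while True:
        kind = rng.choice(["paragraph", "table", "figure"])
        if kind == "paragraph":
            line_height = rng.choice([12, 14, 16])
            n_lines = rng.randint(2, 6)
            block_height = n_lines * line_height
        else:
            block_height = rng.randint(80, 200)

        if y + block_height > height - margin:
            break  # the next block would cross the bottom margin

        if kind == "paragraph":
            # A paragraph is annotated line by line; the last line is shorter.
            for i in range(n_lines):
                is_last = i == n_lines - 1
                w = rng.randint(content_width // 3, content_width) if is_last else content_width
                add("text_line", margin, y + i * line_height, w, line_height)
        else:
            # Tables and figures get one box each, centred horizontally.
            w = rng.randint(content_width // 2, content_width)
            x = margin + (content_width - w) // 2
            add(kind, x, y, w, block_height)

        y += block_height + gap

    return annotations


if __name__ == "__main__":
    page = synthesize_page(seed=0)
    print(len(page), "annotated elements on the synthetic page")
```

A real generator would also render text, table rules, and images into the page bitmap and vary fonts, columns, and noise; the sketch only shows why labels need no manual effort when the layout is generated rather than observed.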

Copyright Year: 2021
DOI: https://doi.org/10.1007/978-3-030-66906-5_23
