2021 | OriginalPaper | Chapter

Data Synthesis for Document Layout Analysis

Authors : Lin Wan, Ju Zhou, Bailing Zhang

Published in: Learning Technologies and Systems

Publisher: Springer International Publishing

Abstract

Layout analysis plays an important role in document image processing tasks such as OCR and document understanding, and deep learning methods have achieved significant results in this area. In recent years, pre-training and transfer learning have become common practice in a variety of computer vision and natural language processing tasks. In this paper, we present an efficient data synthesis approach for pre-training deep learning models for document layout analysis. The synthesized data is automatically annotated based on heuristic rules and then applied to models pre-trained on PubLayNet. The models are subsequently fine-tuned with real document layout data. Three types of document elements are taken into account: text lines, tables, and figures/images. The experiments demonstrate that pre-training with synthesized data is very effective for transfer learning across different document domains.
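To make the idea of automatically annotated synthetic layouts concrete, the following is a minimal sketch and not the authors' method, whose heuristic rules are not given in the abstract. It assumes a page is filled top to bottom with randomly chosen blocks, so each bounding-box label is known at generation time. The function name `synthesize_page`, the `CATEGORIES` mapping, the page size, and all spacing values are hypothetical choices made for illustration.

```python
import random

# Hypothetical category ids; the abstract only names the three element types.
CATEGORIES = {"text_line": 1, "table": 2, "figure": 3}


def synthesize_page(width=600, height=800, margin=40, seed=None):
    """Stack random blocks top to bottom and return their annotations.

    Each annotation is a dict with a category name and an [x, y, w, h] box.
    The labels are a by-product of the layout procedure, so no manual
    annotation step is needed.
    """
    rng = random.Random(seed)
    annotations = []
    y = margin
    content_width = width - 2 * margin
    line_height, line_gap, block_gap = 12, 6, 20  # assumed spacing values

    while True:
        kind = rng.choice(["paragraph", "table", "figure"])
        if kind == "paragraph":
            n_lines = rng.randint(2, 6)
            block_height = n_lines * (line_height + line_gap)
        else:
            block_height = rng.randint(80, 200)

        if y + block_height > height - margin:
            break  # the next block would not fit, so the page is full

        if kind == "paragraph":
            # A paragraph is annotated line by line, with a ragged right edge.
            for i in range(n_lines):
                w = rng.randint(int(0.6 * content_width), content_width)
                top = y + i * (line_height + line_gap)
                annotations.append(
                    {"category": "text_line", "bbox": [margin, top, w, line_height]}
                )
        else:
            # Tables and figures are annotated as a single box.
            w = rng.randint(content_width // 2, content_width)
            annotations.append(
                {"category": kind, "bbox": [margin, y, w, block_height]}
            )

        y += block_height + block_gap

    return annotations


if __name__ == "__main__":
    for ann in synthesize_page(seed=0):
        print(ann["category"], ann["bbox"])
```

In a real pipeline the boxes would also be rendered to an image (text glyphs, table rulings, picture content) so that a detector can be trained on image and label pairs. This sketch covers only the layout and labeling step.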

Title: Data Synthesis for Document Layout Analysis
Authors: Lin Wan, Ju Zhou, Bailing Zhang
Publisher: Springer International Publishing
Book: Learning Technologies and Systems
Print ISBN: 978-3-030-66905-8

Electronic ISBN: 978-3-030-66906-5

Copyright Year: 2021
DOI: https://doi.org/10.1007/978-3-030-66906-5_23
