2020 | OriginalPaper | Book chapter

Page Segmentation Using Convolutional Neural Network and Graphical Model

Authors: Xiao-Hui Li, Fei Yin, Cheng-Lin Liu

Published in: Document Analysis Systems

Publisher: Springer International Publishing


Abstract

Page segmentation of document images remains challenging because of complex layouts and heterogeneous image content. Existing deep-learning-based methods usually follow general semantic segmentation or object detection frameworks and make little use of the characteristics specific to document images. In this paper, we propose an effective page segmentation method that combines a convolutional neural network (CNN) with a graphical model: the CNN extracts visual features, and the graphical model exploits the relationships (spatial context) between visual primitives and regions. A page image is represented as a graph whose nodes are the primitives and whose edges represent the relationships between neighboring primitives. We consider two types of graphical model: the graph attention network (GAT) and the conditional random field (CRF). A convolutional feature pyramid network (FPN) is used for feature extraction, and its parameters can be estimated jointly with those of the GAT. The CRF can be used for joint prediction of primitive labels and can be combined with the CNN and GAT. Experimental results on the PubLayNet dataset show that our method extracts various page regions with precise boundaries. A comparison of different configurations shows that the GAT improves performance with a shallow backbone CNN, whereas the improvement with a deep backbone CNN is not evident; the CRF consistently improves performance, even when applied on top of the GAT.
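The graph representation and attention-based aggregation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the k-nearest-neighbour rule for choosing neighbouring primitives, the toy feature vectors, and the names `knn_edges`, `gat_layer`, `W`, and `a` are all assumptions made for illustration. The attention step follows the standard single-head GAT formulation (a shared linear transform, a LeakyReLU score on concatenated node pairs, and a softmax over each node's neighbourhood).

```python
import math


def knn_edges(centers, k=2):
    """Link each primitive to its k nearest primitives by centre distance.

    Assumption: the abstract only says edges join neighbouring primitives;
    k-nearest neighbours is one simple way to define that.
    """
    neighbours = {}
    for i, (xi, yi) in enumerate(centers):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(centers)
            if j != i
        )
        neighbours[i] = [j for _, j in dists[:k]]
    return neighbours


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(features, neighbours, W, a):
    """Single-head graph attention over each node's neighbourhood.

    features:   one feature vector per primitive (hypothetical values here;
                in the paper they would come from the CNN)
    W:          shared linear transform
    a:          attention vector of length 2 * output dimension
    Returns the updated node features and the attention weights.
    """
    z = [matvec(W, h) for h in features]
    out, attn = [], {}
    for i, zi in enumerate(z):
        nbrs = [i] + neighbours[i]  # self-loop plus graph neighbours
        scores = [
            leaky_relu(sum(p * q for p, q in zip(a, zi + z[j]))) for j in nbrs
        ]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        attn[i] = dict(zip(nbrs, alphas))
        out.append(
            [sum(al * z[j][d] for al, j in zip(alphas, nbrs)) for d in range(len(zi))]
        )
    return out, attn


# Toy page with four primitives forming two well-separated pairs.
centers = [(0, 0), (1, 0), (10, 0), (11, 0)]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
W = [[1.0, 0.0], [0.0, 1.0]]   # identity transform, for readability only
a = [0.1, 0.2, 0.3, 0.4]       # arbitrary attention parameters

graph = knn_edges(centers, k=1)
updated, attention = gat_layer(features, graph, W, a)
```

In a trained model, `W` and `a` would be learned, and the updated node features would feed a classifier that assigns a region label to each primitive.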

Title: Page Segmentation Using Convolutional Neural Network and Graphical Model
Authors: Xiao-Hui Li, Fei Yin, Cheng-Lin Liu
Publisher: Springer International Publishing
Book: Document Analysis Systems
Print ISBN: 978-3-030-57057-6
Electronic ISBN: 978-3-030-57058-3
Copyright year: 2020
DOI: https://doi.org/10.1007/978-3-030-57058-3_17
