2020 | OriginalPaper | Book chapter

Page Segmentation Using Convolutional Neural Network and Graphical Model

Written by: Xiao-Hui Li, Fei Yin, Cheng-Lin Liu

Published in: Document Analysis Systems

Publisher: Springer International Publishing

Abstract

Page segmentation of document images remains a challenge because of complex layouts and heterogeneous image content. Existing deep-learning-based methods usually follow general semantic segmentation or object detection frameworks and make little use of the characteristics specific to document images. In this paper, we propose an effective method for page segmentation using a convolutional neural network (CNN) and a graphical model: the CNN extracts visual features, and the graphical model captures the relationships (spatial context) between visual primitives and regions. A page image is represented as a graph whose nodes represent the primitives and whose edges represent the relationships between neighboring primitives. We consider two types of graphical models: the graph attention network (GAT) and the conditional random field (CRF). A convolutional feature pyramid network (FPN) is used for feature extraction, and its parameters can be estimated jointly with those of the GAT. The CRF can be used for joint prediction of primitive labels and can be combined with the CNN and GAT. Experimental results on the PubLayNet dataset show that our method can extract various page regions with precise boundaries. A comparison of different configurations shows that GAT improves performance when a shallow backbone CNN is used, whereas the improvement with a deep backbone CNN is not evident; the CRF consistently improves performance, even when applied on top of GAT.
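
To make the graph representation described in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say how primitives are extracted or how "neighboring" is defined, so the sketch assumes that primitives are given as bounding boxes and that neighbors are the k nearest primitives by center distance. The function names (`knn_edges`, `neighbors`, `attention_step`) and the attention vectors `a_src` and `a_dst` are hypothetical. The attention step follows the general form of a graph attention layer but omits the learned linear projection.

```python
import math


def centers(boxes):
    """Center (x, y) of each box given as (x0, y0, x1, y1)."""
    return [((x0 + x1) / 2.0, (y0 + y1) / 2.0) for x0, y0, x1, y1 in boxes]


def knn_edges(boxes, k=2):
    """Undirected edges linking each primitive to its k nearest primitives.

    Assumption: neighborhood is defined by center distance; the paper may
    use a different rule.
    """
    pts = centers(boxes)
    edges = set()
    for i, (xi, yi) in enumerate(pts):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(pts)
            if j != i
        )
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def neighbors(n, edges):
    """Adjacency lists with a self-loop per node, as is common for GAT layers."""
    adj = {i: [i] for i in range(n)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def attention_step(features, adj, a_src, a_dst):
    """One simplified attention aggregation over each node's neighborhood.

    Score for edge (i, j): LeakyReLU(a_src . h_i + a_dst . h_j), normalized
    with a softmax over the neighbors of i. The learned projection W of a
    full GAT layer is omitted here (treated as the identity).
    """
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    def leaky_relu(x, slope=0.2):
        return x if x > 0 else slope * x

    out = []
    for i in range(len(features)):
        scores = [
            leaky_relu(dot(a_src, features[i]) + dot(a_dst, features[j]))
            for j in adj[i]
        ]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        dim = len(features[i])
        out.append(
            [sum(w * features[j][d] for w, j in zip(weights, adj[i]))
             for d in range(dim)]
        )
    return out


# Hypothetical primitives: three stacked text lines and one box to the right.
boxes = [(0, 0, 10, 2), (0, 3, 10, 5), (0, 6, 10, 8), (20, 0, 30, 8)]
edges = knn_edges(boxes, k=1)
adj = neighbors(len(boxes), edges)
# Toy one-dimensional node features standing in for CNN features.
features = [[1.0], [2.0], [3.0], [4.0]]
updated = attention_step(features, adj, a_src=[0.5], a_dst=[-0.5])
```

In a full system the node features would come from the CNN (for example, features pooled from the FPN over each primitive's area), and the attention parameters would be learned jointly with the backbone, as the abstract states. A CRF could then take the resulting per-node scores as unary potentials; that step is not shown here.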

Metadata
Title
Page Segmentation Using Convolutional Neural Network and Graphical Model
Written by
Xiao-Hui Li
Fei Yin
Cheng-Lin Liu
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-57058-3_17