2021 | OriginalPaper | Chapter

MSNet: A Multi-scale Segmentation Network for Documents Layout Analysis

Authors: Bo Wang, Ju Zhou, Bailing Zhang

Published in: Learning Technologies and Systems

Publisher: Springer International Publishing

Abstract

Layout analysis is often a crucial step in document image analysis and understanding. In this paper, we propose a deep learning-based layout analysis approach to identify and categorize the regions of interest in scanned images of text documents. Although pixel-level semantic segmentation of document images has brought much progress in geometric layout analysis, many challenges remain for complex and heterogeneous documents, which often have a sparse structure without closed boundaries and fine typologies at variable scales. We propose a multi-scale segmentation network, called MSNet, for high-resolution document images. The model is characterized by an enlarged receptive field and multi-scale feature extraction. Experiments on a Chinese document dataset show satisfactory performance.
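
The abstract does not say how MSNet enlarges its receptive field. One widely used technique is dilated (atrous) convolution, and the sketch below assumes it purely for illustration; it is not taken from the chapter. The helper name `receptive_field` and the layer configurations are hypothetical. It computes the receptive field of a stack of convolution layers and shows how parallel branches with different dilation rates see the input at different scales.

```python
def receptive_field(layers):
    """Receptive field, in input pixels, of a stack of convolution layers.

    Each layer is a (kernel_size, stride, dilation) tuple.
    Illustrative helper only; not part of MSNet.
    """
    rf = 1    # a single output unit starts out covering one input pixel
    jump = 1  # spacing, in input pixels, between adjacent units of the current layer
    for kernel, stride, dilation in layers:
        # A dilated kernel spans dilation * (kernel - 1) + 1 positions.
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


# Three plain 3x3 convolutions versus the same stack with growing dilation.
plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]

# Parallel single-layer branches with assumed dilation rates, one per scale.
branch_rates = [1, 6, 12, 18]
branch_fields = [receptive_field([(3, 1, rate)]) for rate in branch_rates]

print(receptive_field(plain), receptive_field(dilated), branch_fields)
```

With the same number of 3x3 layers and the same parameter count, the dilated stack covers 15 input pixels where the plain stack covers 7, which is the sense in which dilation enlarges the receptive field without extra downsampling.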

Title: MSNet: A Multi-scale Segmentation Network for Documents Layout Analysis
Authors: Bo Wang, Ju Zhou, Bailing Zhang
Publisher: Springer International Publishing
Book: Learning Technologies and Systems
Print ISBN: 978-3-030-66905-8
Electronic ISBN: 978-3-030-66906-5
Copyright Year: 2021
DOI: https://doi.org/10.1007/978-3-030-66906-5_21
