
2021 | OriginalPaper | Chapter

MSNet: A Multi-scale Segmentation Network for Documents Layout Analysis

Authors : Bo Wang, Ju Zhou, Bailing Zhang

Published in: Learning Technologies and Systems

Publisher: Springer International Publishing


Abstract

Layout analysis is often a crucial step in document image analysis and understanding. In this paper, we propose a deep learning-based layout analysis approach that identifies and categorizes the regions of interest in scanned images of text documents. Pixel-level semantic segmentation of document images has brought considerable progress in geometric layout analysis. However, many challenges remain for complex and heterogeneous documents. Such documents often have a sparse structure without closed boundaries, and fine-grained region types at variable scales. We propose a multi-scale segmentation network, called MSNet, for high-resolution document images. The model is characterized by an enlarged receptive field and multi-scale feature extraction. Experiments conducted on a Chinese document dataset show satisfactory performance.
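The abstract does not say how MSNet enlarges its receptive field. One widely used technique in segmentation networks is dilated (atrous) convolution. The following is a minimal sketch that assumes that technique; the helper `receptive_field` is hypothetical and not taken from the paper. It computes how many input pixels one output pixel "sees" after a stack of convolution layers. It also shows how parallel branches with different dilation rates cover context at several scales at once.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int, int]]) -> int:
    """Receptive field (in input pixels) of a stack of conv layers.

    Each layer is given as (kernel_size, stride, dilation). A kernel of size k
    with dilation d spans d * (k - 1) + 1 positions of its input feature map.
    """
    rf = 1    # one output pixel initially corresponds to one input pixel
    jump = 1  # spacing, in input pixels, between adjacent positions at this depth
    for k, s, d in layers:
        rf += d * (k - 1) * jump  # each layer widens the field by its dilated extent
        jump *= s                 # strides compound the spacing for deeper layers
    return rf


if __name__ == "__main__":
    # Three plain 3x3 convolutions: the field grows linearly.
    print(receptive_field([(3, 1, 1)] * 3))
    # Same depth with dilations 1, 2, 4: the field grows exponentially.
    print(receptive_field([(3, 1, 1), (3, 1, 2), (3, 1, 4)]))
    # Parallel single-layer branches with different dilation rates
    # (an ASPP-style multi-scale design, assumed here for illustration).
    print([receptive_field([(3, 1, d)]) for d in (1, 6, 12, 18)])
```

With the same number of parameters, the dilated stack sees 15 pixels instead of 7. This is the kind of receptive-field enlargement that helps with large, sparsely structured document regions. Running several dilation rates in parallel is one common way to extract features at multiple scales. Whether MSNet uses exactly this design is not stated in the abstract.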


Metadata
Title: MSNet: A Multi-scale Segmentation Network for Documents Layout Analysis
Authors: Bo Wang, Ju Zhou, Bailing Zhang
Copyright Year: 2021
DOI: https://doi.org/10.1007/978-3-030-66906-5_21
