
2021 | OriginalPaper | Chapter

LSTMVAEF: Vivid Layout via LSTM-Based Variational Autoencoder Framework

Authors : Jie He, Xingjiao Wu, Wenxin Hu, Jing Yang

Published in: Document Analysis and Recognition – ICDAR 2021

Publisher: Springer International Publishing


Abstract

The lack of training data remains a challenge in the Document Layout Analysis (DLA) task, and synthetic data is an effective way to tackle it. In this paper, we propose an LSTM-based Variational Autoencoder framework (LSTMVAEF) to synthesize layouts for DLA. Compared with previous methods, ours can generate more complicated layouts and needs only the training data from DLA, without extra annotation. We use LSTM models as basic models to learn the latent representation of the class and position information of elements within a page. We also design a weight adaptation strategy to help the model train faster. Experiments show that our model can generate vivid layouts from only a few real document pages.
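The abstract describes a page as a set of elements carrying class and position information, encoded step by step by an LSTM. As a minimal sketch of that input representation (the class set, function names, and vector layout here are illustrative assumptions, not taken from the paper), each element can be serialized as a one-hot class label concatenated with a normalized bounding box, yielding the per-step vectors an LSTM encoder would consume:

```python
# Assumed element classes for illustration; the paper's own label set may differ.
CLASSES = ["text", "title", "figure", "table", "list"]

def encode_layout(elements):
    """Serialize a list of (class_name, x, y, w, h) tuples into a
    sequence of flat numeric vectors, one per layout element:
    a one-hot class part followed by the normalized bounding box."""
    seq = []
    for cls, x, y, w, h in elements:
        one_hot = [1.0 if c == cls else 0.0 for c in CLASSES]
        seq.append(one_hot + [x, y, w, h])
    return seq

# A toy single-column page: title block, body text, then a figure.
page = [
    ("title",  0.10, 0.05, 0.80, 0.05),
    ("text",   0.10, 0.15, 0.80, 0.40),
    ("figure", 0.10, 0.60, 0.80, 0.30),
]
vectors = encode_layout(page)
```

Each vector has length `len(CLASSES) + 4`; a VAE decoder would emit the same shape per step, from which class and box are recovered by argmax and slicing.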


DOI
https://doi.org/10.1007/978-3-030-86331-9_12