Automatic Generation of 3D Scene Animation Based on Dynamic Knowledge Graphs and Contextual Encoding

Authors: Wenfeng Song, Xinyu Zhang, Yuting Guo, Shuai Li, Aimin Hao, Hong Qin

Published in: International Journal of Computer Vision | Issue 11/2023


Abstract

Although novel 3D animation techniques can be boosted by a wide variety of deep learning methods, flexible automatic 3D applications involving animated figures such as humans and animals are still rarely studied in 3D computer vision. This is due to the lack of arbitrary 3D data acquisition environments, especially those involving human-populated scenes. Given a single image, 3D animation aided by contextual inference remains hampered by limited reconstruction clues when there is no prior knowledge about the identified figures/objects or their possible relationships with the environment. To alleviate this difficulty in time-varying 3D animation, we devise a dynamic scene creation framework built on a dynamic knowledge graph (DKG). The DKG encodes both temporal and spatial contextual clues to enable and facilitate human interactions with the affordance environment. Furthermore, we construct a DKG-driven variational auto-encoder (DVAE) upon animation kinematics knowledge conveyed by meta-motion sequences, which are disentangled from videos of prior scenes. The DKG can then induce animations in a given scene, allowing us to automatically generate physically plausible 3D animations that afford vivid interactions among humans, animals, and the environment. Extensive experimental results and comprehensive evaluations confirm the DKG's representation and modeling power for new animation production in 3D graphics and vision applications.
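
To make the two core ideas of the abstract concrete, below is a minimal, self-contained Python/PyTorch sketch of how a dynamic knowledge graph and a DKG-conditioned VAE could fit together. Everything here is an illustrative assumption rather than the authors' implementation: the class names (DKG, DVAE), the edge schema (a relation label with a frame interval of validity), and the dimensions (a 72-D SMPL-style per-frame pose vector, a 32-D context embedding pooled from the graph) are all hypothetical.

    import torch
    import torch.nn as nn

    class DKG:
        """Hypothetical dynamic knowledge graph: nodes are scene entities,
        edges carry a relation plus the frame interval over which it holds,
        so one structure captures both spatial and temporal context."""
        def __init__(self):
            self.nodes = {}   # node_id -> attribute dict
            self.edges = []   # (src, relation, dst, t_start, t_end)

        def add_node(self, node_id, **attrs):
            self.nodes[node_id] = attrs

        def add_edge(self, src, relation, dst, t_start, t_end):
            self.edges.append((src, relation, dst, t_start, t_end))

        def active_edges(self, t):
            # The temporal "slice" of the graph at frame t.
            return [e for e in self.edges if e[3] <= t <= e[4]]

    class DVAE(nn.Module):
        """Minimal conditional-VAE sketch: encode a meta-motion frame x
        together with a pooled graph context c, decode conditioned on c."""
        def __init__(self, motion_dim=72, ctx_dim=32, latent_dim=16):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(motion_dim + ctx_dim, 128), nn.ReLU())
            self.to_mu = nn.Linear(128, latent_dim)
            self.to_logvar = nn.Linear(128, latent_dim)
            self.dec = nn.Sequential(
                nn.Linear(latent_dim + ctx_dim, 128), nn.ReLU(),
                nn.Linear(128, motion_dim))

        def forward(self, x, c):
            h = self.enc(torch.cat([x, c], dim=-1))
            mu, logvar = self.to_mu(h), self.to_logvar(h)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
            return self.dec(torch.cat([z, c], dim=-1)), mu, logvar

    # Usage: a person sitting on a chair between frames 10 and 60.
    g = DKG()
    g.add_node("human_0", category="person")
    g.add_node("chair_1", category="chair")
    g.add_edge("human_0", "sits_on", "chair_1", t_start=10, t_end=60)
    print(g.active_edges(30))        # the relation is active at frame 30

    model = DVAE()
    x = torch.randn(1, 72)           # one pose frame (illustrative)
    c = torch.randn(1, 32)           # stand-in for a pooled DKG embedding
    recon, mu, logvar = model(x, c)

In the paper's pipeline the context would presumably come from the DKG itself, e.g. an embedding of the relations active at the current frame, rather than random noise; the sketch only fixes the shapes of the interfaces between the two components.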


Metadata
Title
Automatic Generation of 3D Scene Animation Based on Dynamic Knowledge Graphs and Contextual Encoding
Authors
Wenfeng Song
Xinyu Zhang
Yuting Guo
Shuai Li
Aimin Hao
Hong Qin
Publication date
01-07-2023
Publisher
Springer US
Published in
International Journal of Computer Vision | Issue 11/2023
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-023-01839-1
