Published in: International Journal of Computer Vision, Issue 11/2023

01.07.2023

Automatic Generation of 3D Scene Animation Based on Dynamic Knowledge Graphs and Contextual Encoding

Authors: Wenfeng Song, Xinyu Zhang, Yuting Guo, Shuai Li, Aimin Hao, Hong Qin


Abstract

Although novel 3D animation techniques could be boosted by a large variety of deep learning methods, flexible automatic 3D applications (involving animated figures such as humans and animals) are still rarely studied in 3D computer vision. This is due to the lack of arbitrary 3D data acquisition environments, especially those involving human-populated scenes. Given a single image, 3D animation aided by contextual inference is still plagued by limited reconstruction clues in the absence of prior knowledge about the identified figures/objects and their possible relationships w.r.t. the environment. To alleviate this difficulty in time-varying 3D animation, we devise a dynamic scene creation framework based on a dynamic knowledge graph (DKG). The DKG encodes both temporal and spatial contextual clues to enable and facilitate human interactions with the affordance environment. Furthermore, we construct a DKG-driven variational auto-encoder (DVAE) upon animation kinematics knowledge conveyed by meta-motion sequences, which are disentangled from videos of prior scenes. The DKG can then be used to induce animations in specific scenes, so that we can automatically generate physically plausible 3D animations that afford vivid interactions among humans and animals in the environment. Extensive experimental results and comprehensive evaluations confirm the DKG's representation and modeling power for new animation production in 3D graphics and vision applications.
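To make the high-level framework described above more concrete, the following is a minimal conceptual sketch (not the authors' released code) of a conditional variational auto-encoder whose decoder is conditioned on a pooled knowledge-graph context vector, illustrating how a DKG-style context could steer motion-sequence generation. All class names, dimensions, and the pooling scheme are hypothetical assumptions for illustration only.

# Conceptual sketch: graph-conditioned motion VAE (hypothetical names and dimensions).
import torch
import torch.nn as nn

class GraphContextEncoder(nn.Module):
    """Pools node features of a (dynamic) scene graph into one context vector."""
    def __init__(self, node_dim: int, ctx_dim: int):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(node_dim, ctx_dim), nn.ReLU())

    def forward(self, node_feats: torch.Tensor) -> torch.Tensor:
        # node_feats: (num_nodes, node_dim) -> (ctx_dim,) via mean pooling.
        return self.proj(node_feats).mean(dim=0)

class ConditionalMotionVAE(nn.Module):
    """Encodes a flattened motion sequence; decodes it conditioned on the graph context."""
    def __init__(self, motion_dim: int, ctx_dim: int, latent_dim: int = 32):
        super().__init__()
        self.enc = nn.Linear(motion_dim + ctx_dim, 2 * latent_dim)  # outputs mu and log-variance
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + ctx_dim, 256), nn.ReLU(), nn.Linear(256, motion_dim)
        )

    def forward(self, motion: torch.Tensor, ctx: torch.Tensor):
        mu, logvar = self.enc(torch.cat([motion, ctx], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        recon = self.dec(torch.cat([z, ctx], dim=-1))
        return recon, mu, logvar

# Toy usage: 10 graph nodes with 16-d features; a 24-joint, 3-d pose over 8 frames, flattened.
graph_enc = GraphContextEncoder(node_dim=16, ctx_dim=8)
vae = ConditionalMotionVAE(motion_dim=24 * 3 * 8, ctx_dim=8)
ctx = graph_enc(torch.randn(10, 16))
motion = torch.randn(24 * 3 * 8)
recon, mu, logvar = vae(motion, ctx)
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # standard VAE KL term
loss = nn.functional.mse_loss(recon, motion) + 1e-3 * kl

In the paper's setting, the context vector would come from the dynamic knowledge graph and the motion representation from the disentangled meta-motion sequences; the sketch above only illustrates the conditioning pattern, not the actual DVAE architecture.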


Metadata
Title
Automatic Generation of 3D Scene Animation Based on Dynamic Knowledge Graphs and Contextual Encoding
Authors
Wenfeng Song
Xinyu Zhang
Yuting Guo
Shuai Li
Aimin Hao
Hong Qin
Publication date
01.07.2023
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 11/2023
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-023-01839-1
