Published in: International Journal of Computer Vision 10/2023

18.06.2023 | Manuscript

Conditional Temporal Variational AutoEncoder for Action Video Prediction

Authors: Xiaogang Xu, Yi Wang, Liwei Wang, Bei Yu, Jiaya Jia


Abstract

To synthesize a realistic action sequence from a single human image, it is crucial to model both the motion patterns and the diversity present in action videos. This paper proposes an Action Conditional Temporal Variational AutoEncoder (ACT-VAE) to improve motion prediction accuracy and capture movement diversity. ACT-VAE predicts the pose sequence of an action clip from a single input image. It is implemented as a deep generative model that maintains temporal coherence according to the action category, using novel temporal modeling of the latent space. Further, ACT-VAE is a general action sequence prediction framework: when connected with a plug-and-play pose-to-image network, it can synthesize image sequences. Extensive experiments show that our approach predicts accurate poses and synthesizes realistic image sequences, surpassing state-of-the-art approaches. Compared to existing methods, ACT-VAE improves accuracy while preserving diversity.
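To make the abstract's key idea concrete, the sketch below illustrates the temporal latent structure it describes: the latent code at each time step is sampled from a prior conditioned on the previous latent and the action category, then decoded into a pose. This is a toy sampler with random placeholder weights and invented names (`sample_pose_sequence`, `W_prior`, `W_dec`), not the authors' trained model or architecture.

```python
import numpy as np

def sample_pose_sequence(pose0, action_id, T=8, z_dim=16, n_actions=5, seed=0):
    """Toy illustration of an action-conditional temporal VAE's generative
    process: z_1 ~ N(0, I); z_t ~ N(f(z_{t-1}, a), I); pose_t = g(z_t, pose0, a).
    All weights are random placeholders, not learned parameters."""
    rng = np.random.default_rng(seed)
    pose_dim = pose0.shape[0]
    a = np.eye(n_actions)[action_id]                # one-hot action condition
    # Linear stand-ins for the prior network f and the pose decoder g.
    W_prior = rng.normal(0, 0.1, (z_dim, z_dim + n_actions))
    W_dec = rng.normal(0, 0.1, (pose_dim, z_dim + pose_dim + n_actions))

    z = rng.normal(size=z_dim)                      # z_1 ~ N(0, I)
    poses = []
    for _ in range(T):
        # Decode the current pose from the latent, the input pose, and action.
        pose = W_dec @ np.concatenate([z, pose0, a])
        poses.append(pose)
        # Prior mean for the next latent depends on the previous latent
        # and the action category -- this is the temporal coherence link.
        mu = W_prior @ np.concatenate([z, a])
        z = mu + rng.normal(size=z_dim)             # z_{t+1} ~ N(mu, I)
    return np.stack(poses)                          # shape (T, pose_dim)

# Sample an 8-step pose sequence for a 17-joint 2D skeleton (34 coordinates).
seq = sample_pose_sequence(np.zeros(34), action_id=2)
```

Because each `z_t` is resampled around an action-conditioned mean rather than fixed, repeated sampling yields different but temporally coherent sequences, which is the diversity property the abstract emphasizes.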


Zurück zum Zitat Unterthiner, T., van Steenkiste, S., Kurach, K., Marinier, R., Michalski, M., & Gelly, S. (2018). Towards accurate generative models of video: A new metric & challenges. arXiv:1812.01717. Unterthiner, T., van Steenkiste, S., Kurach, K., Marinier, R., Michalski, M., & Gelly, S. (2018). Towards accurate generative models of video: A new metric & challenges. arXiv:​1812.​01717.
Zurück zum Zitat Villegas, R., Yang, J., Ceylan, D., & Lee, H. (2018). Neural kinematic networks for unsupervised motion retargetting. In IEEE conference on computer vision and pattern recognition. Villegas, R., Yang, J., Ceylan, D., & Lee, H. (2018). Neural kinematic networks for unsupervised motion retargetting. In IEEE conference on computer vision and pattern recognition.
Zurück zum Zitat Villegas, R., Yang, J., Hong, S., Lin, X., & Lee, H. (2017). Decomposing motion and content for natural video sequence prediction. arXiv:1706.08033. Villegas, R., Yang, J., Hong, S., Lin, X., & Lee, H. (2017). Decomposing motion and content for natural video sequence prediction. arXiv:​1706.​08033.
Zurück zum Zitat Villegas, R., Yang, J., Zou, Y., Sohn, S., Lin, X., & Lee, H. (2017). Learning to generate long-term future via hierarchical prediction. In ICML. Villegas, R., Yang, J., Zou, Y., Sohn, S., Lin, X., & Lee, H. (2017). Learning to generate long-term future via hierarchical prediction. In ICML.
Zurück zum Zitat Walker, J., Marino, K., Gupta, A., & Hebert, M. (2017). The pose knows: Video forecasting by generating pose futures. In International Conference on Computer Vision. Walker, J., Marino, K., Gupta, A., & Hebert, M. (2017). The pose knows: Video forecasting by generating pose futures. In International Conference on Computer Vision.
Zurück zum Zitat Wandt, B., Rudolph, M., Zell, P., Rhodin, H., & Rosenhahn, B. (2021). Canonpose: Self-supervised monocular 3d human pose estimation in the wild. In IEEE conference on computer vision and pattern recognition. Wandt, B., Rudolph, M., Zell, P., Rhodin, H., & Rosenhahn, B. (2021). Canonpose: Self-supervised monocular 3d human pose estimation in the wild. In IEEE conference on computer vision and pattern recognition.
Zurück zum Zitat Wang, B., Adeli, E., Chiu, H. K., Huang, D. A., & Niebles, J. C. (2019). Imitation learning for human pose prediction. In International Conference on Computer Vision. Wang, B., Adeli, E., Chiu, H. K., Huang, D. A., & Niebles, J. C. (2019). Imitation learning for human pose prediction. In International Conference on Computer Vision.
Zurück zum Zitat Wang, T. C., Liu, M. Y., Zhu, J. Y., Liu, G., Tao, A., Kautz, J., & Catanzaro, B. (2018). Video-to-video synthesis. arXiv:1808.06601. Wang, T. C., Liu, M. Y., Zhu, J. Y., Liu, G., Tao, A., Kautz, J., & Catanzaro, B. (2018). Video-to-video synthesis. arXiv:​1808.​06601.
Zurück zum Zitat Wang, T.C., Liu, M.Y., Zhu, J.Y., Tao, A., Kautz, J., & Catanzaro, B. (2018). High-resolution image synthesis and semantic manipulation with conditional gans. In IEEE conference on computer vision and pattern recognition. Wang, T.C., Liu, M.Y., Zhu, J.Y., Tao, A., Kautz, J., & Catanzaro, B. (2018). High-resolution image synthesis and semantic manipulation with conditional gans. In IEEE conference on computer vision and pattern recognition.
Zurück zum Zitat Wang, W., Alameda-Pineda, X., Xu, D., Fua, P., Ricci, E., & Sebe, N. (2018). Every smile is unique: Landmark-guided diverse smile generation. In IEEE conference on computer vision and pattern recognition. Wang, W., Alameda-Pineda, X., Xu, D., Fua, P., Ricci, E., & Sebe, N. (2018). Every smile is unique: Landmark-guided diverse smile generation. In IEEE conference on computer vision and pattern recognition.
Zurück zum Zitat Wang, Y., Li, M., Cai, H., Chen, W.M., & Han, S. (2022). Lite pose: Efficient architecture design for 2d human pose estimation. In IEEE conference on computer vision and pattern recognition. Wang, Y., Li, M., Cai, H., Chen, W.M., & Han, S. (2022). Lite pose: Efficient architecture design for 2d human pose estimation. In IEEE conference on computer vision and pattern recognition.
Zurück zum Zitat Wang, Y., Zhang, J., Zhu, H., Long, M., Wang, J., & Yu, P. S. (2019). Memory in memory: A predictive neural network for learning higher-order non-stationarity from spatiotemporal dynamics. In IEEE conference on computer vision and pattern recognition. Wang, Y., Zhang, J., Zhu, H., Long, M., Wang, J., & Yu, P. S. (2019). Memory in memory: A predictive neural network for learning higher-order non-stationarity from spatiotemporal dynamics. In IEEE conference on computer vision and pattern recognition.
Zurück zum Zitat Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Process. Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Process.
Zurück zum Zitat Wichers, N., Villegas, R., Erhan, D., & Lee, H. (2018). Hierarchical long-term video prediction without supervision. arXiv:1806.04768. Wichers, N., Villegas, R., Erhan, D., & Lee, H. (2018). Hierarchical long-term video prediction without supervision. arXiv:​1806.​04768.
Zurück zum Zitat Wu, Q., Chen, X., Huang, Z., & Wang, W. (2020). Generating future frames with mask-guided prediction. In The IEEE International Conference on Multimedia and Expo. Wu, Q., Chen, X., Huang, Z., & Wang, W. (2020). Generating future frames with mask-guided prediction. In The IEEE International Conference on Multimedia and Expo.
Zurück zum Zitat Xu, J., Ni, B., Li, Z., Cheng, S., & Yang, X. (2018). Structure preserving video prediction. In IEEE conference on computer vision and pattern recognition. Xu, J., Ni, B., Li, Z., Cheng, S., & Yang, X. (2018). Structure preserving video prediction. In IEEE conference on computer vision and pattern recognition.
Zurück zum Zitat Yan, X., Rastogi, A., Villegas, R., Sunkavalli, K., Shechtman, E., Hadap, S., Yumer, E., & Lee, H. (2018). Mt-vae: Learning motion transformations to generate multimodal human dynamics. In The European Conference on Computer Vision. Yan, X., Rastogi, A., Villegas, R., Sunkavalli, K., Shechtman, E., Hadap, S., Yumer, E., & Lee, H. (2018). Mt-vae: Learning motion transformations to generate multimodal human dynamics. In The European Conference on Computer Vision.
Zurück zum Zitat Yang, C., Wang, Z., Zhu, X., Huang, C., Shi, J., & Lin, D. (2018). Pose guided human video generation. In The European Conference on Computer Vision. Yang, C., Wang, Z., Zhu, X., Huang, C., Shi, J., & Lin, D. (2018). Pose guided human video generation. In The European Conference on Computer Vision.
Zurück zum Zitat Yang, Z., Zhu, W., Wu, W., Qian, C., Zhou, Q., Zhou, B., & Loy, C.C. (2020). Transmomo: Invariance-driven unsupervised video motion retargeting. In IEEE conference on computer vision and pattern recognition. Yang, Z., Zhu, W., Wu, W., Qian, C., Zhou, Q., Zhou, B., & Loy, C.C. (2020). Transmomo: Invariance-driven unsupervised video motion retargeting. In IEEE conference on computer vision and pattern recognition.
Zurück zum Zitat Yoo, Y., Yun, S., Jin Chang, H., Demiris, Y., & Young Choi, J. (2017). Variational autoencoded regression: high dimensional regression of visual data on complex manifold. In IEEE conference on computer vision and pattern recognition. Yoo, Y., Yun, S., Jin Chang, H., Demiris, Y., & Young Choi, J. (2017). Variational autoencoded regression: high dimensional regression of visual data on complex manifold. In IEEE conference on computer vision and pattern recognition.
Zurück zum Zitat Yoon, J.S., Liu, L., Golyanik, V., Sarkar, K., Park, H.S., & Theobalt, C. (2021). Pose-guided human animation from a single image in the wild. In IEEE conference on computer vision and pattern recognition. Yoon, J.S., Liu, L., Golyanik, V., Sarkar, K., Park, H.S., & Theobalt, C. (2021). Pose-guided human animation from a single image in the wild. In IEEE conference on computer vision and pattern recognition.
Zurück zum Zitat Yuan, Y., & Kitani, K. (2020). Dlow: Diversifying latent flows for diverse human motion prediction. In The European Conference on Computer Vision. Yuan, Y., & Kitani, K. (2020). Dlow: Diversifying latent flows for diverse human motion prediction. In The European Conference on Computer Vision.
Zurück zum Zitat Zhang, R., Isola, P., Efros, A.A., Shechtman, E., & Wang, O. (2018). The unreasonable effectiveness of deep features as a perceptual metric. In IEEE conference on computer vision and pattern recognition. Zhang, R., Isola, P., Efros, A.A., Shechtman, E., & Wang, O. (2018). The unreasonable effectiveness of deep features as a perceptual metric. In IEEE conference on computer vision and pattern recognition.
Zurück zum Zitat Zhang, W., Zhu, M., & Derpanis, K.G. (2013). From actemes to action: A strongly-supervised representation for detailed action understanding. In International Conference on Computer Vision. Zhang, W., Zhu, M., & Derpanis, K.G. (2013). From actemes to action: A strongly-supervised representation for detailed action understanding. In International Conference on Computer Vision.
Zurück zum Zitat Zhao, L., Peng, X., Tian, Y., Kapadia, M., & Metaxas, D. (2018). Learning to forecast and refine residual motion for image-to-video generation. In The European Conference on Computer Vision. Zhao, L., Peng, X., Tian, Y., Kapadia, M., & Metaxas, D. (2018). Learning to forecast and refine residual motion for image-to-video generation. In The European Conference on Computer Vision.
Zurück zum Zitat Zhou, X., Huang, S., Li, B., Li, Y., Li, J., & Zhang, Z. (2019). Text guided person image synthesis. In IEEE conference on computer vision and pattern recognition. Zhou, X., Huang, S., Li, B., Li, Y., Li, J., & Zhang, Z. (2019). Text guided person image synthesis. In IEEE conference on computer vision and pattern recognition.
Zurück zum Zitat Zhu, J.Y., Park, T., Isola, P., & Efros, A.A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In International Conference on Computer Vision. Zhu, J.Y., Park, T., Isola, P., & Efros, A.A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In International Conference on Computer Vision.
Zurück zum Zitat Zhu, W., Yang, Z., Di, Z., Wu, W., Wang, Y., & Loy, C.C. (2022). Mocanet: Motion retargeting in-the-wild via canonicalization networks. In AAAI. Zhu, W., Yang, Z., Di, Z., Wu, W., Wang, Y., & Loy, C.C. (2022). Mocanet: Motion retargeting in-the-wild via canonicalization networks. In AAAI.
Zurück zum Zitat Zhu, Z., Huang, T., Shi, B., Yu, M., Wang, B., & Bai, X. (2019). Progressive pose attention transfer for person image generation. In IEEE conference on computer vision and pattern recognition. Zhu, Z., Huang, T., Shi, B., Yu, M., Wang, B., & Bai, X. (2019). Progressive pose attention transfer for person image generation. In IEEE conference on computer vision and pattern recognition.
Zurück zum Zitat Zhuang, W., Wang, C., Chai, J., Wang, Y., Shao, M., & Xia, S. (2022). Music2dance: Dancenet for music-driven dance generation. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) Zhuang, W., Wang, C., Chai, J., Wang, Y., Shao, M., & Xia, S. (2022). Music2dance: Dancenet for music-driven dance generation. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)
Metadata
Title
Conditional Temporal Variational AutoEncoder for Action Video Prediction
Authors
Xiaogang Xu
Yi Wang
Liwei Wang
Bei Yu
Jiaya Jia
Publication date
18.06.2023
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 10/2023
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-023-01832-8
