Published in: International Journal of Computer Vision 9/2021

16-06-2021

Multi-level Motion Attention for Human Motion Prediction

Authors: Wei Mao, Miaomiao Liu, Mathieu Salzmann, Hongdong Li

Abstract

Human motion prediction aims to forecast future human poses given a history of observed motion. Whether based on recurrent or feed-forward neural networks, existing learning-based methods fail to model the observation that human motion tends to repeat itself, even for complex sports actions and cooking activities. Here, we introduce an attention-based feed-forward network that explicitly leverages this observation. In particular, instead of modeling frame-wise attention via pose similarity, we propose to extract motion attention to capture the similarity between the current motion context and the historical motion sub-sequences. In this context, we study the use of different types of attention, computed at the joint, body part, and full pose levels. Aggregating the relevant past motions and processing the result with a graph convolutional network allows us to effectively exploit motion patterns from the long-term history to predict the future poses. Our experiments on Human3.6M, AMASS and 3DPW validate the benefits of our approach for both periodic and non-periodic actions. Thanks to its attention model, our approach yields state-of-the-art results on all three datasets. Our code is available at https://github.com/wei-mao-2019/HisRepItself.
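The idea of motion attention described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). It assumes, purely for illustration, that the query is the most recent `context_len` frames, that keys are past sub-sequences of the same length, that similarity is a softmax over negative mean squared distance with a hypothetical `temperature` parameter, and that the graph convolutional network stage is omitted. The function name `motion_attention` and all parameter names are hypothetical.

```python
import numpy as np

def motion_attention(history, context_len, horizon, temperature=0.05):
    """Weight past sub-sequences by similarity to the current motion context.

    history: array of shape (N, D), N observed frames of D pose coordinates.
    Returns (weights, aggregated), where aggregated has shape
    (context_len + horizon, D).
    All design choices here are illustrative assumptions, not the paper's method.
    """
    history = np.asarray(history, dtype=float)
    n_frames = history.shape[0]
    window = context_len + horizon
    if n_frames < window:
        raise ValueError("history is shorter than context_len + horizon")

    # Query: the most recent frames, i.e. the current motion context.
    query = history[-context_len:].ravel()

    # Every start index whose full window (context plus continuation) fits.
    starts = np.arange(n_frames - window + 1)
    keys = np.stack([history[s:s + context_len].ravel() for s in starts])
    values = np.stack([history[s:s + window] for s in starts])

    # Sub-sequence similarity (assumed): negative mean squared distance,
    # turned into weights with a numerically stable softmax.
    scores = -((keys - query) ** 2).mean(axis=1) / temperature
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()

    # Aggregate the past sub-sequences with the attention weights.
    aggregated = np.tensordot(weights, values, axes=1)
    return weights, aggregated

# Toy periodic "motion": a point moving on a circle with a period of 20 frames.
t = np.arange(100)
history = np.stack([np.sin(2 * np.pi * t / 20), np.cos(2 * np.pi * t / 20)], axis=1)
weights, aggregated = motion_attention(history, context_len=10, horizon=5)
print(aggregated.shape)
```

On this toy sequence the largest weights fall on sub-sequences that are in phase with the current context, which is the sense in which attention is computed over motion rather than over single poses. In the approach described in the abstract, the aggregated result would then be processed by a graph convolutional network to produce the future poses.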

Metadata

Title: Multi-level Motion Attention for Human Motion Prediction
Authors: Wei Mao, Miaomiao Liu, Mathieu Salzmann, Hongdong Li
Publication date: 16-06-2021
Publisher: Springer US
Published in: International Journal of Computer Vision, Issue 9/2021
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-021-01483-7
