2018 | OriginalPaper | Chapter

Human Motion Analysis with Deep Metric Learning

Authors: Huseyin Coskun, David Joseph Tan, Sailesh Conjeti, Nassir Navab, Federico Tombari

Published in: Computer Vision – ECCV 2018

Publisher: Springer International Publishing

Abstract

Effectively measuring the similarity between two human motions is necessary for several computer vision tasks such as gait analysis, person identification and action retrieval. Nevertheless, we believe that traditional approaches such as L2 distance or Dynamic Time Warping based on hand-crafted local pose metrics fail to appropriately capture the semantic relationship across motions and, as such, are not suitable as metrics within these tasks. This work addresses this limitation by means of a triplet-based deep metric learning approach specifically tailored to human motion data, in particular to the problems of varying input size and of computationally expensive hard negative mining due to motion pair alignment. Specifically, we propose (1) a novel metric learning objective based on a triplet architecture and Maximum Mean Discrepancy, and (2) a novel deep architecture based on attentive recurrent neural networks. One benefit of our objective function is that it enforces a better separation of the different motion categories within the learned embedding space by means of the associated distribution moments. At the same time, our attentive recurrent neural network maps inputs of varying size to a fixed-size embedding while learning to focus on those motion parts that are semantically distinctive. Our experiments on two different datasets demonstrate significant improvements over conventional human motion metrics.
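The abstract names three ingredients: a triplet objective, Maximum Mean Discrepancy (MMD), and attention that reduces a variable-length sequence to a fixed-size embedding. The NumPy sketch below illustrates a generic textbook form of each one separately. It is not the authors' implementation, and the abstract does not say how the paper combines the triplet term with MMD. The margin, the Gaussian kernel and its bandwidth, and the linear scoring vector `w` are all assumptions chosen for illustration.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss on squared Euclidean distances, averaged over a batch.

    All inputs have shape (batch, dim). The margin value is an assumption.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


def gaussian_kernel(x, y, bandwidth=1.0):
    """Pairwise Gaussian kernel matrix between the rows of x and the rows of y."""
    sq_dist = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * (x @ y.T)
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth ** 2))


def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between two sample sets."""
    return float(
        gaussian_kernel(x, x, bandwidth).mean()
        + gaussian_kernel(y, y, bandwidth).mean()
        - 2.0 * gaussian_kernel(x, y, bandwidth).mean()
    )


def attention_pool(frames, w):
    """Reduce a (T, d) sequence to a d-vector with softmax weights over time.

    `w` is a hypothetical scoring vector of shape (d,); in a trained model it
    would be learned, and the frames would be recurrent hidden states.
    """
    scores = frames @ w
    scores = scores - scores.max()  # stabilise the softmax numerically
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ frames
```

Because `attention_pool` returns a vector of length `d` for any sequence length `T`, motions of different durations can be compared directly in the embedding space without first aligning them frame by frame.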


Metadata
Title: Human Motion Analysis with Deep Metric Learning
Authors: Huseyin Coskun, David Joseph Tan, Sailesh Conjeti, Nassir Navab, Federico Tombari
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-030-01264-9_41
