Skip to main content
main-content

Tipp

Weitere Kapitel dieses Buchs durch Wischen aufrufen

2018 | OriginalPaper | Buchkapitel

Human Motion Analysis with Deep Metric Learning

verfasst von : Huseyin Coskun, David Joseph Tan, Sailesh Conjeti, Nassir Navab, Federico Tombari

Erschienen in: Computer Vision – ECCV 2018

Verlag: Springer International Publishing

share
TEILEN

Abstract

Effectively measuring the similarity between two human motions is necessary for several computer vision tasks such as gait analysis, person identification and action retrieval. Nevertheless, we believe that traditional approaches such as L2 distance or Dynamic Time Warping based on hand-crafted local pose metrics fail to appropriately capture the semantic relationship across motions and, as such, are not suitable for being employed as metrics within these tasks. This work addresses this limitation by means of a triplet-based deep metric learning specifically tailored to deal with human motion data, in particular with the problem of varying input size and computationally expensive hard negative mining due to motion pair alignment. Specifically, we propose (1) a novel metric learning objective based on a triplet architecture and Maximum Mean Discrepancy; as well as, (2) a novel deep architecture based on attentive recurrent neural networks. One benefit of our objective function is that it enforces a better separation within the learned embedding space of the different motion categories by means of the associated distribution moments. At the same time, our attentive recurrent neural network allows processing varying input sizes to a fixed size of embedding while learning to focus on those motion parts that are semantically distinctive. Our experiments on two different datasets demonstrate significant improvements over conventional human motion metrics.
Literatur
4.
Zurück zum Zitat Berndt, D.J., Clifford, J.: Using dynamic time warping to find patterns in time series. In: KDD Workshop, Seattle, WA, vol. 10, pp. 359–370 (1994) Berndt, D.J., Clifford, J.: Using dynamic time warping to find patterns in time series. In: KDD Workshop, Seattle, WA, vol. 10, pp. 359–370 (1994)
5.
Zurück zum Zitat Che, Z., He, X., Xu, K., Liu, Y.: DECADE: a deep metric learning model for multivariate time series (2017) Che, Z., He, X., Xu, K., Liu, Y.: DECADE: a deep metric learning model for multivariate time series (2017)
6.
Zurück zum Zitat Chen, C., Zhuang, Y., Nie, F., Yang, Y., Wu, F., Xiao, J.: Learning a 3D human pose distance metric from geometric pose descriptor. IEEE Trans. Vis. Comput. Graph. 17(11), 1676–1689 (2011) CrossRef Chen, C., Zhuang, Y., Nie, F., Yang, Y., Wu, F., Xiao, J.: Learning a 3D human pose distance metric from geometric pose descriptor. IEEE Trans. Vis. Comput. Graph. 17(11), 1676–1689 (2011) CrossRef
7.
Zurück zum Zitat Chopra, S., Hadsell, R., LeCun, Y.: Learning a similarity metric discriminatively, with application to face verification. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. 1, pp. 539–546. IEEE (2005) Chopra, S., Hadsell, R., LeCun, Y.: Learning a similarity metric discriminatively, with application to face verification. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. 1, pp. 539–546. IEEE (2005)
8.
Zurück zum Zitat Chu, X., Yang, W., Ouyang, W., Ma, C., Yuille, A.L., Wang, X.: Multi-context attention for human pose estimation. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 Chu, X., Yang, W., Ouyang, W., Ma, C., Yuille, A.L., Wang, X.: Multi-context attention for human pose estimation. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017
9.
Zurück zum Zitat Cuturi, M., Vert, J.P., Birkenes, O., Matsui, T.: A kernel for time series based on global alignments. In: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2007, vol. 2, pp. II–413. IEEE (2007) Cuturi, M., Vert, J.P., Birkenes, O., Matsui, T.: A kernel for time series based on global alignments. In: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2007, vol. 2, pp. II–413. IEEE (2007)
10.
Zurück zum Zitat Davis, J.V., Kulis, B., Jain, P., Sra, S., Dhillon, I.S.: Information-theoretic metric learning. In: Proceedings of the 24th International Conference on Machine Learning, pp. 209–216. ACM (2007) Davis, J.V., Kulis, B., Jain, P., Sra, S., Dhillon, I.S.: Information-theoretic metric learning. In: Proceedings of the 24th International Conference on Machine Learning, pp. 209–216. ACM (2007)
11.
Zurück zum Zitat Eddy, S.R.: Hidden markov models. Curr. Opin. Struct. Biol. 6(3), 361–365 (1996) CrossRef Eddy, S.R.: Hidden markov models. Curr. Opin. Struct. Biol. 6(3), 361–365 (1996) CrossRef
12.
Zurück zum Zitat Graves, A., Mohamed, A.R., Hinton, G.: Speech recognition with deep recurrent neural networks. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6645–6649. IEEE (2013) Graves, A., Mohamed, A.R., Hinton, G.: Speech recognition with deep recurrent neural networks. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6645–6649. IEEE (2013)
13.
Zurück zum Zitat Greff, K., Srivastava, R.K., Koutník, J., Steunebrink, B.R., Schmidhuber, J.: LSTM: a search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 28(10), 2222–2232 (2017) MathSciNetCrossRef Greff, K., Srivastava, R.K., Koutník, J., Steunebrink, B.R., Schmidhuber, J.: LSTM: a search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 28(10), 2222–2232 (2017) MathSciNetCrossRef
14.
Zurück zum Zitat Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. J. Mach. Learn. Res. 13, 723–773 (2012) MathSciNetMATH Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. J. Mach. Learn. Res. 13, 723–773 (2012) MathSciNetMATH
15.
Zurück zum Zitat Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 1735–1742. IEEE (2006) Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 1735–1742. IEEE (2006)
16.
Zurück zum Zitat Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997) CrossRef Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997) CrossRef
17.
Zurück zum Zitat Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3.6M: large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Trans. Patt. Anal. Mach. Intell. 36(7), 1325–1339 (2014) CrossRef Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3.6M: large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Trans. Patt. Anal. Mach. Intell. 36(7), 1325–1339 (2014) CrossRef
18.
Zurück zum Zitat Keogh, E.J., Pazzani, M.J.: Derivative dynamic time warping. In: Proceedings of the 2001 SIAM International Conference on Data Mining, pp. 1–11. SIAM (2001) Keogh, E.J., Pazzani, M.J.: Derivative dynamic time warping. In: Proceedings of the 2001 SIAM International Conference on Data Mining, pp. 1–11. SIAM (2001)
19.
Zurück zum Zitat Laurent, C., Pereyra, G., Brakel, P., Zhang, Y., Bengio, Y.: Batch normalized recurrent neural networks. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2657–2661. IEEE (2016) Laurent, C., Pereyra, G., Brakel, P., Zhang, Y., Bengio, Y.: Batch normalized recurrent neural networks. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2657–2661. IEEE (2016)
20.
Zurück zum Zitat Li, Y., Swersky, K., Zemel, R.: Generative moment matching networks. In: Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), pp. 1718–1727 (2015) Li, Y., Swersky, K., Zemel, R.: Generative moment matching networks. In: Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), pp. 1718–1727 (2015)
21.
Zurück zum Zitat Lin, Z., et al.: A structured self-attentive sentence embedding. In: Proceedings of International Conference on Learning Representations (ICLR) (2017) Lin, Z., et al.: A structured self-attentive sentence embedding. In: Proceedings of International Conference on Learning Representations (ICLR) (2017)
22.
Zurück zum Zitat López-Méndez, A., Gall, J., Casas, J.R., Van Gool, L.J.: Metric learning from poses for temporal clustering of human motion. In: BMVC, pp. 1–12 (2012) López-Méndez, A., Gall, J., Casas, J.R., Van Gool, L.J.: Metric learning from poses for temporal clustering of human motion. In: BMVC, pp. 1–12 (2012)
23.
Zurück zum Zitat Martinez, J., Black, M.J., Romero, J.: On human motion prediction using recurrent neural networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 Martinez, J., Black, M.J., Romero, J.: On human motion prediction using recurrent neural networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017
24.
Zurück zum Zitat Mehta, D., et al.: VNect: real-time 3D human pose estimation with a single RGB camera. ACM Trans. Graph. (TOG) 36(4), 44 (2017) CrossRef Mehta, D., et al.: VNect: real-time 3D human pose estimation with a single RGB camera. ACM Trans. Graph. (TOG) 36(4), 44 (2017) CrossRef
25.
Zurück zum Zitat Mei, J., Liu, M., Wang, Y.F., Gao, H.: Learning a mahalanobis distance-based dynamic time warping measure for multivariate time series classification. IEEE Trans. Cybern. 46(6), 1363–1374 (2016) CrossRef Mei, J., Liu, M., Wang, Y.F., Gao, H.: Learning a mahalanobis distance-based dynamic time warping measure for multivariate time series classification. IEEE Trans. Cybern. 46(6), 1363–1374 (2016) CrossRef
26.
Zurück zum Zitat Mishchuk, A., Mishkin, D., Radenovic, F., Matas, J.: Working hard to know your neighbor’s margins: local descriptor learning loss. In: Proceedings Conference on Neural Information Processing Systems (NIPS), December 2017 Mishchuk, A., Mishkin, D., Radenovic, F., Matas, J.: Working hard to know your neighbor’s margins: local descriptor learning loss. In: Proceedings Conference on Neural Information Processing Systems (NIPS), December 2017
27.
Zurück zum Zitat Movshovitz-Attias, Y., Toshev, A., Leung, T.K., Ioffe, S., Singh, S.: No fuss distance metric learning using proxies. In: The IEEE International Conference on Computer Vision (ICCV), October 2017 Movshovitz-Attias, Y., Toshev, A., Leung, T.K., Ioffe, S., Singh, S.: No fuss distance metric learning using proxies. In: The IEEE International Conference on Computer Vision (ICCV), October 2017
29.
Zurück zum Zitat Pei, W., Tax, D.M., van der Maaten, L.: Modeling time series similarity with siamese recurrent networks. CoRR abs/1603.04713 (2016) Pei, W., Tax, D.M., van der Maaten, L.: Modeling time series similarity with siamese recurrent networks. CoRR abs/1603.04713 (2016)
30.
Zurück zum Zitat Ratanamahatana, C.A., Keogh, E.: Making time-series classification more accurate using learned constraints. In: SIAM (2004) Ratanamahatana, C.A., Keogh, E.: Making time-series classification more accurate using learned constraints. In: SIAM (2004)
31.
Zurück zum Zitat Rippel, O., Paluri, M., Dollar, P., Bourdev, L.: Metric learning with adaptive density discrimination. In: International Conference on Learning Representations (2016) Rippel, O., Paluri, M., Dollar, P., Bourdev, L.: Metric learning with adaptive density discrimination. In: International Conference on Learning Representations (2016)
32.
Zurück zum Zitat Roweis, S., Hinton, G., Salakhutdinov, R.: Neighbourhood component analysis. Adv. Neural Inf. Process. Syst. (NIPS) 17, 513–520 (2004) Roweis, S., Hinton, G., Salakhutdinov, R.: Neighbourhood component analysis. Adv. Neural Inf. Process. Syst. (NIPS) 17, 513–520 (2004)
33.
Zurück zum Zitat Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823 (2015) Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823 (2015)
34.
Zurück zum Zitat Schultz, M., Joachims, T.: Learning a distance metric from relative comparisons. In: Advances in Neural Information Processing Systems, pp. 41–48 (2004) Schultz, M., Joachims, T.: Learning a distance metric from relative comparisons. In: Advances in Neural Information Processing Systems, pp. 41–48 (2004)
35.
Zurück zum Zitat Sohn, K.: Improved deep metric learning with multi-class n-pair loss objective. In: Advances in Neural Information Processing Systems, pp. 1857–1865 (2016) Sohn, K.: Improved deep metric learning with multi-class n-pair loss objective. In: Advances in Neural Information Processing Systems, pp. 1857–1865 (2016)
36.
Zurück zum Zitat Song, H.O., Xiang, Y., Jegelka, S., Savarese, S.: Deep metric learning via lifted structured feature embedding. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4004–4012. IEEE (2016) Song, H.O., Xiang, Y., Jegelka, S., Savarese, S.: Deep metric learning via lifted structured feature embedding. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4004–4012. IEEE (2016)
37.
Zurück zum Zitat Sun, X., Shang, J., Liang, S., Wei, Y.: Compositional human pose regression. In: The IEEE International Conference on Computer Vision (ICCV), vol. 2 (2017) Sun, X., Shang, J., Liang, S., Wei, Y.: Compositional human pose regression. In: The IEEE International Conference on Computer Vision (ICCV), vol. 2 (2017)
38.
Zurück zum Zitat Sutherland, D.J., et al.: Generative models and model criticism via optimized maximum mean discrepancy. In: Proceedings of the 32nd International Conference on Machine Learning (ICML 2017) (2017) Sutherland, D.J., et al.: Generative models and model criticism via optimized maximum mean discrepancy. In: Proceedings of the 32nd International Conference on Machine Learning (ICML 2017) (2017)
39.
Zurück zum Zitat Taylor, G.W., Hinton, G.E., Roweis, S.T.: Modeling human motion using binary latent variables. In: Advances in Neural Information Processing Systems, pp. 1345–1352 (2007) Taylor, G.W., Hinton, G.E., Roweis, S.T.: Modeling human motion using binary latent variables. In: Advances in Neural Information Processing Systems, pp. 1345–1352 (2007)
40.
Zurück zum Zitat Tian, B.F.Y., Wu, F.: L2-Net: deep learning of discriminative patch descriptor in Euclidean space. In: Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2 (2017) Tian, B.F.Y., Wu, F.: L2-Net: deep learning of discriminative patch descriptor in Euclidean space. In: Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2 (2017)
41.
Zurück zum Zitat Trigeorgis, G., Nicolaou, M.A., Schuller, B.W., Zafeiriou, S.: Deep canonical time warping for simultaneous alignment and representation learning of sequences. IEEE Trans. Patt. Anal. Mach. Intell. 5, 1128–1138 (2018) CrossRef Trigeorgis, G., Nicolaou, M.A., Schuller, B.W., Zafeiriou, S.: Deep canonical time warping for simultaneous alignment and representation learning of sequences. IEEE Trans. Patt. Anal. Mach. Intell. 5, 1128–1138 (2018) CrossRef
43.
Zurück zum Zitat Yang, W., Li, S., Ouyang, W., Li, H., Wang, X.: Learning feature pyramids for human pose estimation. In: The IEEE International Conference on Computer Vision (ICCV), October 2017 Yang, W., Li, S., Ouyang, W., Li, H., Wang, X.: Learning feature pyramids for human pose estimation. In: The IEEE International Conference on Computer Vision (ICCV), October 2017
44.
Zurück zum Zitat Yin, X., Chen, Q.: Deep metric learning autoencoder for nonlinear temporal alignment of human motion. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 2160–2166. IEEE (2016) Yin, X., Chen, Q.: Deep metric learning autoencoder for nonlinear temporal alignment of human motion. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 2160–2166. IEEE (2016)
45.
Zurück zum Zitat Zhang, X., Yu, F.X., Kumar, S., Chang, S.F.: Learning spread-out local feature descriptors. In: The IEEE International Conference on Computer Vision (ICCV), October 2017 Zhang, X., Yu, F.X., Kumar, S., Chang, S.F.: Learning spread-out local feature descriptors. In: The IEEE International Conference on Computer Vision (ICCV), October 2017
47.
Zurück zum Zitat Zhou, F., Torre, F.: Canonical time warping for alignment of human behavior. In: Advances in Neural Information Processing Systems, pp. 2286–2294 (2009) Zhou, F., Torre, F.: Canonical time warping for alignment of human behavior. In: Advances in Neural Information Processing Systems, pp. 2286–2294 (2009)
48.
Zurück zum Zitat Zhou, F., De la Torre, F.: Generalized canonical time warping. IEEE Trans. Patt. Anal. Mach. Intell. 38(2), 279–294 (2016) CrossRef Zhou, F., De la Torre, F.: Generalized canonical time warping. IEEE Trans. Patt. Anal. Mach. Intell. 38(2), 279–294 (2016) CrossRef
Metadaten
Titel
Human Motion Analysis with Deep Metric Learning
verfasst von
Huseyin Coskun
David Joseph Tan
Sailesh Conjeti
Nassir Navab
Federico Tombari
Copyright-Jahr
2018
DOI
https://doi.org/10.1007/978-3-030-01264-9_41

Premium Partner