2018 | Original Paper | Book Chapter

Refining the Pose: Training and Use of Deep Recurrent Autoencoders for Improving Human Pose Estimation

Authors: Niall McLaughlin, Jesus Martinez del Rincon

Published in: Articulated Motion and Deformable Objects

Publisher: Springer International Publishing

Abstract

In this paper, a discriminative human pose estimation system based on deep learning is proposed for monocular video sequences. Our approach combines a simple but efficient Convolutional Neural Network (CNN) that directly regresses the 3D pose with a recurrent denoising autoencoder that refines the pose using the temporal information contained in the sequence of previous frames. The architecture also supports integrated training of both parts to better model the space of activities: noisy but realistic poses produced by the partially trained CNN are used to enhance the training of the autoencoder. The system has been evaluated on two standard datasets, HumanEva-I and Human3.6M, comprising more than 15 different activities. We show that our simple architecture can provide state-of-the-art results.
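The training idea in the abstract, pairing noisy pose estimates over a window of previous frames with a clean target, can be illustrated with a minimal data-preparation sketch. This is not the authors' implementation. The function names, the window length, and the pose layout are hypothetical, and `simulate_cnn_output` substitutes plain Gaussian noise for the output of a partially trained CNN, which the paper uses precisely because it is more realistic than synthetic noise.

```python
import random


def simulate_cnn_output(clean_poses, sigma, rng):
    """Hypothetical stand-in for a partially trained CNN.

    Assumption: returns the ground-truth pose plus Gaussian noise. The paper
    instead uses the real predictions of the partially trained network.
    """
    return [[c + rng.gauss(0.0, sigma) for c in pose] for pose in clean_poses]


def make_training_pairs(noisy_poses, clean_poses, window):
    """Pair each window of noisy poses with the clean pose of its last frame.

    Each input holds the current frame and the (window - 1) frames before it,
    so a recurrent denoiser only ever sees past and present estimates.
    """
    if len(noisy_poses) != len(clean_poses):
        raise ValueError("noisy and clean sequences must have equal length")
    pairs = []
    for t in range(window - 1, len(noisy_poses)):
        history = noisy_poses[t - window + 1 : t + 1]
        pairs.append((history, clean_poses[t]))
    return pairs


# Toy sequence: 10 frames, 3 joints, each joint stored as (x, y, z).
rng = random.Random(0)
num_frames, num_joints = 10, 3
clean = [[0.1 * t + j for j in range(num_joints * 3)] for t in range(num_frames)]
noisy = simulate_cnn_output(clean, sigma=0.05, rng=rng)
pairs = make_training_pairs(noisy, clean, window=4)
```

Each element of `pairs` is an (input sequence, target pose) tuple that a recurrent denoising autoencoder could be trained on. In an integrated setup, the noisy poses would be regenerated as the CNN improves.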


Metadata
Title
Refining the Pose: Training and Use of Deep Recurrent Autoencoders for Improving Human Pose Estimation
Authors
Niall McLaughlin
Jesus Martinez del Rincon
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-94544-6_2