Published in: International Journal of Computer Vision, Issue 1/2015

1 May 2015

Heterogeneous Multi-task Learning for Human Pose Estimation with Deep Convolutional Neural Network

Authors: Sijin Li, Zhi-Qiang Liu, Antoni B. Chan

Abstract

We propose a heterogeneous multi-task learning framework for human pose estimation from monocular images using a deep convolutional neural network. In particular, we simultaneously learn a human pose regressor and sliding-window body-part and joint-point detectors in a deep network architecture. We show that including the detection tasks helps to regularize the network, directing it to converge to a good solution. We report competitive and state-of-the-art results on several datasets. We also empirically show that the learned neurons in the middle layer of our network are tuned to localized body parts.
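
As a rough illustration of the joint training idea described in the abstract, the following sketch combines a pose-regression loss with a detection loss in a single objective. It is a minimal sketch under stated assumptions: the squared-error and cross-entropy terms, the function names, and the `det_weight` parameter are illustrative choices that do not come from this page, and the paper's actual loss functions and network may differ.

```python
import math


def squared_error(pred, target):
    """Sum of squared differences between predicted and true joint coordinates."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy over detector outputs (one probability each)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def multitask_loss(pose_pred, pose_true, det_probs, det_labels, det_weight=1.0):
    """Regression loss plus a weighted detection loss.

    det_weight is a hypothetical knob for balancing the two tasks;
    it is an assumption of this sketch, not a value from the paper.
    """
    regression = squared_error(pose_pred, pose_true)
    detection = binary_cross_entropy(det_probs, det_labels)
    return regression + det_weight * detection


# Toy example: two joints (x, y each) and three detector outputs.
loss = multitask_loss(
    pose_pred=[0.5, 0.5, 0.2, 0.8],
    pose_true=[0.5, 0.4, 0.2, 0.9],
    det_probs=[0.9, 0.1, 0.8],
    det_labels=[1, 0, 1],
)
print(round(loss, 4))
```

Setting `det_weight=0.0` reduces the objective to plain pose regression, which is one simple way to compare training with and without the auxiliary detection tasks.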

Footnotes
2
As pointed out in (Hara and Chellappa 2013; Pishchulin et al. 2012), the code in the Buffy toolkit does not compute PCP correctly.
 
3
Since we have different definitions of torso and head parts, we do not show the evaluation of these parts here.
 
References
Bo, L., & Sminchisescu, C. (2010). Twin Gaussian processes for structured prediction. International Journal of Computer Vision, 87(1–2), 28–52.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In: IEEE Conference on Computer Vision and Pattern Recognition.
Dantone, M., Gall, J., Leistner, C., & van Gool, L. (2013). Human pose estimation from still images using body parts dependent joint regressors. In: IEEE Conference on Computer Vision and Pattern Recognition.
Eichner, M., & Ferrari, V. (2009a). Better appearance models for pictorial structures. In: British Machine Vision Conference, pp. 1–11.
Eichner, M., & Ferrari, V. (2010). We are family: Joint pose estimation of multiple persons. In: European Conference on Computer Vision.
Eichner, M., & Ferrari, V. (2012). Human pose co-estimation and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Eichner, M., Marin-Jimenez, M., Zisserman, A., & Ferrari, V. (2012). 2D articulated human pose estimation and retrieval in (almost) unconstrained still images. International Journal of Computer Vision, 99(2), 190–214.
Evgeniou, T., Micchelli, C. A., & Pontil, M. (2005). Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6, 615–637.
Farabet, C., Couprie, C., Najman, L., & LeCun, Y. (2013). Learning hierarchical features for scene labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1915–1929.
Felzenszwalb, P. F., & Huttenlocher, D. P. (2005). Pictorial structures for object recognition. International Journal of Computer Vision, 61(1), 55–79.
Gülçehre, Ç., & Bengio, Y. (2013). Knowledge matters: Importance of prior information for optimization. In: International Conference on Learning Representations.
Hara, K., & Chellappa, R. (2013). Computationally efficient regression on a dependency graph for human pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition.
Jain, A., Tompson, J., Andriluka, M., Taylor, G. W., & Bregler, C. (2014). Learning human pose estimation features with convolutional networks. In: International Conference on Learning Representations.
Johnson, S., & Everingham, M. (2011). Learning effective human pose estimation from inaccurate annotation. In: IEEE Conference on Computer Vision and Pattern Recognition.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In: Neural Information Processing Systems.
Le, Q., Ranzato, M., Monga, R., Devin, M., Chen, K., Corrado, G., Dean, J., & Ng, A. (2012). Building high-level features using large scale unsupervised learning. In: International Conference on Machine Learning.
van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.
Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. In: International Conference on Machine Learning.
Pishchulin, L., Jain, A., Andriluka, M., Thormaehlen, T., & Schiele, B. (2012). Articulated people detection and pose estimation: Reshaping the future. In: IEEE Conference on Computer Vision and Pattern Recognition.
Pishchulin, L., Andriluka, M., Gehler, P., & Schiele, B. (2013). Poselet conditioned pictorial structures. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 588–595.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1988). Learning representations by back-propagating errors. In: J. A. Anderson & E. Rosenfeld (Eds.), Neurocomputing: Foundations of research (pp. 696–699). Cambridge, MA: MIT Press.
Sapp, B., & Taskar, B. (2013). MODEC: Multimodal decomposable models for human pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition.
Sapp, B., Toshev, A., & Taskar, B. (2010). Cascaded models for articulated pose estimation. In: European Conference on Computer Vision.
Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., & Blake, A. (2011). Real-time human pose recognition in parts from single depth images. In: IEEE Conference on Computer Vision and Pattern Recognition.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15, 1929–1958.
Sun, Y., Wang, X., & Tang, X. (2013). Deep convolutional network cascade for facial point detection. In: IEEE Conference on Computer Vision and Pattern Recognition.
Toshev, A., & Szegedy, C. (2014). DeepPose: Human pose estimation via deep neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition.
Weston, J., Ratle, F., & Collobert, R. (2008). Deep learning via semi-supervised embedding. In: International Conference on Machine Learning.
Yang, X., Kim, S., & Xing, E. P. (2009). Heterogeneous multitask learning with joint sparsity constraints. In: Neural Information Processing Systems.
Yang, Y., & Ramanan, D. (2011). Articulated pose estimation with flexible mixtures-of-parts. In: IEEE Conference on Computer Vision and Pattern Recognition.
Yang, Y., & Ramanan, D. (2013). Articulated human detection with flexible mixtures of parts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(12), 2878–2890.
Yu, K., Tresp, V., & Schwaighofer, A. (2005). Learning Gaussian processes from multiple tasks. In: International Conference on Machine Learning, pp. 1012–1019.
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In: Computer Vision – ECCV 2014. Lecture Notes in Computer Science (Vol. 8689, pp. 818–833). Springer.
Metadata
Title: Heterogeneous Multi-task Learning for Human Pose Estimation with Deep Convolutional Neural Network
Authors: Sijin Li, Zhi-Qiang Liu, Antoni B. Chan
Publication date: 1 May 2015
Publisher: Springer US
Published in: International Journal of Computer Vision, Issue 1/2015
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-014-0767-8
