Published in: Machine Vision and Applications, Issue 1-2/2017

20 July 2016 | Original Paper

Human arm pose modeling with learned features using joint convolutional neural network

Authors: Chongguo Li, Nelson H. C. Yung, Xing Sun, Edmund Y. Lam

Abstract

This paper proposes a new approach to modeling human arm pose configuration in still images, based on learned features and arm part structure constraints. No assumptions are made about the subjects' clothing style, action category, or background, so the model has to accommodate these uncertainties. The proposed approach uses an energy model that incorporates the dependence relationships among arm joints and arm parts, where the potentials represent their occurrence probabilities. Positive and negative instances are extracted from the input image using multi-scale image patches to capture the details of arm joints and arm parts. A joint convolutional neural network is then developed for feature extraction. The local rigidity of arm parts is used to constrain the occurrence of arm joints and arm parts, and these constraints can be efficiently incorporated into dynamic programming for human arm pose inference. Experimental results on various still images show better performance than alternative approaches that use hand-crafted features.
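
The inference step described above (an energy over arm joints with rigidity constraints, minimized by dynamic programming) can be illustrated with a minimal sketch. It assumes a simplified chain of three joints (shoulder, elbow, wrist), a small set of candidate locations per joint, unary energies standing in for the negative log occurrence probabilities a CNN might produce, and a hypothetical quadratic penalty on the deviation of each joint-to-joint distance from an expected part length as the rigidity term. The names `infer_arm_pose` and `rigidity_cost`, the penalty form, and all numbers are illustrative assumptions; this is not the authors' exact model, which also represents arm parts explicitly.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def rigidity_cost(p: Point, q: Point, length: float, weight: float = 1.0) -> float:
    """Hypothetical pairwise energy: penalise deviation of |p - q| from the expected part length."""
    return weight * (math.dist(p, q) - length) ** 2


def infer_arm_pose(
    candidates: Sequence[Sequence[Point]],
    unary: Sequence[Sequence[float]],
    lengths: Sequence[float],
    weight: float = 1.0,
) -> Tuple[List[int], float]:
    """Minimise E = sum_i U_i(x_i) + sum_i P_i(x_i, x_{i+1}) over a chain of joints
    (e.g. shoulder -> elbow -> wrist) by exact min-sum dynamic programming."""
    # cost[i][k]: lowest energy of joints 0..i when joint i takes candidate k
    cost: List[List[float]] = [list(unary[0])]
    # back[i - 1][k]: best candidate of joint i - 1 given joint i takes candidate k
    back: List[List[int]] = []
    for i in range(1, len(candidates)):
        row: List[float] = []
        ptr: List[int] = []
        for k, q in enumerate(candidates[i]):
            # energy of reaching candidate q from every candidate of the previous joint
            scores = [
                cost[i - 1][j] + rigidity_cost(p, q, lengths[i - 1], weight)
                for j, p in enumerate(candidates[i - 1])
            ]
            j_best = min(range(len(scores)), key=scores.__getitem__)
            row.append(scores[j_best] + unary[i][k])
            ptr.append(j_best)
        cost.append(row)
        back.append(ptr)
    # choose the best candidate for the last joint, then backtrack along the chain
    k = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    energy = cost[-1][k]
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return path, energy


# Toy input: 1 shoulder, 2 elbow and 3 wrist candidates (pixel coordinates).
candidates = [
    [(0.0, 0.0)],
    [(3.0, 0.0), (0.0, 5.0)],
    [(6.0, 0.0), (3.0, 4.0), (10.0, 10.0)],
]
unary = [[0.0], [0.5, 0.0], [0.2, 0.1, 0.0]]  # stand-ins for CNN-derived energies
lengths = [3.0, 3.0]  # assumed upper-arm and forearm lengths
path, energy = infer_arm_pose(candidates, unary, lengths)
print(path, energy)
```

Because the graph is a chain, each joint only needs the best predecessor for every candidate, so the search costs O(N·K²) for N joints and K candidates per joint rather than K^N for exhaustive enumeration. In the toy input, the elbow candidate with the lower unary energy is rejected because it violates the assumed upper-arm length.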

Metadata
Publisher: Springer Berlin Heidelberg
Print ISSN: 0932-8092
Electronic ISSN: 1432-1769
DOI: https://doi.org/10.1007/s00138-016-0796-0