
2016 | OriginalPaper | Chapter

Human Attribute Recognition by Deep Hierarchical Contexts

Authors: Yining Li, Chen Huang, Chen Change Loy, Xiaoou Tang

Published in: Computer Vision – ECCV 2016

Publisher: Springer International Publishing


Abstract

We present an approach for recognizing human attributes in unconstrained settings. We train a Convolutional Neural Network (CNN) to select the most attribute-descriptive human parts from all poselet detections, and combine them with the whole body to form a pose-normalized deep representation. We further improve recognition by using deep hierarchical contexts ranging from the human-centric level to the scene level. Human-centric context captures relations between people, which we compute from the nearest-neighbor parts of other people on a pyramid of CNN feature maps. The matched parts are then average pooled and act as a similarity regularization. To utilize the scene context, we re-score the human-centric predictions using the global scene classification score jointly learned in our CNN, yielding the final scene-aware predictions. To facilitate our study, we introduce a large-scale WIDER Attribute dataset (http://mmlab.ie.cuhk.edu.hk/projects/WIDERAttribute) with human attribute and image event annotations. Our method surpasses competitive baselines on this dataset and on other popular ones.
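The two context steps in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the use of cosine similarity for nearest-neighbor matching, and the additive scene re-scoring rule with a hypothetical `scene_attr_weights` matrix are all assumptions made for illustration, and plain feature vectors stand in for parts taken from a pyramid of CNN feature maps.

```python
import numpy as np


def human_centric_context(target_parts, other_parts):
    """Average-pool the nearest-neighbor parts found among other people.

    target_parts: (P, D) part features of the person being classified.
    other_parts:  (Q, D) part features pooled from all other people.
    Cosine similarity is an assumed matching criterion.
    """
    t = target_parts / np.linalg.norm(target_parts, axis=1, keepdims=True)
    o = other_parts / np.linalg.norm(other_parts, axis=1, keepdims=True)
    similarity = t @ o.T                 # (P, Q) cosine similarities
    nearest = similarity.argmax(axis=1)  # best-matching other part per target part
    matched = other_parts[nearest]       # (P, D) matched part features
    return matched.mean(axis=0)          # (D,) average-pooled context feature


def scene_aware_scores(attr_scores, scene_probs, scene_attr_weights):
    """Re-score attribute predictions with scene classification output.

    attr_scores:        (A,) human-centric attribute scores.
    scene_probs:        (S,) scene (event) class probabilities.
    scene_attr_weights: (S, A) hypothetical per-scene attribute biases.
    The additive form is an assumption, not the paper's formula.
    """
    return attr_scores + scene_probs @ scene_attr_weights


target = np.array([[1.0, 0.0], [0.0, 1.0]])
others = np.array([[2.0, 0.0], [0.0, 3.0], [-1.0, -1.0]])
context = human_centric_context(target, others)

rescored = scene_aware_scores(
    np.array([0.5, -0.2]),
    np.array([0.25, 0.75]),
    np.array([[1.0, 0.0], [0.0, 2.0]]),
)
```

In this toy setup, each target part is matched to the other person's part pointing in the same direction, and the pooled context feature is the mean of those matches. A learned model would presumably produce both the part features and the scene-to-attribute weights jointly.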


Footnotes
1
The term ‘Hierarchical Context’ is used in this paper to denote the tree-structured organization of object classes in a scene. We use the same term but with a different meaning of (human) object-object and object-scene contextual relations at two semantic levels, which is also more complete in the coverage of image information.
 
Literature
1. Layne, R., Hospedales, T.M., Gong, S.: Person re-identification by attributes. In: British Machine Vision Conference, pp. 1–11 (2012)
2. Liu, C., Gong, S., Loy, C.C.: On-the-fly feature importance mining for person re-identification. Pattern Recogn. 47(4), 1602–1615 (2014)
3. Su, C., Yang, F., Zhang, S., Tian, Q., Davis, L.S., Gao, W.: Multi-task learning with low rank attribute embedding for person re-identification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3739–3747 (2015)
4. Gong, S., Cristani, M., Yan, S., Loy, C.C.: Person Re-Identification, vol. 1. Springer, London (2014)
5. Bourdev, L., Maji, S., Malik, J.: Describing people: poselet-based attribute classification. In: IEEE International Conference on Computer Vision, pp. 1543–1550 (2011)
6. Zhang, N., Farrell, R., Iandola, F., Darrell, T.: Deformable part descriptors for fine-grained recognition and attribute prediction. In: IEEE International Conference on Computer Vision, pp. 729–736 (2013)
7. Joo, J., Wang, S., Zhu, S.C.: Human attribute recognition by rich appearance dictionary. In: IEEE International Conference on Computer Vision, pp. 721–728 (2013)
8. Zhang, N., Paluri, M., Ranzato, M., Darrell, T., Bourdev, L.D.: PANDA: pose aligned networks for deep attribute modeling. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1637–1644 (2014)
9. Gkioxari, G., Girshick, R., Malik, J.: Actions and attributes from wholes and parts. In: IEEE International Conference on Computer Vision, pp. 2470–2478 (2015)
10. Gkioxari, G., Girshick, R., Malik, J.: Contextual action recognition with R*CNN. In: IEEE International Conference on Computer Vision, pp. 1080–1088 (2015)
11. Zhang, N., Donahue, J., Girshick, R., Darrell, T.: Part-based R-CNNs for fine-grained category detection. In: European Conference on Computer Vision, pp. 834–849 (2014)
12. Branson, S., Horn, G.V., Belongie, S., Perona, P.: Bird species categorization using pose normalized deep convolutional nets. In: British Machine Vision Conference (2014)
13. Uijlings, J.R.R., van de Sande, K.E.A., Gevers, T., Smeulders, A.W.M.: Selective search for object recognition. Int. J. Comput. Vision 104(2), 154–171 (2013)
14. Xiong, Y., Zhu, K., Lin, D., Tang, X.: Recognize complex events from static images by fusing deep channels. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1600–1609 (2015)
15. Sharma, G., Jurie, F.: Learning discriminative spatial representation for image classification. In: British Machine Vision Conference, pp. 1–11 (2011)
16. Hall, D., Perona, P.: Fine-grained classification of pedestrians in video: benchmark and state of the art. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 5482–5491 (2015)
17. Sudowe, P., Spitzer, H., Leibe, B.: Person attribute recognition with a jointly-trained holistic CNN model. In: IEEE International Conference on Computer Vision Workshop, pp. 329–337 (2015)
18. Farhadi, A., Endres, I., Hoiem, D., Forsyth, D.: Describing objects by their attributes. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1778–1785 (2009)
19. Huang, C., Change Loy, C., Tang, X.: Unsupervised learning of discriminative attributes and visual representations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5175–5184 (2016)
20. Lampert, C.H., Nickisch, H., Harmeling, S.: Learning to detect unseen object classes by between-class attribute transfer. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 951–958 (2009)
21. Moghaddam, B., Yang, M.H.: Learning gender with support faces. IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 707–711 (2002)
22. Shakhnarovich, G., Viola, P.A., Moghaddam, B.: A unified learning framework for real time face detection and classification. In: IEEE International Conference on Automatic Face & Gesture Recognition, pp. 16–26 (2002)
23. Kumar, N., Berg, A.C., Belhumeur, P.N., Nayar, S.K.: Attribute and simile classifiers for face verification. In: IEEE International Conference on Computer Vision, pp. 365–372 (2009)
24. Kumar, N., Belhumeur, P., Nayar, S.: FaceTracer: a search engine for large collections of images with faces. In: European Conference on Computer Vision, pp. 340–353 (2008)
25. Boiman, O., Shechtman, E., Irani, M.: In defense of nearest-neighbor based image classification. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2008)
26. McCann, S., Lowe, D.G.: Local naive bayes nearest neighbor for image classification. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3650–3656 (2012)
27. Zhang, N., Farrell, R., Darrell, T.: Pose pooling kernels for sub-category recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3665–3672 (2012)
28. Johnson, J., Ballan, L., Li, F.: Love thy neighbors: Image annotation by exploiting image metadata. In: IEEE International Conference on Computer Vision, pp. 4624–4632 (2015)
29. Oliva, A., Torralba, A.: The role of context in object recognition. Trends Cogn. Sci. 11(12), 520–527 (2007)
30. Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010)
31. Choi, M.J., Lim, J.J., Torralba, A., Willsky, A.S.: Exploiting hierarchical context on a large database of object categories. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 129–136 (2010)
32. Mottaghi, R., Chen, X., Liu, X., Cho, N.G., Lee, S.W., Fidler, S., Urtasun, R., Yuille, A.: The role of context for object detection and semantic segmentation in the wild. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 891–898 (2014)
33. Russell, B., Torralba, A., Liu, C., Fergus, R., Freeman, W.T.: Object recognition by scene alignment. In: Advances in Neural Information Processing Systems, pp. 1241–1248 (2007)
35. Li, C., Parikh, D., Chen, T.: Extracting adaptive contextual cues from unlabeled regions. In: IEEE International Conference on Computer Vision, pp. 511–518 (2011)
36. Girshick, R.: Fast R-CNN. In: IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
37. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (2015)
38. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
39. Bourdev, L., Malik, J.: Poselets: body part detectors trained using 3D human pose annotations. In: IEEE International Conference on Computer Vision, pp. 1365–1372 (2009)
40. Bourdev, L., Maji, S., Brox, T., Malik, J.: Detecting people using mutually consistent poselet activations. In: European Conference on Computer Vision, pp. 168–181 (2010)
41. Deng, Y., Luo, P., Loy, C.C., Tang, X.: Pedestrian attribute recognition at far distance. In: Proceedings of the 22nd ACM international conference on Multimedia, pp. 789–792. ACM (2014)
42. Sharma, G., Jurie, F., Schmid, C.: Expanded parts model for semantic description of humans in still images. IEEE Trans. Pattern Anal. Mach. Intell. (2016)
Metadata
Title
Human Attribute Recognition by Deep Hierarchical Contexts
Authors
Yining Li
Chen Huang
Chen Change Loy
Xiaoou Tang
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-46466-4_41
