Published in: International Journal of Computer Vision 3/2013

1 December 2013

Coloring Action Recognition in Still Images

Authors: Fahad Shahbaz Khan, Rao Muhammad Anwer, Joost van de Weijer, Andrew D. Bagdanov, Antonio M. Lopez, Michael Felsberg

Abstract

In this article we investigate the problem of human action recognition in static images. By action recognition we mean a class of problems that includes both action classification and action detection (i.e., simultaneous localization and classification). Bag-of-words image representations yield promising results for action classification, and deformable part models perform very well for object detection. The representations used for action recognition typically rely only on shape cues and ignore color information. Inspired by the recent success of color in image classification and object detection, we investigate the potential of color for action classification and detection in static images. We perform a comprehensive evaluation of color descriptors and fusion approaches for action recognition. Experiments were conducted on the three datasets most used for benchmarking action recognition in still images: Willow, PASCAL VOC 2010 and Stanford-40. Our experiments demonstrate that incorporating color information considerably improves recognition performance, and that a descriptor based on color names outperforms pure color descriptors. They also show that late fusion of color and shape information outperforms other approaches on action recognition. Finally, we show that the different color–shape fusion approaches result in complementary information and that combining them yields state-of-the-art performance for action classification.
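
To make the idea of late fusion concrete, here is a minimal sketch under one common reading of the term. It assumes (the abstract does not spell this out) that shape and color features are quantized against separate vocabularies and that the two normalized bag-of-words histograms are concatenated before classification. The function names and the toy word indices are hypothetical.

```python
from collections import Counter


def bow_histogram(word_ids, vocab_size):
    """Return an L1-normalized bag-of-words histogram for a list of word indices."""
    counts = Counter(word_ids)
    total = float(len(word_ids)) or 1.0  # avoid division by zero for empty input
    return [counts.get(i, 0) / total for i in range(vocab_size)]


def late_fusion(shape_words, color_words, n_shape, n_color):
    """Assumed late fusion: build each histogram independently, then concatenate."""
    return bow_histogram(shape_words, n_shape) + bow_histogram(color_words, n_color)


# Toy image: four shape words from a 3-word vocabulary,
# three color words from a 2-word vocabulary (illustrative values only).
shape_words = [0, 2, 2, 1]
color_words = [1, 1, 0]
fused = late_fusion(shape_words, color_words, n_shape=3, n_color=2)
```

The fused vector has one block per cue, so a classifier trained on it can weight shape and color words separately.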

Footnotes
1
As evidenced by the First Workshop on Action Recognition and Pose Estimation in Still Images held in conjunction with ECCV 2012: http://vision.stanford.edu/apsi2012/index.html.
 
2
Only in experiment 6.2.3 do we add additional information from the background.
 
3
Note that the terminology of early and late fusion varies. In some communities early fusion refers to combination before the classifier and late fusion to combination after the classifier (Lan et al. 2012).
 
4
The combined vocabulary \(sc\) is constructed by concatenating the shape and color features before constructing the vocabulary in the combined feature-space.
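
The construction described in this footnote can be sketched as follows. This is an illustration only: the tiny k-means routine, the helper names and the toy descriptors are assumptions, not the authors' implementation, and the clustering algorithm actually used for the vocabulary is not stated here.

```python
import random


def concat_features(shape_desc, color_desc):
    """Concatenate the shape and color descriptors of each local feature."""
    return [list(s) + list(c) for s, c in zip(shape_desc, color_desc)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the closest center, i.e. the visual word assigned to the point."""
    return min(range(len(centers)), key=lambda k: sq_dist(point, centers[k]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the vocabulary builder)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


# Toy descriptors: 2-D "shape" and 1-D "color" per local feature.
shape = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
color = [[1.0], [1.1], [9.0], [9.1]]
joint = concat_features(shape, color)   # features in the combined space
vocab_sc = kmeans(joint, k=2)           # combined vocabulary "sc"
words = [nearest(p, vocab_sc) for p in joint]
```

Because the vocabulary is learned in the joint space, each visual word describes a shape and a color at the same time.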
 
5
Due to the absence of a vocabulary stage, several of the fusion methods explained in Sect. 4 cannot be applied to part-based object detection.
 
9
We also performed experiments replacing HOG with pure color descriptors but significantly inferior results were obtained.
 
10
The confusion matrix is constructed by assigning each image to the class for which it gets the highest classification score.
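
The rule in this footnote can be sketched as follows. The score values are made up for illustration, and the layout (rows are true classes, columns are predicted classes) is an assumption, since the footnote does not fix the orientation.

```python
def confusion_matrix(scores, labels, n_classes):
    """Count images by (true class, predicted class).

    The predicted class of an image is the one with the highest
    classification score, as described in the footnote.
    """
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for image_scores, true_class in zip(scores, labels):
        predicted = max(range(n_classes), key=lambda c: image_scores[c])
        matrix[true_class][predicted] += 1
    return matrix


# Hypothetical per-class scores for four images and their true labels.
scores = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
]
labels = [0, 1, 1, 2]
cm = confusion_matrix(scores, labels, n_classes=3)
```

In this toy example the third image belongs to class 1 but scores highest for class 0, so it is counted off the diagonal.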
 
11
Top ranked detections are the top \(N_j\) detections of a class, where \(N_j\) is equal to the number of positive examples for that class.
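
As a small illustration of this definition, the sketch below keeps the \(N_j\) highest-scoring detections of one class. The dictionary layout and the example values are hypothetical.

```python
def top_ranked(detections, n_positives):
    """Return the n_positives highest-scoring detections of a class."""
    ranked = sorted(detections, key=lambda d: d["score"], reverse=True)
    return ranked[:n_positives]


# Hypothetical detections for one class j with N_j = 2 positive examples.
detections = [
    {"id": "a", "score": 0.35},
    {"id": "b", "score": 0.92},
    {"id": "c", "score": 0.10},
    {"id": "d", "score": 0.78},
]
top = top_ranked(detections, n_positives=2)
top_ids = [d["id"] for d in top]
```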
 
References
Benavente, R., Vanrell, M., & Baldrich, R. (2008). Parametric fuzzy sets for automatic color naming. Journal of the Optical Society of America A, 25(10), 2582–2593.
Berlin, B., & Kay, P. (1969). Basic color terms: Their universality and evolution. Berkeley, CA: University of California Press.
Bosch, A., Zisserman, A., & Munoz, X. (2006). Scene classification via pLSA. In Proceedings of the European conference on computer vision.
Bosch, A., Zisserman, A., & Munoz, X. (2008). Scene classification using a hybrid generative/discriminative approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(4), 712–727.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Conference on computer vision and pattern recognition.
Delaitre, V., Laptev, I., & Sivic, J. (2010). Recognizing human actions in still images: a study of bag-of-features and part-based representations. In Proceedings of the British machine vision conference.
Delaitre, V., Sivic, J., & Laptev, I. (2011). Learning person-object interactions for action recognition in still images. In Advances in neural information processing systems.
Desai, C., & Ramanan, D. (2012). Detecting actions, poses, and objects with relational phraselets. In Proceedings of the European conference on computer vision.
Elfiky, N., Khan, F. S., van de Weijer, J., & Gonzalez, J. (2012). Discriminative compact pyramids for object and scene recognition. Pattern Recognition, 45(4), 1627–1636.
Everingham, M., Gool, L. V., Williams, C. K. I., Winn, J., & Zisserman, A. (2009). The PASCAL visual object classes challenge 2009 (VOC2009) results.
Everingham, M., Gool, L. J. V., Williams, C. K. I., Winn, J. M., & Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.
Felsberg, M., & Hedborg, J. (2007). Real-time view-based pose recognition and interpolation for tracking initialization. Journal of Real-Time Image Processing, 2(3), 103–115.
Felzenszwalb, P. F., Girshick, R. B., McAllester, D. A., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.
Gaidon, A., Harchaoui, Z., & Schmid, C. (2011). Actom sequence models for efficient action detection. In Conference on computer vision and pattern recognition.
Gehler, P. V., & Nowozin, S. (2009). On feature combination for multiclass object classification. In Proceedings of IEEE international conference on computer vision.
Geusebroek, J. M., van den Boomgaard, R., Smeulders, A. W. M., & Geerts, H. (2001). Color invariance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(12), 1338–1350.
Hoiem, D., Chodpathumwan, Y., & Dai, Q. (2012). Diagnosing error in object detectors. In European conference on computer vision.
Hu, Y., Cao, L., Lv, F., Yan, S., Gong, Y., & Huang, T. S. (2009). Action detection in complex scenes with spatial and temporal ambiguities. In Proceedings of IEEE international conference on computer vision.
Khan, F. S., van de Weijer, J., Bagdanov, A. D., & Vanrell, M. (2011). Portmanteau vocabularies for multi-cue image representations. In Advances in neural information processing systems.
Khan, F. S., Anwer, R. M., van de Weijer, J., Bagdanov, A. D., Vanrell, M., & Lopez, A. M. (2012a). Color attributes for object detection. In Conference on computer vision and pattern recognition.
Khan, F. S., van de Weijer, J., & Vanrell, M. (2012b). Modulating shape features by color attention for object recognition. International Journal of Computer Vision, 98(1), 49–64.
Lan, Z. Z., Bao, L., Yu, S. I., Liu, W., & Hauptmann, A. G. (2012). Double fusion for multimedia event detection. In Multimedia Modeling.
Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In IEEE conference on computer vision & pattern recognition.
Lenz, R., Bui, T. H., & Hernandez-Andres, J. (2005). Group theoretical structure of spectral spaces. Journal of Mathematical Imaging and Vision, 23(3), 297–313.
Li, L. J., Su, H., Xing, E. P., & Li, F. F. (2010). Object bank: A high-level image representation for scene classification and semantic feature sparsification. In Advances in neural information processing systems.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Maji, S., Bourdev, L. D., & Malik, J. (2011). Action recognition from a distributed representation of pose and appearance. In Computer vision and pattern recognition.
Mullen, K. T. (1985). The contrast sensitivity of human colour vision to red–green and blue–yellow chromatic gratings. The Journal of Physiology, 359, 381–400.
Pagani, A., Stricker, D., & Felsberg, M. (2009). Integral p-channels for fast and robust region matching. In Proceedings of the IEEE international conference on image processing.
Prest, A., Schmid, C., & Ferrari, V. (2012). Weakly supervised learning of interactions between humans and objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(3), 601–614.
van de Sande, K. E. A., Gevers, T., & Snoek, C. G. M. (2010). Evaluating color descriptors for object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1582–1596.
Shapovalova, N., Gong, W., Pedersoli, M., Roca, F. X., & Gonzalez, J. (2011). On importance of interactions and context in human action recognition. In Iberian conference on pattern recognition and image analysis.
Sharma, G., Jurie, F., & Schmid, C. (2012). Discriminative spatial saliency for image classification. In Conference on computer vision and pattern recognition.
Sharma, G., Jurie, F., & Schmid, C. (2013). Expanded parts model for human attribute and action recognition in still images. In Conference on computer vision and pattern recognition.
Tran, D., & Yuan, J. (2012). Max-margin structured output regression for spatio-temporal action localization. In Advances in neural information processing systems.
Vedaldi, A., Gulshan, V., Varma, M., & Zisserman, A. (2009). Multiple kernels for object detection. In Proceedings of IEEE international conference on computer vision.
Vigo, D. A. R., Khan, F. S., van de Weijer, J., & Gevers, T. (2010). The impact of color on bag-of-words based object recognition. In International conference on pattern recognition.
Wang, J., Yang, J., Yu, K., Lv, F., Huang, T. S., & Gong, Y. (2010). Locality-constrained linear coding for image classification. In Conference on computer vision and pattern recognition.
van de Weijer, J., & Schmid, C. (2006). Coloring local feature extraction. In Proceedings of the European conference on computer vision.
van de Weijer, J., & Schmid, C. (2007). Applying color names to image description. In IEEE international conference on image processing.
van de Weijer, J., Schmid, C., Verbeek, J. J., & Larlus, D. (2009). Learning color names for real-world applications. IEEE Transactions on Image Processing, 18(7), 1512–1524.
Yao, B., & Li, F. F. (2012). Recognizing human-object interactions in still images by modeling the mutual context of objects and human poses. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(9), 1691–1703.
Yao, B., Jiang, X., Khosla, A., Lin, A. L., Guibas, L. J., & Li, F. F. (2011). Human action recognition by learning bases of action attributes and parts. In Proceedings of IEEE international conference on computer vision.
Yuan, J., Liu, Z., & Wu, Y. (2011). Discriminative video pattern search for efficient action detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(9), 1728–1743.
Zhang, J., Marszalek, M., Lazebnik, S., & Schmid, C. (2007). Local features and kernels for classification of texture and object categories: A comprehensive study. International Journal of Computer Vision, 73(2), 213–218.
Zhang, J., Huang, K., Yu, Y., & Tan, T. (2010). Boosted local structured HOG-LBP for object localization. In IEEE conference on computer vision & pattern recognition.
Metadata
Title
Coloring Action Recognition in Still Images
Authors
Fahad Shahbaz Khan
Rao Muhammad Anwer
Joost van de Weijer
Andrew D. Bagdanov
Antonio M. Lopez
Michael Felsberg
Publication date
1 December 2013
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 3/2013
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-013-0633-0
